Natural language processing (NLP) tasks encompass a wide range of use cases, from extracting key information within text to classifying content or generating summaries. These tasks are foundational in applications such as analyzing customer feedback, automating document processing, or improving conversational AI systems.
Large Language Models (LLMs) are revolutionizing how we tackle data annotation, offering speed and efficiency that were previously unattainable. With Label Studio's Prompts, you can now handle multiple NLP annotation types for a single data task in a unified Label Studio workflow.
It is often challenging to ensure quality when directly using an LLM to label data. This is where Label Studio helps you maintain both efficiency and quality. You can compare your prompt’s performance to ground truth, manage different prompt versions, and use a human-in-the-loop workflow to ensure data quality remains high.
Let’s dive into how you can leverage these capabilities.
Imagine you’re working on a project involving product reviews. For each review, you want to extract named entities, perform two types of classification, and generate free-text reasoning. Traditional workflows might require manual annotations for each task, but with Prompts in Label Studio, you can consolidate all these tasks into a single Prompt.
This not only speeds up the annotation process but also reduces redundant work for human annotators, who will now be empowered to enforce label quality.
To begin, navigate to the Prompts page via the hamburger menu. Click Create Prompt and name your new prompt (e.g., Demo Prompt: Product Reviews). Next, link it to your target project.
For this tutorial, we’ll use a project called Demo Project: Product Reviews. Once selected, Label Studio automatically populates the expected outputs for each control tag based on your project’s labeling configuration. These include:
Hit Create to finalize your Prompt setup.
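To make "control tags" concrete, here is a minimal sketch of what a labeling configuration for this kind of project could look like, and how the control tags in it can be listed. The configuration is hypothetical: the tag names (`entities`, `sentiment`, `category`, `reasoning`) and label values are illustrative assumptions, not the demo project's actual setup.

```python
import xml.etree.ElementTree as ET

# Hypothetical labeling config for a product-review project.
# Names and label values are illustrative, not taken from the demo project.
LABEL_CONFIG = """
<View>
  <Text name="title" value="$title"/>
  <Text name="text" value="$text"/>
  <Labels name="entities" toName="text">
    <Label value="Product"/>
    <Label value="Brand"/>
  </Labels>
  <Choices name="sentiment" toName="text" choice="single">
    <Choice value="Positive"/>
    <Choice value="Negative"/>
  </Choices>
  <Choices name="category" toName="text" choice="single">
    <Choice value="Quality"/>
    <Choice value="Shipping"/>
  </Choices>
  <TextArea name="reasoning" toName="text"/>
</View>
"""

# Tag types treated as control tags in this sketch (an assumption that
# covers only the tag types used above).
CONTROL_TAGS = {"Labels", "Choices", "TextArea"}


def list_control_tags(config: str) -> dict[str, str]:
    """Map each control tag's `name` attribute to its tag type."""
    root = ET.fromstring(config)
    return {el.get("name"): el.tag for el in root.iter() if el.tag in CONTROL_TAGS}


controls = list_control_tags(LABEL_CONFIG)
for name, tag_type in controls.items():
    print(f"{name}: {tag_type}")
```

Each control tag found this way corresponds to one output the LLM is expected to fill in: one entity extraction, two classifications, and one free-text field.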
Select the LLM you’d like to use. In our demo, we’ll opt for OpenAI’s GPT-4o-mini. Since Prompts is designed for iterative improvements, you don’t need to over-engineer your first attempt at writing the prompt. For instance, you can even start with:
"Given the text and title, answer all the questions."
Make sure to include any data variables you want the LLM to use in its prediction. In our demo, we include the {text} and {title} of the product review. Save the Prompt and run it on your sample of tasks.
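Conceptually, the data variables act as placeholders that are filled in with each task's data before the prompt is sent to the LLM. The sketch below illustrates that idea with plain Python string formatting. It is an assumption about the general mechanism, not Label Studio's actual implementation, and the helper names and sample review are made up for illustration.

```python
import string

# The starter prompt from this tutorial, followed by the two data variables.
PROMPT_TEMPLATE = (
    "Given the text and title, answer all the questions.\n"
    "Title: {title}\n"
    "Text: {text}"
)


def template_variables(template: str) -> set[str]:
    """Return the placeholder names that appear in the template."""
    return {field for _, field, _, _ in string.Formatter().parse(template) if field}


def render_prompt(template: str, task_data: dict) -> str:
    """Fill the template with one task's data, failing loudly on missing keys."""
    missing = template_variables(template) - set(task_data)
    if missing:
        raise KeyError(f"task data is missing variables: {sorted(missing)}")
    return template.format(**task_data)


# Hypothetical task data for a single product review.
task = {"title": "Great blender", "text": "Crushes ice in seconds and is easy to clean."}
rendered = render_prompt(PROMPT_TEMPLATE, task)
print(rendered)
```

Checking for missing variables up front mirrors a common failure mode: a prompt that references a field the task data does not contain.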
Once the Prompt has executed, you’ll see a view of the LLM’s predictions. For any incomplete or inaccurate annotations, you can click into the task row for a deeper inspection.
For example, in one task:
These results provide a baseline to evaluate your prompt’s initial performance.
Pro Tip! If you have ground truth annotations (https://docs.humansignal.com/guide/quality#Define-ground-truth-annotations-for-a-project) in your Project, you can run inference against them to get a side-by-side comparison of ground truth vs. LLM predictions, along with metrics such as accuracy, precision, recall, and F1 score.
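If you want a feel for what those metrics measure, here is a minimal sketch that computes them for a single classification field, treating one class as the "positive" class. The labels and data are invented for illustration, and this is the textbook definition of each metric, not necessarily how Label Studio aggregates them across fields.

```python
def classification_metrics(truth: list[str], predicted: list[str], positive: str) -> dict[str, float]:
    """Accuracy plus precision, recall, and F1 for one class of interest."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("truth and predicted must be non-empty and the same length")

    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical ground truth vs. LLM predictions for a sentiment field.
ground_truth = ["Positive", "Negative", "Positive", "Positive"]
llm_output = ["Positive", "Positive", "Negative", "Positive"]

metrics = classification_metrics(ground_truth, llm_output, positive="Positive")
print(metrics)
```

In this toy example the LLM agrees with ground truth on two of four tasks, so accuracy is 0.5, while precision, recall, and F1 for the "Positive" class each come out to 2/3.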
You can create new Prompt versions to compare predictions across different prompts and LLMs. The Enhance Prompt option can also suggest improvements for you. Click on Enhance Prompt, select your LLM (e.g., GPT-4o), and apply the enhancements to your project tasks.
Label Studio will:
For example, your enhanced prompt might now include:
Run the enhanced prompt and compare its outputs against the initial results. If ground truth data is available, Label Studio will provide a side-by-side comparison, helping you measure improvements quantitatively.
With each iteration, you’ll see how the enhanced prompt changes the predictions and their alignment with ground truth.
Once you’re satisfied with the performance of your Prompts, you can easily scale up and generate pre-annotations for all tasks in your Project. These pre-filled fields give annotators a head start, allowing them to more quickly review and accept accurate predictions, or correct them as needed. This shifts their role from manual labeling to focused reviewing, so they can spend more time on challenging or ambiguous tasks. By reducing repetitive work, you streamline the process and boost both speed and accuracy.
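Prompts generates these pre-annotations for you, so no code is required. For readers curious about what a pre-annotation contains, the sketch below builds one in Label Studio's prediction JSON shape (a `result` list of regions keyed by `from_name`, `to_name`, `type`, and `value`). The control names, the shape of the LLM output, and the `model_version` string are hypothetical and chosen only for illustration.

```python
def to_prediction(llm_output: dict, model_version: str) -> dict:
    """Convert a (hypothetical) structured LLM answer into a prediction dict."""
    result = [
        {
            "from_name": "sentiment",  # hypothetical Choices control name
            "to_name": "text",
            "type": "choices",
            "value": {"choices": [llm_output["sentiment"]]},
        },
        {
            "from_name": "reasoning",  # hypothetical TextArea control name
            "to_name": "text",
            "type": "textarea",
            "value": {"text": [llm_output["reasoning"]]},
        },
    ]
    # One region per extracted entity, located by character offsets.
    for entity in llm_output["entities"]:
        result.append(
            {
                "from_name": "entities",  # hypothetical Labels control name
                "to_name": "text",
                "type": "labels",
                "value": {
                    "start": entity["start"],
                    "end": entity["end"],
                    "text": entity["text"],
                    "labels": [entity["label"]],
                },
            }
        )
    return {"model_version": model_version, "result": result}


# Invented review text and LLM answer.
review = "The Acme blender is great."
start = review.find("Acme")
answer = {
    "sentiment": "Positive",
    "reasoning": "The reviewer calls the blender great.",
    "entities": [{"start": start, "end": start + len("Acme"), "text": "Acme", "label": "Brand"}],
}

prediction = to_prediction(answer, model_version="demo-prompt-v1")
print(prediction)
```

Annotators reviewing such a task see each of these regions pre-filled and only need to accept or correct them.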
Label Studio's new Prompts functionality enables you to consolidate complex NLP tasks into a single workflow powered by LLMs. Whether it’s entity extraction, classification, reasoning generation, or more, you can now:
With Prompts, you’ll accelerate time to model development while maintaining high-quality data, saving resources and unlocking business value sooner.
Happy labeling!