Natural language processing (NLP) tasks encompass a wide range of use cases, from extracting key information within text to classifying content or generating summaries. These tasks are foundational in applications such as analyzing customer feedback, automating document processing, or improving conversational AI systems.
Large Language Models (LLMs) are revolutionizing how we tackle data annotation, offering speed and efficiency that were previously unattainable. With Label Studio's Prompts, you can now handle multiple NLP annotation types for a single data task in a unified Label Studio workflow.
It is often challenging to ensure quality when directly using an LLM to label data. This is where Label Studio helps you maintain both efficiency and quality. You can compare your prompt’s performance to ground truth, manage different prompt versions, and use a human-in-the-loop workflow to ensure data quality remains high.
Let’s dive into how you can leverage these capabilities.
Imagine you’re working on a project involving product reviews. For each review, you want to extract named entities, perform two types of classification, and generate free-text reasoning. Traditional workflows might require manual annotations for each task, but with Prompts in Label Studio, you can consolidate all these tasks into a single Prompt.
This not only speeds up the annotation process but also reduces redundant work for human annotators, who will now be empowered to enforce label quality.
Before we dive into the written tutorial, here's a quick video that walks you through each step:
To begin, navigate to the Prompts page via the hamburger menu. Click Create Prompt and name your new prompt (e.g., Demo Prompt: Product Reviews). Next, link it to your target project.
For this tutorial, we’ll use a project called Demo Project: Product Reviews. Once selected, Label Studio automatically populates the expected outputs for each control tag based on your project’s labeling configuration — in our case, the entity labels, both classification choices, and the free-text reasoning field.
Hit Create to finalize your Prompt setup.
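For reference, a labeling configuration covering all four outputs might look like the sketch below. The tag names and label values here are illustrative, not taken from the demo project; yours must match whatever your project defines.

```xml
<View>
  <Text name="title" value="$title"/>
  <Text name="text" value="$text"/>

  <!-- Named entity extraction over the review text -->
  <Labels name="entities" toName="text">
    <Label value="Product"/>
    <Label value="Brand"/>
  </Labels>

  <!-- Two classification tasks -->
  <Choices name="sentiment" toName="text" choice="single">
    <Choice value="Positive"/>
    <Choice value="Negative"/>
    <Choice value="Neutral"/>
  </Choices>
  <Choices name="category" toName="text" choice="single">
    <Choice value="Electronics"/>
    <Choice value="Apparel"/>
    <Choice value="Other"/>
  </Choices>

  <!-- Free-text reasoning -->
  <TextArea name="reasoning" toName="text"/>
</View>
```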
Select the LLM you’d like to use. In our demo, we’ll opt for OpenAI’s GPT-4o-mini. Since Prompts is designed for iterative improvements, you don’t need to over-engineer your first attempt at writing the prompt. For instance, you can even start with:
"Given the text and title, answer all the questions."
Make sure to include any data variables you want the LLM to use in its prediction. In our demo, we include the {text} and {title} of the product review. Save the Prompt and run it on your sample of tasks.
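With both variables inlined, the starter prompt might read something like this (the wording is illustrative):

```
Given the product review's {title} and {text}, answer all the questions.
```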
Once the Prompt has executed, you’ll see a view of the LLM’s predictions. For any incomplete or inaccurate annotations, you can click into the task row for a deeper inspection.
For example, opening an individual task lets you see exactly which fields the LLM filled in and where it fell short.
These results provide a baseline to evaluate your prompt’s initial performance.
Pro tip: If you have ground truth annotations (https://docs.humansignal.com/guide/quality#Define-ground-truth-annotations-for-a-project) in your project, you can run inference against the ground truth to get a side-by-side comparison of ground truth vs. LLM predictions, along with metrics like accuracy, precision, recall, and F1 score.
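To make those metrics concrete, here is a minimal sketch in plain Python (with hypothetical sentiment labels) of how accuracy, precision, recall, and F1 compare LLM predictions against ground truth for one classification field:

```python
def classification_metrics(ground_truth, predicted, positive="Positive"):
    """Score predicted labels against ground truth, treating one class as positive."""
    pairs = list(zip(ground_truth, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    accuracy = sum(1 for g, p in pairs if g == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical example: ground truth vs. LLM predictions for four reviews
truth = ["Positive", "Negative", "Positive", "Neutral"]
preds = ["Positive", "Positive", "Positive", "Neutral"]
print(classification_metrics(truth, preds))
```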
You can create new Prompt versions to compare predictions across different prompts and LLMs. There is also an Enhance Prompt option that suggests improvements for you: click Enhance Prompt, select your LLM (e.g., GPT-4o), and apply the enhancements to your project tasks.
Label Studio will then generate a revised, more detailed version of your prompt.
For example, your enhanced prompt might now spell out each question explicitly and describe the expected format for every output field.
Run the enhanced prompt and compare its outputs against the initial results. If ground truth data is available, Label Studio will provide a side-by-side comparison, helping you measure improvements quantitatively.
With each iteration, you’ll see how the prompt enhancement changes the predictions and alignment with ground truth.
Once you’re satisfied with the performance of your Prompts, you can easily scale up and generate pre-annotations for all tasks in your Project. These pre-filled fields give annotators a head start, allowing them to more quickly review and accept accurate predictions, or correct them as needed. This shifts their role from manual labeling to focused reviewing, so they can spend more time on challenging or ambiguous tasks. By reducing repetitive work, you streamline the process and boost both speed and accuracy.
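To ground what a pre-annotation looks like, here is a sketch of a prediction payload for a single multi-output task, shaped the way Label Studio represents predictions. The from_name/to_name values and the model version label are hypothetical and must match your own labeling configuration:

```python
# One prediction covering all four outputs of a single task.
# Each result entry targets a control tag ("from_name") and an object tag
# ("to_name") from the project's labeling configuration.
prediction = {
    "model_version": "demo-prompt-v2",  # hypothetical version label
    "result": [
        {   # named entity span in the review text
            "from_name": "entities", "to_name": "text", "type": "labels",
            "value": {"start": 0, "end": 9, "labels": ["Product"]},
        },
        {   # first classification
            "from_name": "sentiment", "to_name": "text", "type": "choices",
            "value": {"choices": ["Positive"]},
        },
        {   # second classification
            "from_name": "category", "to_name": "text", "type": "choices",
            "value": {"choices": ["Electronics"]},
        },
        {   # free-text reasoning
            "from_name": "reasoning", "to_name": "text", "type": "textarea",
            "value": {"text": ["The reviewer praises build quality and price."]},
        },
    ],
}
print(len(prediction["result"]))
```

Annotators then see these values pre-filled in the labeling interface and only need to confirm or correct them.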
Label Studio's new Prompts functionality enables you to consolidate complex NLP tasks into a single workflow powered by LLMs. Whether it’s entity extraction, classification, or reasoning generation, you can now handle them all in one prompt, iterate on prompt versions, measure quality against ground truth, and scale pre-annotations with humans in the loop.
With Prompts, you’ll accelerate time to model development while maintaining high-quality data, saving resources and unlocking business value sooner.
Happy labeling!