Automate Data Labeling with HumanSignal

We just released exciting functionality that could transform the way your data science teams work: fully-automated labeling powered by LLMs and our new Prompts interface!

While the idea of using LLMs to label data is not new, data labeled by generative AI tends to be less accurate and reliable than human-labeled data, which results in worse model performance when machine-labeled datasets are used for training or fine-tuning. However, the cost and time savings of labeling large datasets automatically are too enticing to ignore.

That’s why we’ve developed a new LLM-powered solution that enables the efficiencies of automated labeling while ensuring high-quality labeled data: Prompts. This new tool allows you to build, test, and iterate on prompts to accurately label large-scale datasets using real-time quality metrics based on ground truth data. Or you can forgo ground truth creation entirely and use the Prompts interface with constrained generation and optional human-in-the-loop workflows to bootstrap labeling projects or prototype quickly.

Once you’re comfortable with the performance of the prompt, you can use it to label massive datasets quickly and accurately.

An additional benefit is that all of this happens within a single streamlined workflow: no bouncing between tools, writing custom scripts, or exporting, importing, and converting data just to make it usable. Instead, one platform ingests, searches, labels, and exports your data in the format you need.

To get a sense of how this could benefit your labeling efforts, see what Geberit, a global manufacturing customer, accomplished using the new Prompts interface for auto-labeling:

  • 5x faster labeling throughput
  • 95% annotation accuracy against ground truth at scale
  • 4-5x cost savings vs. manual and semi-automated labeling efforts

You can read the entire case study here.

How Automated Labeling with HumanSignal Works

What does the Prompts workflow look like in more detail? We’ll demonstrate the workflow using the example of an internal tools team at a large tech enterprise that has been asked to build a system to find the best news article summaries for a busy CTO, Taylor. In this example, we'll start with ground truth data, but it's possible to skip that step and simply use the Prompts workflow to bootstrap a new project.

Taylor is managing multiple research and engineering efforts but still wants to stay on top of the latest trends and innovations. They don’t have time to sift through the ten publications they subscribe to each day, but they still want article highlights that present information in a way they find useful. Let’s figure out how to automatically predict the best summary according to their custom preferences.

1. Collect Initial Data and Set Ground Truth
The process starts by collecting feedback on a sample set of data. For instance, have Taylor pick their preferred summary from three candidates for each of 20 articles. This feedback serves as the foundation for the system, and it can easily be collected in Label Studio’s labeling interface. Once the summaries are labeled with Taylor’s preferences, those labels can be set as ground truth.
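
If you’d rather script this setup than click through the UI, here is a minimal sketch using the Label Studio Python SDK. Treat it as a hedged example: the URL, API key, project ID, and data field names are placeholders for this walkthrough, not values from the product.

```python
# Minimal sketch of step 1 with the Label Studio Python SDK
# (pip install label-studio-sdk). URL, API key, project ID, and
# data field names below are placeholders; adapt to your instance.
from label_studio_sdk import Client

ls = Client(url="https://app.humansignal.com", api_key="YOUR_API_KEY")
project = ls.get_project(12345)  # hypothetical project ID

# Import the 20 sample articles, each paired with three candidate summaries
task_ids = project.import_tasks([
    {
        "article": "Full text of the news article...",
        "summary1": "First candidate summary...",
        "summary2": "Second candidate summary...",
        "summary3": "Third candidate summary...",
    },
    # ...19 more tasks
])

# Taylor's preference for the first article, stored as ground truth
project.create_annotation(
    task_ids[0],
    ground_truth=True,  # marks this annotation as the reference label
    result=[{
        "from_name": "preference",
        "to_name": "article",
        "type": "choices",
        "value": {"choices": ["Summary 2"]},
    }],
)
```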

2. Create a New Prompt
In Label Studio, select your project and the appropriate problem type (e.g., text classification). The classes are automatically detected from the selected project’s label config; in this case, they correspond to Summary 1, Summary 2, and Summary 3.
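
For reference, those classes come from the `<Choice>` values in the project’s labeling config, so a config along these lines (using the same placeholder field names as above) would yield exactly those three classes. Prompts reads the choices from the config directly, so you never define them twice.

```python
# A labeling config like this (placeholder field names) is where the
# three classes Summary 1 / Summary 2 / Summary 3 are detected from.
LABEL_CONFIG = """
<View>
  <Text name="article" value="$article"/>
  <Text name="summary1" value="$summary1"/>
  <Text name="summary2" value="$summary2"/>
  <Text name="summary3" value="$summary3"/>
  <Choices name="preference" toName="article">
    <Choice value="Summary 1"/>
    <Choice value="Summary 2"/>
    <Choice value="Summary 3"/>
  </Choices>
</View>
"""
```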

3. Conduct Initial Prompt Evaluation
With the Prompts tool, select a base model and write a simple initial user prompt. For instance, ask the LLM (e.g., GPT-3.5 Turbo) to return the number of the best summary. Then assess this basic prompt’s performance against the ground truth using the provided metrics and reports. On this first pass, the model’s labeling accuracy may be relatively low (e.g., 32%), but it can be improved over the next couple of steps. Note that steps 4 and 5 are optional: you can bootstrap labels without the metrics and iteration by changing the task subset option (which defaults to ‘Ground Truth’) in the dropdown.
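
Prompts computes these metrics for you, but to make the evaluation concrete, here is roughly what that first pass amounts to, sketched with the OpenAI Python client. This illustrates the idea only; it is not the platform’s implementation.

```python
# Rough sketch of the step 3 evaluation: ask the model for the number
# of the best summary and score it against ground truth. Prompts does
# this for you; this is only an illustration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

INITIAL_PROMPT = "Return only the number (1, 2, or 3) of the best summary."

def predict(data: dict) -> str:
    """Ask the model to pick a summary for one task's data payload."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": INITIAL_PROMPT},
            {"role": "user", "content": (
                f"Article:\n{data['article']}\n\n"
                f"Summary 1: {data['summary1']}\n"
                f"Summary 2: {data['summary2']}\n"
                f"Summary 3: {data['summary3']}"
            )},
        ],
    )
    return f"Summary {response.choices[0].message.content.strip()}"

def accuracy(tasks: list[dict], labels: list[str]) -> float:
    """Fraction of tasks where the model matches the ground-truth label."""
    hits = sum(predict(t) == gt for t, gt in zip(tasks, labels))
    return hits / len(labels)
```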

4. Iterate and Improve the Prompt
Enhance the prompt by adding specific context and detailed instructions, such as Taylor’s job responsibilities and preferences, and the desired tone of the summaries. Save the updated prompt and evaluate its performance again. The goal is to achieve a consistent improvement in accuracy (e.g., from 32% to 70%).

5. Further Prompt Refinement
Rinse and repeat with additional prompt improvements. Consider adding few-shot examples or more domain-specific context to further improve accuracy. This iterative refinement hones the prompt for better performance.
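
Putting steps 4 and 5 together, the prompt might evolve from the one-line instruction above into something like the sketch below. The details about Taylor’s preferences and the few-shot example are invented purely for illustration.

```python
# Illustrative refined prompt for steps 4 and 5: role context,
# preference criteria, and one few-shot example. The specifics about
# Taylor are invented for this sketch.
REFINED_PROMPT = """You pick the best of three candidate summaries of a
news article for Taylor, a CTO who oversees multiple research and
engineering efforts and skims summaries between meetings.

Taylor prefers summaries that:
- lead with the concrete development, not background;
- call out implications for engineering teams;
- stay under three sentences, in a neutral tone.

Example:
Summary 1: a long historical overview of the field...
Summary 2: "Framework X cuts LLM serving latency by 40% and exposes an
OpenAI-compatible API, so teams can adopt it without client changes."
Summary 3: a promotional paragraph with no specifics...
Answer: 2

Return only the number (1, 2, or 3) of the best summary."""
```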

6. Automate Predictions
Once you are satisfied with your prompt’s performance against your ground truth dataset, upload all the data you want to automatically label into your Label Studio Project. Within the Prompts tool, you can use your refined prompt to generate predictions for the entire dataset with the click of a button. This step applies the prompt to all tasks, not just the initial sample set.
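
Inside the product, this is a single click. If you wanted to reproduce the idea outside the UI, a rough equivalent using the SDK objects from the earlier sketches would loop over every task and write each result back as a prediction (again, all names are placeholders, and `predict` is assumed to now use the refined prompt):

```python
# Rough SDK equivalent of the one-click run in step 6: apply the
# refined prompt to every task and store the results as predictions.
# `project` and `predict` come from the earlier sketches.
for task in project.get_tasks():
    label = predict(task["data"])  # e.g. "Summary 2"
    project.create_prediction(
        task_id=task["id"],
        result=[{
            "from_name": "preference",
            "to_name": "article",
            "type": "choices",
            "value": {"choices": [label]},
        }],
        model_version="refined-prompt-v3",  # hypothetical version tag
    )
```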

7. Validate and Export Predictions
Ensure that the predictions align with the target preferences (e.g., Taylor’s) after auto-labeling, and then export the data in whatever format you require using Label Studio’s export functionality.
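
Export is scriptable as well; for instance, continuing the SDK sketch (JSON shown here, though other export formats are available):

```python
# Pull the validated, auto-labeled dataset out of Label Studio.
labeled = project.export_tasks(export_type="JSON")
print(f"Exported {len(labeled)} labeled tasks")
```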

8. Replicate and Scale
Once the prompt is refined and validated, you can further streamline your operations by replicating this workflow for other users and projects within your organization.

By following these steps, you can effectively leverage prompt engineering to build a system that aligns with specific human preferences, automate the labeling process, and ultimately provide valuable insights to your target audience. This workflow not only improves efficiency but also ensures that your model can deliver the necessary performance and results.

This feature is currently available within the HumanSignal Platform. If you’re interested in seeing how it could speed up your machine learning efforts, schedule a demo. We’d love to show you what it can do.
