Pre-Label Image Captions with Label Studio Prompts

High-quality image datasets are at the heart of modern computer vision applications. Whether you're working on autonomous vehicles identifying road objects or medical imaging systems detecting anomalies, the success of your AI project depends on your training data.

The challenge lies not just in collecting these images, but in describing them accurately enough to capture their full context. Image captioning has emerged as a core component in bridging computer vision and natural language understanding. Consider these practical image captioning use cases:

E-commerce platforms generating comprehensive product descriptions
Accessibility tools creating precise alt-text for visually impaired users
Content moderation systems accurately detecting policy violations

These applications require not just object recognition ("there is a cat") but also contextual understanding ("a ginger cat sleeping on a windowsill at sunset"). However, manual annotation workflows face several challenges when captioning at scale, including consistency between annotators, tradeoffs between speed and accuracy, and task complexity increasing annotation time significantly.

Large Language Models (LLMs) are revolutionizing how we tackle these challenges, unlocking new capabilities in understanding and describing visual content.

With Label Studio's Prompts, you can now handle multiple image annotation types - from basic object classification to nuanced caption generation - in a unified workflow that maintains both efficiency and quality.

This workflow helps you:

Leverage the power of LLM-based prelabeling through a simple UI
Systematically evaluate LLM prompt performance against golden datasets
Integrate human-in-the-loop validation and correction
Scale annotation to large datasets at a fraction of the time and cost

Let’s dive in.

Prerequisites

Prompts is available on Label Studio Enterprise and Label Studio Starter Cloud. You can sign up for a 14-day free trial, or reach out to support for help in getting set up.
A project with a compatible labeling configuration (see the Label Studio documentation for supported object and control tags)
Image data hosted in cloud storage or over HTTP
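
To make the last two prerequisites more concrete, here is a minimal sketch of what a compatible labeling configuration and a set of HTTP-hosted image tasks might look like. Image, Choices, Choice, and TextArea are standard Label Studio tags, but the control names (objects, caption), the image data key, and the example.com URLs are assumptions made for illustration; they are not taken from the demo project.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical labeling configuration: the names "image", "objects", and
# "caption" are illustrative assumptions, not the demo project's real config.
LABEL_CONFIG = """
<View>
  <Image name="image" value="$image"/>
  <Choices name="objects" toName="image" choice="multiple">
    <Choice value="cat"/>
    <Choice value="dog"/>
    <Choice value="cow"/>
    <Choice value="horse"/>
    <Choice value="tree"/>
    <Choice value="house"/>
  </Choices>
  <TextArea name="caption" toName="image" rows="3" editable="true"/>
</View>
"""


def control_names(config: str) -> dict[str, str]:
    """Map each control tag's name to its tag type (tags that have toName)."""
    root = ET.fromstring(config.strip())
    return {el.attrib["name"]: el.tag for el in root.iter() if "toName" in el.attrib}


def build_tasks(image_urls: list[str]) -> list[dict]:
    """Wrap image URLs in the task JSON shape used for import: {"data": {...}}."""
    return [{"data": {"image": url}} for url in image_urls]


# Placeholder URLs; replace with your own cloud storage or HTTP-hosted images.
tasks = build_tasks([
    "https://example.com/images/cows.jpg",
    "https://example.com/images/cat.jpg",
])

print(control_names(LABEL_CONFIG))
print(json.dumps(tasks, indent=2))
```

The two control tags found by control_names correspond to the expected outputs that a Prompt would need to fill: one classification result and one caption.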

Multi-Step Annotation: One Prompt, Many Captions

Now, imagine you're working on a project involving image captioning. For each image, you want to classify objects (cat, dog, cow, horse, tree, house) and generate descriptive captions. Traditional workflows might require separate annotation steps, but with Prompts in Label Studio, you can consolidate all these tasks into a single Prompt.

This streamlined approach empowers annotators to focus on quality control rather than manual labeling of every image.

Step-by-Step Walkthrough


Step 1: Setting Up Your Prompt

To begin, navigate to the Prompts page via the hamburger menu. Click Create Prompt and name your new prompt (e.g., Image Captioning Prompt). Next, link it to your target project.

For this tutorial, we’ll use a project called Demo Project: Animal Picture Captions. Once selected, Label Studio automatically populates the expected outputs for each control tag based on your project’s labeling configuration. In our case, this includes:

Classification options for object detection
Text area for caption generation

Hit Create to finalize your Prompt setup.

Step 2: Writing and Running Your Prompt

Select the LLM you’d like to use. In our demo, we’ll opt for OpenAI’s GPT-4o. Since Prompts is designed for iterative improvements, start simple. For instance:

Given the image, identify any objects from the provided categories and generate a descriptive caption.
{image}

Make sure the Prompt includes every data variable you want the LLM to use as context for its prediction (e.g., “{image}”). Save the Prompt and run it on your sample of tasks.
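
Before running a Prompt, it can help to sanity-check locally that every variable it references actually exists in your task data. The sketch below is only an illustration: it assumes Python-style brace placeholders and a task whose data has an image key, and the helper names are hypothetical. It does not reproduce how Label Studio itself resolves variables.

```python
from string import Formatter


def template_variables(prompt: str) -> set[str]:
    """Return the placeholder names (e.g. "image") referenced in a prompt."""
    return {field for _, field, _, _ in Formatter().parse(prompt) if field}


def missing_variables(prompt: str, task_data: dict) -> set[str]:
    """Return placeholders in the prompt that have no matching task data key."""
    return template_variables(prompt) - set(task_data)


PROMPT = (
    "Given the image, identify any objects from the provided categories "
    "and generate a descriptive caption.\n{image}"
)

# Hypothetical task data; the URL is a placeholder.
task_data = {"image": "https://example.com/images/cows.jpg"}

print(template_variables(PROMPT))
print(missing_variables(PROMPT, task_data))
```

An empty result from missing_variables means every placeholder in the prompt has a matching field in the task.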

Step 3: Reviewing Initial Results

Once the Prompt has executed, you’ll see a view of the LLM’s predictions. For any incomplete or inaccurate annotations, click into the task row for a deeper inspection.

For example, in one task:

Classification: cow (the image also contained trees, which the LLM missed)
Caption: “A picturesque scene of cows grazing in a lush green meadow, with majestic mountains in the background under a clear blue sky.”

Pro Tip! If you have ground truth annotations in your Project, you can run inference against ground truth for a side-by-side comparison and metrics like accuracy, precision, recall, and F1 Score.
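
For intuition about what these metrics measure on a multi-label task like this one, here is a small sketch that scores predicted label sets against ground truth. It uses micro-averaged precision, recall, and F1 plus exact-match accuracy, which is one common convention; the article does not specify how Label Studio computes its own figures, so treat the function and the sample data as assumptions.

```python
def multilabel_metrics(predicted: list[set[str]], truth: list[set[str]]) -> dict[str, float]:
    """Micro-averaged precision/recall/F1 and exact-match accuracy over tasks."""
    tp = fp = fn = exact = 0
    for pred, gold in zip(predicted, truth):
        tp += len(pred & gold)   # labels predicted and present in ground truth
        fp += len(pred - gold)   # labels predicted but not in ground truth
        fn += len(gold - pred)   # ground truth labels the model missed
        exact += pred == gold    # whole label set matches exactly
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = exact / len(truth) if truth else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up sample: the first task misses "tree", the third adds a spurious "tree".
predicted = [{"cow"}, {"cat"}, {"dog", "tree"}]
truth = [{"cow", "tree"}, {"cat"}, {"dog"}]

metrics = multilabel_metrics(predicted, truth)
print(metrics)
```

A missed label such as the trees in the cow example lowers recall, while a spurious label lowers precision; exact-match accuracy penalizes both.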

*Results from a Prompt run on a Sample of Project Tasks*

*Quickview after clicking in to a task row*

*Results from a Prompt run on Ground Truth Tasks*

Step 4: Iterate and (Auto) Enhance Your Prompt

You can create new Prompt versions to compare predictions across different prompts and LLMs, or use the Enhance Prompt option for automated improvements! Click on Enhance Prompt, select your LLM (e.g., GPT-4o) and apply enhancements on your project tasks.

Label Studio will:

Suggest prompt enhancements and explain the changes
Provide a new Prompt based on its suggestions

For example, your enhanced prompt might now include:

Input descriptions and few-shot examples
Guidelines for caption detail and structure

*Enhanced Prompt saved as new Prompt Version*

Step 5: Comparing and Iterating

Run the enhanced prompt and compare its outputs against the initial results. With ground truth data, Label Studio provides side-by-side comparisons so you can measure the improvement quantitatively.

With each iteration, you’ll see how the prompt enhancement changes the predictions and alignment with ground truth.
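
If you export predictions from two prompt versions, a per-task comparison can show exactly where an iteration helped or hurt. The following is a rough sketch under assumed data structures (dictionaries mapping a task ID to a set of predicted labels); the function name and sample values are hypothetical and are not part of Label Studio.

```python
def compare_versions(
    baseline: dict[int, set[str]],
    enhanced: dict[int, set[str]],
    truth: dict[int, set[str]],
) -> dict[str, list[int]]:
    """Group task IDs by whether the enhanced prompt fixed or broke a match."""
    report: dict[str, list[int]] = {"improved": [], "regressed": [], "unchanged": []}
    for task_id, gold in truth.items():
        before = baseline.get(task_id, set()) == gold
        after = enhanced.get(task_id, set()) == gold
        if after and not before:
            report["improved"].append(task_id)
        elif before and not after:
            report["regressed"].append(task_id)
        else:
            report["unchanged"].append(task_id)
    return report


# Made-up sample data keyed by task ID.
truth = {1: {"cow", "tree"}, 2: {"cat"}, 3: {"dog"}}
baseline = {1: {"cow"}, 2: {"cat"}, 3: {"dog", "tree"}}
enhanced = {1: {"cow", "tree"}, 2: {"cat"}, 3: {"dog", "tree"}}

report = compare_versions(baseline, enhanced, truth)
print(report)
```

Tasks listed under regressed are worth inspecting first, since an enhancement that fixes some images can still break others.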

Step 6: Human-in-the-Loop

Once satisfied with your Prompt's performance, scale up to generate pre-annotations for all tasks in your project. These pre-filled fields give annotators a head start, allowing them to quickly review and accept accurate predictions or make corrections as needed. This shifts their role from manual labeling to focused reviewing, enabling more time for challenging or ambiguous images.
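
Prompts generates these pre-annotations for you, so no code is required. Still, it can be useful to know roughly what a pre-annotation looks like in Label Studio's prediction JSON format. The sketch below assumes a labeling configuration whose image tag is named image and whose controls are named objects (Choices) and caption (TextArea); those names, the helper function, and the model version string are hypothetical.

```python
import json


def to_prediction(labels: set[str], caption: str, model_version: str) -> dict:
    """Build a prediction dict with one choices result and one textarea result."""
    return {
        "model_version": model_version,
        "result": [
            {
                "from_name": "objects",   # assumed Choices control name
                "to_name": "image",       # assumed Image tag name
                "type": "choices",
                "value": {"choices": sorted(labels)},
            },
            {
                "from_name": "caption",   # assumed TextArea control name
                "to_name": "image",
                "type": "textarea",
                "value": {"text": [caption]},
            },
        ],
    }


prediction = to_prediction(
    {"cow", "tree"},
    "Cows grazing in a green meadow with mountains in the background.",
    model_version="image-captioning-prompt-v2",  # hypothetical version label
)
print(json.dumps(prediction, indent=2))
```

Each entry in result pre-fills one control in the labeling interface, which is what annotators then accept or correct.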

Integrating human review ensures that your data remains high quality while significantly improving annotation efficiency. Instead of starting from scratch, annotators can focus on refining model outputs, catching edge cases, and handling nuanced scenarios that automated systems might struggle with. This hybrid approach balances automation with human expertise, reducing annotation fatigue while maintaining accuracy.

Curious about how to best implement human-in-the-loop workflows with Prompts? Watch our workshop to see how Label Studio helps you:

Set up Prompts to pre-label data effectively
Iterate and improve performance
Use human reviewers to fine-tune labels for the highest accuracy

With a well-tuned Prompt and an efficient human-in-the-loop process, you can scale annotation efforts while maintaining control over quality, ultimately improving training data for your models.

Unlock Image Annotation at Scale

Label Studio's Prompts enables you to consolidate complex image tasks into a single workflow powered by LLMs. Whether it’s object classification, captioning, or more, you can now:

Pre-label complex annotations seamlessly
Enhance and track LLM prompts iteratively for better performance
Maintain efficient, high-quality annotations with human-in-the-loop review

With Prompts, you’ll accelerate time to model development while maintaining high-quality data, saving resources and unlocking business value sooner.

Happy labeling!