
Autolabel Image Captions with Label Studio Prompts

High-quality image datasets sit at the heart of modern computer vision applications. Whether you're working on autonomous vehicles identifying road objects or medical imaging systems detecting anomalies, the success of your AI project depends on your training data.

The challenge lies in not just collecting these images, but accurately describing them in a way that captures the full context of the images. Image captioning has emerged as a core component in bridging computer vision and natural language understanding capabilities. Consider these practical image captioning use cases:

  • E-commerce platforms generating comprehensive product descriptions
  • Accessibility tools creating precise alt-text for visually impaired users
  • Content moderation systems accurately detecting policy violations

These applications require not just object recognition ("there is a cat") but also contextual understanding ("a ginger cat sleeping on a windowsill at sunset"). However, manual annotation workflows face several challenges when captioning at scale: consistency between annotators, tradeoffs between speed and accuracy, and task complexity that significantly increases annotation time. Large Language Models (LLMs) are revolutionizing how we tackle these challenges, unlocking new capabilities in understanding and describing visual content.

With Label Studio's Prompts, you can now handle multiple image annotation types - from basic object classification to nuanced caption generation - in a unified workflow that maintains both efficiency and quality.

This workflow helps you:

  • Leverage the power of LLM-based autolabeling through a simple UI
  • Systematically evaluate LLM prompt performance against golden datasets
  • Integrate human-in-the-loop validation and correction
  • Scale annotation to large datasets at a fraction of the time and cost

Let’s dive in.

Prerequisites

To follow along, you'll need:

  • A Label Studio instance with the Prompts feature enabled
  • A project containing image tasks to annotate
  • An API key for your chosen LLM provider (we use OpenAI's GPT-4o in this walkthrough)

Multi-Step Annotation: One Prompt, Many Captions

Now, imagine you're working on a project involving image captioning. For each image, you want to classify objects (cat, dog, cow, horse, tree, house) and generate descriptive captions. Traditional workflows might require separate annotation steps, but with Prompts in Label Studio, you can consolidate all these tasks into a single Prompt.

This streamlined approach empowers annotators to focus on quality control rather than manual labeling of every image.

Step-by-Step Walkthrough

If you'd rather watch the video of the walkthrough, check it out below!

Step 1: Setting Up Your Prompt

To begin, navigate to the Prompts page via the hamburger menu. Click Create Prompt and name your new prompt (e.g., Image Captioning Prompt). Next, link it to your target project.

For this tutorial, we’ll use a project called Demo Project: Animal Picture Captions. Once selected, Label Studio automatically populates the expected outputs for each control tag based on your project’s labeling configuration. In our case, this includes:

  • Classification options for object detection
  • Text area for caption generation
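
As a point of reference, a labeling configuration that produces those two outputs might look like the sketch below, using the object categories from our demo project. The tag names `objects` and `caption` are illustrative; yours can differ, and Prompts will pick up whatever control tags your config defines.

```xml
<View>
  <!-- The image each task presents to the annotator (and the LLM) -->
  <Image name="image" value="$image"/>

  <!-- Classification options for object detection -->
  <Choices name="objects" toName="image" choice="multiple">
    <Choice value="Cat"/>
    <Choice value="Dog"/>
    <Choice value="Cow"/>
    <Choice value="Horse"/>
    <Choice value="Tree"/>
    <Choice value="House"/>
  </Choices>

  <!-- Text area for caption generation -->
  <TextArea name="caption" toName="image" placeholder="Describe the image..."/>
</View>
```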

Hit Create to finalize your Prompt setup.

Step 2: Writing and Running Your Prompt

Select the LLM you’d like to use. In our demo, we’ll opt for OpenAI’s GPT-4o. Since Prompts is designed for iterative improvements, start simple. For instance:

Given the image, identify any objects from the provided categories and generate a descriptive caption.
{image}

Make sure to include any data variables you want the LLM to use as context (e.g., “{image}”). Save the Prompt and run it on your sample of tasks.

Step 3: Reviewing Initial Results

Once the Prompt has executed, you’ll see a view of the LLM’s predictions. For any incomplete or inaccurate annotations, click into the task row for a deeper inspection.

For example, in one task:

  • Classification: cow; however, there were also trees in the image
  • Caption: “A picturesque scene of cows grazing in a lush green meadow, with majestic mountains in the background under a clear blue sky.”

Pro Tip! If you have ground truth annotations in your Project, you can run inference against ground truth for a side-by-side comparison and metrics like accuracy, precision, recall, and F1 Score.
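
To make those metrics concrete, here is a minimal sketch of how classification predictions can be scored against ground truth labels, mirroring the accuracy, precision, recall, and F1 that Label Studio reports. The label names and sample data are purely illustrative, and real multi-label scoring has more edge cases than this single-class example.

```python
def score_label(gold, pred, label):
    """Precision, recall, and F1 for one class label, computed from
    true positives (tp), false positives (fp), and false negatives (fn)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Illustrative ground truth vs. LLM predictions for five tasks
gold = ["cow", "cat", "cow", "tree", "cow"]
pred = ["cow", "cow", "cow", "tree", "horse"]

accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
p, r, f1 = score_label(gold, pred, "cow")
```

Running a prompt against ground truth gives you exactly these kinds of numbers per label, so you can see which classes the LLM confuses before scaling up.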

Results from a Prompt run on a Sample of Project Tasks

Quickview after clicking into a task row

Results from a Prompt run on Ground Truth Tasks

Step 4: Iterate and (Auto) Enhance Your Prompt

You can create new Prompt versions to compare predictions across different prompts and LLMs, or use the Enhance Prompt option for automated improvements! Click on Enhance Prompt, select your LLM (e.g., GPT-4o) and apply enhancements on your project tasks.

Label Studio will:

  1. Suggest a prompt enhancement, explaining the changes
  2. Provide a new Prompt based on its suggestions

For example, your enhanced prompt might now include:

  • Input descriptions and few-shot examples
  • Guidelines for caption detail and structure

Setting up Prompt Enhancement

Results from Prompt Enhancement

Enhanced Prompt saved as new Prompt Version

Step 5: Comparing and Iterating

Run the enhanced prompt and compare its outputs against initial results. With ground truth data, Label Studio provides side-by-side comparisons for quantitative improvement measurement.

With each iteration, you’ll see how the prompt enhancement changes the predictions and alignment with ground truth.

Step 6: Human-in-the-Loop

Once satisfied with your Prompt's performance, scale up to generate pre-annotations for all tasks in your Project. These pre-filled fields give annotators a head start, allowing them to quickly review and accept accurate predictions or make corrections as needed. This shifts their role from manual labeling to focused reviewing, freeing up more time for challenging or ambiguous images.
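
If you prefer to work programmatically, the sketch below shows the general shape of a Label Studio task carrying a pre-annotation (a "prediction"): the `from_name`/`to_name` values must match the control tag names in your labeling configuration, and the ones used here (`objects`, `caption`, `gpt-4o-v1`) are illustrative assumptions, not fixed names.

```python
def make_preannotated_task(image_url, labels, caption, model_version="gpt-4o-v1"):
    """Build a task dict with an attached prediction, following the
    general Label Studio pre-annotation import shape."""
    return {
        "data": {"image": image_url},
        "predictions": [
            {
                "model_version": model_version,
                "result": [
                    {   # classification result for the Choices tag
                        "from_name": "objects",
                        "to_name": "image",
                        "type": "choices",
                        "value": {"choices": labels},
                    },
                    {   # caption result for the TextArea tag
                        "from_name": "caption",
                        "to_name": "image",
                        "type": "textarea",
                        "value": {"text": [caption]},
                    },
                ],
            }
        ],
    }

task = make_preannotated_task(
    "https://example.com/cows.jpg",
    ["Cow", "Tree"],
    "Cows grazing in a meadow with trees and mountains behind.",
)
```

Importing tasks in this shape gives annotators the same pre-filled fields the Prompts UI produces, ready to accept or correct.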

Unlock Image Annotation at Scale

Label Studio's Prompts enables you to consolidate complex image tasks into a single workflow powered by LLMs. Whether it’s object classification, captioning, or more, you can now:

  • Auto-label complex annotations seamlessly
  • Enhance and track LLM prompts iteratively for better performance
  • Maintain efficient, high-quality annotations with human-in-the-loop review

With Prompts, you’ll accelerate time to model development while maintaining high-quality data, saving resources and unlocking business value sooner.

Happy labeling!
