High-quality image datasets are at the heart of modern computer vision applications. Whether you're working on autonomous vehicles identifying road objects or medical imaging systems detecting anomalies, the success of your AI project depends on your training data.
The challenge lies not just in collecting these images, but in describing them accurately enough to capture their full context. Image captioning has emerged as a core component in bridging computer vision and natural language understanding. Consider these practical image captioning use cases:
These applications require not just object recognition ("there is a cat") but also contextual understanding ("a ginger cat sleeping on a windowsill at sunset"). However, manual annotation workflows face several challenges when captioning at scale, including consistency between annotators, tradeoffs between speed and accuracy, and task complexity increasing annotation time significantly.
Large Language Models (LLMs) are revolutionizing how we tackle these challenges, unlocking new capabilities in understanding and describing visual content.
With Label Studio's Prompts, you can now handle multiple image annotation types - from basic object classification to nuanced caption generation - in a unified workflow that maintains both efficiency and quality.
This workflow helps you:
Let’s dive in.
Now, imagine you're working on a project involving image captioning. For each image, you want to classify objects (cat, dog, cow, horse, tree, house) and generate descriptive captions. Traditional workflows might require separate annotation steps, but with Prompts in Label Studio, you can consolidate all these tasks into a single Prompt.
This streamlined approach empowers annotators to focus on quality control rather than manual labeling of every image.
If you'd rather watch the video of the walkthrough, check it out below!
To begin, navigate to the Prompts page via the hamburger menu. Click Create Prompt and name your new prompt (e.g., Image Captioning Prompt). Next, link it to your target project.
For this tutorial, we’ll use a project called Demo Project: Animal Picture Captions. Once selected, Label Studio automatically populates the expected outputs for each control tag based on your project’s labeling configuration. In our case, this includes:
Hit Create to finalize your Prompt setup.
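To make "expected outputs for each control tag" more concrete, here is a minimal sketch of how outputs could be derived from a labeling configuration. The config below is an assumption written for illustration: the tutorial does not show the demo project's actual config, and the tag names objects and caption are hypothetical. The parsing is plain Python, not Label Studio's own implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical labeling config for the demo project. The control tag names
# ("objects", "caption") are assumptions, not taken from the tutorial.
LABEL_CONFIG = """
<View>
  <Image name="image" value="$image"/>
  <Choices name="objects" toName="image" choice="multiple">
    <Choice value="cat"/>
    <Choice value="dog"/>
    <Choice value="cow"/>
    <Choice value="horse"/>
    <Choice value="tree"/>
    <Choice value="house"/>
  </Choices>
  <TextArea name="caption" toName="image"/>
</View>
"""


def expected_outputs(config_xml):
    """Map each control tag name to the kind of output it expects."""
    root = ET.fromstring(config_xml.strip())
    outputs = {}
    # Choices tags expect one or more of their listed Choice values.
    for choices in root.iter("Choices"):
        outputs[choices.get("name")] = [c.get("value") for c in choices.iter("Choice")]
    # TextArea tags expect free-form text (here, the caption).
    for textarea in root.iter("TextArea"):
        outputs[textarea.get("name")] = "free text"
    return outputs


outputs = expected_outputs(LABEL_CONFIG)
print(outputs)
```

With a config like this one, a single Prompt has two outputs to fill per image: a list of object classes and a caption.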
Select the LLM you’d like to use. In our demo, we’ll opt for OpenAI’s GPT-4o. Since Prompts is designed for iterative improvements, start simple. For instance:
Given the image, identify any objects from the provided categories and generate a descriptive caption.
{image}
Make sure the prompt includes any data variables (e.g., “{image}”) that you want the LLM to use as context for its prediction. Save the Prompt and run it on your sample of tasks.
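If you draft prompts outside the UI, a quick check like the one below can catch a variable that doesn't match your task data. This is only a sketch: it uses Python's standard string formatting rules to find curly-brace placeholders, which is an assumption about the syntax and not a description of how Label Studio substitutes variables internally. The task dictionary and URL are hypothetical.

```python
from string import Formatter

PROMPT = (
    "Given the image, identify any objects from the provided categories "
    "and generate a descriptive caption.\n{image}"
)


def prompt_variables(template):
    """Return the placeholder names (e.g. 'image') used in a template."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def missing_variables(template, task_data):
    """List placeholders that have no matching key in the task data."""
    return [v for v in prompt_variables(template) if v not in task_data]


task = {"image": "https://example.com/cat.jpg"}  # hypothetical task data
print(prompt_variables(PROMPT))
print(missing_variables(PROMPT, task))
```

An empty list from missing_variables means every placeholder in the prompt has a matching field in the task.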
Once the Prompt has executed, you’ll see a view of the LLM’s predictions. For any incomplete or inaccurate annotations, click into the task row for a deeper inspection.
For example, in one task:
Pro Tip! If you have ground truth annotations in your Project, you can run inference against ground truth for a side-by-side comparison and metrics like accuracy, precision, recall, and F1 Score.
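For a sense of what those metrics mean for the object classification output, here is a minimal sketch that micro-averages over every (task, label) pair. The sample ground truth and predictions are made up for illustration, and Label Studio's own metric definitions may differ from this simplified version.

```python
LABELS = ["cat", "dog", "cow", "horse", "tree", "house"]


def multilabel_metrics(ground_truth, predictions, labels):
    """Micro-averaged accuracy, precision, recall, and F1 over (task, label) pairs."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(ground_truth, predictions):
        for label in labels:
            in_truth, in_pred = label in truth, label in pred
            if in_truth and in_pred:
                tp += 1  # label correctly predicted
            elif in_pred:
                fp += 1  # label predicted but not in ground truth
            elif in_truth:
                fn += 1  # label in ground truth but missed
            else:
                tn += 1  # label correctly left out
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up example: three tasks, each with a set of object labels.
ground_truth = [{"cat"}, {"dog", "tree"}, {"horse"}]
predictions = [{"cat"}, {"dog"}, {"horse", "house"}]

metrics = multilabel_metrics(ground_truth, predictions, LABELS)
print(metrics)
```

In this example the model misses one label (tree) and adds one that isn't there (house), so precision and recall both come out at 0.75.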
You can create new Prompt versions to compare predictions across different prompts and LLMs, or use the Enhance Prompt option for automated improvements! Click on Enhance Prompt, select your LLM (e.g., GPT-4o) and apply enhancements on your project tasks.
Label Studio will:
For example, your enhanced prompt might now include:
Run the enhanced prompt and compare its outputs against initial results. With ground truth data, Label Studio provides side-by-side comparisons for quantitative improvement measurement.
With each iteration, you’ll see how the prompt enhancement changes the predictions and alignment with ground truth.
Once you're satisfied with your Prompt's performance, scale up to generate pre-annotations for all tasks in your Project. These pre-filled fields give annotators a head start, allowing them to quickly review and accept accurate predictions or make corrections as needed. This shifts their role from manual labeling to focused reviewing, leaving more time for challenging or ambiguous images.
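Prompts generates these pre-annotations for you, so no code is needed. Still, it can help to see roughly what a pre-annotated task looks like in Label Studio's JSON format. The sketch below builds one by hand. The control tag names (objects, caption), the model version string, and the image URL are hypothetical and would need to match your own labeling configuration.

```python
import json


def build_preannotation(image_url, objects, caption, model_version="prompt-v1"):
    """Build one task with a prediction covering both control tags.

    The from_name values ("objects", "caption") are assumed tag names and
    must match the labeling configuration of the target project.
    """
    return {
        "data": {"image": image_url},
        "predictions": [
            {
                "model_version": model_version,
                "result": [
                    {
                        "from_name": "objects",
                        "to_name": "image",
                        "type": "choices",
                        "value": {"choices": objects},
                    },
                    {
                        "from_name": "caption",
                        "to_name": "image",
                        "type": "textarea",
                        "value": {"text": [caption]},
                    },
                ],
            }
        ],
    }


task = build_preannotation(
    "https://example.com/cat.jpg",  # hypothetical image URL
    ["cat"],
    "A ginger cat sleeping on a windowsill at sunset.",
)
print(json.dumps(task, indent=2))
```

Each entry in result fills one field in the labeling interface, which is what lets an annotator accept or correct the classification and the caption in the same review pass.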
Label Studio's Prompts enables you to consolidate complex image tasks into a single workflow powered by LLMs. Whether it’s object classification, captioning, or more, you can now:
With Prompts, you’ll shorten the time to model development while maintaining high-quality data, saving resources and unlocking business value sooner.
Happy labeling!