Label Studio Enterprise

Prompts

Scale AI evaluation and annotation

Accelerate labeling and evaluation workflows with prompts and LLMs built into Label Studio Enterprise.


Image Evals / AI Image Rubric Evaluation

v3 · Current gpt-5

System

You are a visual design expert. Your task is to score a generated image based on a strict rubric.

You will be given an image. The prompt is "portrait of airplane"

1. Prompt Alignment (1–5) Does the subject clearly match the prompt?

2. Visual Quality (1–5) Is the image sharp, well-lit, and artifact-free?

3. Composition (1–5) Is the framing, balance, and crop strong?

User

{{image}} variable

Overall Evaluation Results: 20 tasks, $0.025 avg

| Overall | Alignment | Quality | Composition | Comments | Flag | Cost |
|---|---|---|---|---|---|---|
| 5 | 5 | 5 | 4 | Clear, sharp photo of airplane at airport; matches prompt, strong composition, no issues. | | $0.021 |
| 4 | 5 | 4 | 4 | Alignment excellent: clearly matches "portrait of airplane". Image slightly soft but fully… | | $0.023 |
| 2 | 2 | 3 | 2 | Subject is a person holding a model plane, not an airplane portrait. Prompt fundamentally… | ⚠ Needs Review | $0.019 |
| 5 | 5 | 5 | 5 | Commercial airplane on runway: subject perfectly framed, excellent light, prompt… | | $0.020 |
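As a minimal sketch of how rubric results like these could be handled downstream, the snippet below models one scored image and flags low scores for human review. The `RubricScore` class and the "any score at or below 2" rule are assumptions made for illustration; the page does not say how Label Studio decides which results need review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricScore:
    """One judged image: overall score plus the three rubric dimensions (1 to 5)."""
    overall: int
    alignment: int
    quality: int
    composition: int

    def __post_init__(self):
        # Reject values outside the 1 to 5 range used by the rubric.
        for name in ("overall", "alignment", "quality", "composition"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    def needs_review(self, threshold: int = 2) -> bool:
        # Assumed rule: flag the result when any score is at or below the threshold.
        scores = (self.overall, self.alignment, self.quality, self.composition)
        return min(scores) <= threshold


# The four rows shown in the results above.
rows = [
    RubricScore(5, 5, 5, 4),
    RubricScore(4, 5, 4, 4),
    RubricScore(2, 2, 3, 2),
    RubricScore(5, 5, 5, 5),
]

flagged = [index for index, row in enumerate(rows) if row.needs_review()]
print(flagged)  # [2]
```

With this assumed rule, only the third row (the person holding a model plane) is flagged, which matches the row marked for review in the sample results.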

Active Learning

An automated active-learning loop

Every annotation closes the loop. The model retrains on each newly labeled task. Predictions update automatically, and the least-confident tasks are routed to your team. Available exclusively in Label Studio Enterprise.

Description

A man with a beard sits at a desk working on a laptop. He looks puzzled by…

Predictions

A man with a beard sits at a desk…

Woman in a cream linen top, soft…

Professional with glasses, open-plan…

Two colleagues review a screen together…

Tasks

Woman in a cream linen top, soft…

Professional with glasses, open-plan…

A man with a beard sits at a desk…

Annotate

An annotator labels a task. A webhook instantly notifies the connected model backend.

Retrain

The model retrains on the new labeled data. No manual batching or scheduling.

Re-predict

Fresh predictions from the updated model flow back into Label Studio in real time.

Re-prioritize

Tasks are reordered so annotators always see the most uncertain, lowest-confidence cases next.
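The re-prioritize step can be pictured as a sort by prediction confidence. This is a sketch only: the `confidence` field and the choice to put unpredicted tasks first are assumptions, not a description of how Label Studio orders its queue internally.

```python
def prioritize(tasks):
    """Return tasks ordered from least to most confident prediction."""
    # Tasks without a prediction yet get confidence 0.0 so they surface first (assumption).
    return sorted(tasks, key=lambda task: task.get("confidence", 0.0))


# Hypothetical tasks with model confidence scores between 0 and 1.
tasks = [
    {"id": 1, "confidence": 0.91},
    {"id": 2, "confidence": 0.42},
    {"id": 3, "confidence": 0.67},
    {"id": 4},  # no prediction yet
]

queue = [task["id"] for task in prioritize(tasks)]
print(queue)  # [4, 2, 3, 1]
```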

Use LLM-as-a-judge or jury

Deploy models to act like a judge or a jury, scoring responses against a rubric, rating scale or pass/fail threshold. Measure consensus or identify uncertain cases to route to subject matter experts.

Judge model Overall Dims Verdict

Claude Sonnet 4 Anthropic

A · H · C

Accurate, well-reasoned, and grounded in the task context. The 60% efficiency claim is appropriately hedged and the core mechanism is explained clearly.

Accurate · Concise · On-task

GPT-5 OpenAI

A · H · C

Good explanation of the active learning loop. Could benefit from a concrete example or a mention of annotation tool integration to ground the efficiency claim.

Accurate · Needs example

Gemini 1.5 Pro Google DeepMind

A · H · C

Satisfactory summary. The response is clear and mostly correct. The 60% figure is presented without a citation — recommend flagging for human review before publication.

Clear · Unverified stat
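One way to measure consensus across a jury is a simple majority vote with an agreement ratio. The sketch below assumes each judge returns a single verdict label; the judge names, the labels, and the 0.75 agreement threshold are hypothetical.

```python
from collections import Counter


def jury_consensus(verdicts, min_agreement=0.75):
    """Return (majority_label, agreement_ratio, needs_expert) for a set of judge verdicts."""
    if not verdicts:
        raise ValueError("at least one verdict is required")
    counts = Counter(verdicts.values())
    # most_common(1) returns the label with the highest vote count.
    label, votes = counts.most_common(1)[0]
    agreement = votes / len(verdicts)
    # Route to a subject matter expert when the jury does not agree strongly enough.
    return label, agreement, agreement < min_agreement


# Hypothetical verdicts from three judge models.
verdicts = {"judge_a": "pass", "judge_b": "pass", "judge_c": "fail"}
label, agreement, needs_expert = jury_consensus(verdicts)
print(label, round(agreement, 2), needs_expert)  # pass 0.67 True
```

A two-to-one split falls below the assumed threshold, so the case would be routed to an expert; a unanimous jury would not.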

Pre-label with AI so humans can focus on what matters

Set up a prompt and generate predictions for an entire dataset. Your annotation team can switch to reviewing and validating instead of manually writing labels from zero.

Datasets / Portrait Dataset

| Classification | Caption |
|---|---|
| indoor · candid | Bearded man in a denim jacket smiling warmly in a café setting, with soft bokeh light in the background. |
| natural · portrait | Woman in a cream linen top photographed in soft natural light next to large tropical foliage, looking off-camera. |
| workplace · portrait | Professional with glasses and highlighted hair standing confidently in an open-plan office with a monitor visible behind her. |
| formal · indoor | Professional woman in a navy blazer smiling in a formal office setting, with framed credentials visible in the background. |
| indoor · portrait | Brown-haired woman in a dark formal shirt photographed with a warm smile, lit by diffused natural window light indoors. |
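For teams that import pre-labels programmatically, a task with an attached prediction might be assembled as in the sketch below. It follows the general shape of Label Studio's task JSON (`data` plus a `predictions` list of `result` items), but the control names `classification` and `caption`, the object name `image`, the image URL, the score, and the model version are all assumptions that would need to match your own labeling configuration.

```python
import json


def build_prelabeled_task(image_url, labels, caption, score, model_version):
    """Build one task dict carrying a classification and a caption as a prediction."""
    return {
        "data": {"image": image_url},
        "predictions": [
            {
                "model_version": model_version,
                "score": score,
                "result": [
                    {
                        # "classification" and "image" are assumed names from the labeling config.
                        "from_name": "classification",
                        "to_name": "image",
                        "type": "choices",
                        "value": {"choices": labels},
                    },
                    {
                        # "caption" is an assumed text-area control name.
                        "from_name": "caption",
                        "to_name": "image",
                        "type": "textarea",
                        "value": {"text": [caption]},
                    },
                ],
            }
        ],
    }


task = build_prelabeled_task(
    image_url="https://example.com/portrait-001.jpg",  # placeholder URL
    labels=["indoor", "candid"],
    caption="Bearded man in a denim jacket smiling warmly in a café setting.",
    score=0.9,  # hypothetical confidence
    model_version="example-model-v1",  # hypothetical version tag
)
print(json.dumps(task, indent=2, ensure_ascii=False))
```

Reviewers then see the predicted labels and caption already filled in and only correct what is wrong.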

Bring your own model

Connect any model: your own model, a model from any commercial provider, or an AI gateway like OpenRouter or Hugging Face.

Empower SMEs

Give subject-matter experts the power to refine prompts

You don't need to be a masterful prompt engineer to improve a model's performance. Click to enhance prompts automatically, improving definitions, decision logic and examples.

Prompts / Product Review NLP

Base model

gpt-5-mini OpenAI

Prompt

Text Title

Given the product review and title, perform the following:

- sentiment: classify the review as positive, neutral, or negative

- entities: identify key product features and brands mentioned

Enhancement Complete Applied

Changes made:

The prompt lacked specificity around output structure and task scope. The following improvements were applied:

1. Added output format: a specified JSON structure ensures consistent, parseable responses across all model providers.
2. Clarified entity scope: listed product features, brand names, and key adjectives as explicit target entity types.
3. Handled ambiguous cases: added instructions for mixed-sentiment reviews and confidence scoring.

Verify and align to ground truth. Then automate and scale.

How do you know your AI judges are actually right? Bad pre-labels bias your annotators. Unreliable judges give you false confidence in your models.

Score against ground truth first. Set a confidence threshold, verify accuracy on a sample set, and automate with confidence.

Prompts / MT Evaluation

Overall Evaluation Results: 73.15% accuracy, 20 tasks, $0.055005 avg

| # | Original text | Ground truth | LLM output | Score | Accuracy | Cost |
|---|---|---|---|---|---|---|
| 24701 | That is why Mr Searle's report deserves our full support and congratulations. | C'est pourquoi le rapport de M. Searle mérite notre plein soutien et nos félicitations. | C'est pourquoi le rapport de M. Searle mérite notre plein soutien et nos félicitations. | 5 | 100% | $0.0027 |
| 24702 | I believe the Commission has shown great clarity on this very important matter. | Je crois que la Commission a fait preuve d'une grande clarté sur cette question importante. | Je pense que la Commission a été très claire dans son approche de cette importante question. | 4 | 84% | $0.0031 |
| 24703 | Liberalization and trade must go hand in hand with strong social protection measures. | La libéralisation et le commerce doivent aller de pair avec la protection sociale. | Les échanges commerciaux et la libéralisation sont importants pour la croissance économique. | 2 | 42% | $0.0029 |
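The "verify on a sample, then automate" step can be sketched as a simple gate on mean accuracy. The per-task values below are the three accuracy figures visible in the results above; the 0.9 threshold and the function names are assumptions, and the page does not state how per-task accuracy itself is computed.

```python
def mean_accuracy(per_task_accuracy):
    """Average per-task accuracy values, each expressed as a fraction between 0 and 1."""
    if not per_task_accuracy:
        raise ValueError("need at least one scored task")
    return sum(per_task_accuracy) / len(per_task_accuracy)


def ready_to_automate(per_task_accuracy, threshold):
    # Automate only when the verified sample meets the chosen accuracy threshold.
    return mean_accuracy(per_task_accuracy) >= threshold


# Accuracy of the three visible rows (100%, 84%, 42%) as fractions.
sample = [1.00, 0.84, 0.42]

print(round(mean_accuracy(sample), 4))           # 0.7533
print(ready_to_automate(sample, threshold=0.9))  # False
```

On this small sample the gate stays closed, so the prompt would be refined and re-scored before any labels are generated at scale.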

Read how a leading enterprise team automated benchmark evaluation →

Pipelines & API

Automate the entire pipeline

Wire Label Studio into your stack with webhooks, a full REST API, and a Python SDK. Every task, annotation, and project change can trigger the next step so labeling, training, and evaluation run without manual hand-offs.

Trigger model training

Kick off training automatically once a project hits a threshold of new annotations.

Drive active learning

Fire the active-learning loop on every annotation event — no scheduler required.

Version your data

Push each new batch of labeled data into your dataset-versioning repository automatically.

Integrate external pipelines

Connect Label Studio to your stack with the REST API and Python SDK for fully programmatic workflows.

Webhook events

webhook_handler.py

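from flask import Flask, request

app = Flask(__name__)


def retrain_model(project_id, annotation):
    """Placeholder (hypothetical): replace with a call into your own training pipeline."""
    print(f"Retraining for project {project_id} on annotation {annotation.get('id')}")


@app.route("/ls-webhook", methods=["POST"])
def label_studio_webhook():
    # The event arrives as a JSON body; tolerate an empty or invalid payload.
    event = request.get_json(silent=True) or {}
    action = event.get("action")

    if action == "ANNOTATION_CREATED":
        task = event["task"]
        annotation = event["annotation"]
        # New label landed: kick off the active-learning loop
        retrain_model(project_id=task["project"], annotation=annotation)

    # Acknowledge every delivery, including events this handler ignores
    return {"status": "ok"}, 200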
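The "Trigger model training" idea, starting a run only after a project accumulates enough new annotations, could be layered on top of a webhook handler with a small counter. This is a sketch under stated assumptions: `TrainingTrigger` is a hypothetical helper, the threshold of 3 is arbitrary, and counts are kept in memory, so a real deployment would need persistent storage.

```python
from collections import defaultdict


class TrainingTrigger:
    """Count new annotations per project and signal when a threshold is reached."""

    def __init__(self, threshold):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.pending = defaultdict(int)  # project_id -> annotations since last training run

    def record(self, project_id):
        """Register one new annotation; return True when training should start."""
        self.pending[project_id] += 1
        if self.pending[project_id] >= self.threshold:
            # Reset the counter so the next batch starts from zero.
            self.pending[project_id] = 0
            return True
        return False


trigger = TrainingTrigger(threshold=3)
decisions = [trigger.record(project_id=1) for _ in range(4)]
print(decisions)  # [False, False, True, False]
```

Inside an annotation-created handler, a call such as `trigger.record(project_id)` would decide whether to start training on that event or wait for more labels.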


COMPREHENSIVE INFRASTRUCTURE

Programmable Interfaces

FULLY PROGRAMMABLE

MULTIMODAL DATA

EMBEDDABLE

AI Automation

LLM AS JUDGE

AUTOMATED PRELABELING

PLUGINS

Quality Assurance

AGREEMENT WORKFLOWS

WORKFORCE MANAGEMENT

ROLES & PERMISSIONS

Data Security & Compliance
