SmolDocling isn’t your typical OCR tool. Where most optical character recognition (OCR) models struggle with tables, charts, and structured formatting, SmolDocling offers a lightweight, all-in-one solution for full-document conversion.
In this post, we walk through how to test SmolDocling’s OCR capabilities using Label Studio, so you can evaluate how well it extracts text, layout, and structure from complex documents.
As introduced in the paper “SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion,” SmolDocling is designed to process entire pages while retaining structure, spatial location, and formatting. Unlike traditional OCR pipelines that require multiple specialized components, SmolDocling generates DocTags, a universal markup format that captures all document elements in full context. This makes it more efficient and scalable across a wide range of document types, including business reports, academic papers, patents, and technical documents.
But how well does it perform on real-world data? To help answer that, we have created a Jupyter Notebook that walks you through testing SmolDocling’s OCR capabilities using Label Studio.
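One way to bring model output into Label Studio for review is to import it as a pre-annotation attached to each page image. The sketch below is an illustration rather than the notebook's actual code: it wraps a page image URL and a block of predicted text in Label Studio's task JSON format. The helper `build_task`, the control names `transcription` and `image`, the model version string, and the example URL are all assumptions; the names must match whatever your own labeling configuration defines.

```python
import json


def build_task(image_url, predicted_text,
               model_version="smoldocling-sketch",  # hypothetical version label
               from_name="transcription",           # assumed TextArea name in the labeling config
               to_name="image"):                    # assumed Image name in the labeling config
    """Wrap one page image and its predicted text as a Label Studio task dict."""
    return {
        # The "image" key must match the variable used in the labeling config (e.g. $image).
        "data": {"image": image_url},
        "predictions": [{
            "model_version": model_version,
            "result": [{
                "from_name": from_name,
                "to_name": to_name,
                "type": "textarea",
                "value": {"text": [predicted_text]},
            }],
        }],
    }


# Placeholder page and text, used only to show the resulting structure.
tasks = [build_task("https://example.com/page-1.png",
                    "Quarterly report\nRevenue rose 4%.")]
print(json.dumps(tasks, indent=2))
```

The resulting JSON can be imported into a project whose labeling configuration uses the same control names, so annotators start from the model's output and correct it instead of transcribing from scratch.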
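OCR models have improved significantly, but they still face major challenges: many struggle with tables, charts, and structured formatting, and traditional pipelines depend on multiple specialized components.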
SmolDocling aims to solve these issues with a compact vision-language model that processes full pages and produces structured output. Even so, evaluating it on your own documents is essential for measuring accuracy and refining results for real-world use.
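To make "measuring accuracy" concrete, one common metric for OCR text is the character error rate (CER): the edit distance between the model output and a human-corrected reference, divided by the reference length. The following is a minimal sketch of that metric in plain Python. It is not necessarily the metric used in the notebook, the function names are illustrative, and the sample strings are made up.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the reference length."""
    if not reference:
        # Convention chosen for this sketch: an empty reference scores 0.0
        # only when the hypothesis is empty too.
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up example: the corrected annotation versus the raw model output.
reference = "Total revenue: 1,250"
hypothesis = "Total revenue: 1.250"
cer = character_error_rate(reference, hypothesis)
print(f"CER: {cer:.3f}")
```

A lower CER means the model output needed fewer corrections. Because CER looks only at characters, it says nothing about layout or table structure, so it complements rather than replaces a visual review in Label Studio.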
To get started, check out the step-by-step notebook.
By integrating SmolDocling with Label Studio, you can see how well the model performs on your own documents and refine its results to improve document understanding.