DotsOCR

Layout-aware text extraction with bounding boxes.

Overview


| Property | Value |
|---|---|
| Task | Text extraction |
| Backends | PyTorch, vLLM |
| Speed | 3-5 s/page |
| Quality | Very good |
| VRAM | 8-12 GB |

What Makes It Different

DotsOCR extracts text together with layout information. Each text block includes:

- Text content
- Bounding box coordinates
- Element category (`title`, `text`, `table`, etc.)

Best for technical documents where structure matters.

Basic Usage

```python
from omnidocs.tasks.text_extraction import DotsOCRTextExtractor
from omnidocs.tasks.text_extraction.dotsocr import DotsOCRPyTorchConfig
from PIL import Image

image = Image.open("document.png")

# Load DotsOCR with the PyTorch backend on the GPU
extractor = DotsOCRTextExtractor(
    backend=DotsOCRPyTorchConfig(device="cuda")
)

# Extract the page as Markdown
result = extractor.extract(image, output_format="markdown")
print(result.content)
```

With Layout Information

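```python
# Reuses `image` and `extractor` from the basic example
result = extractor.extract(image, include_layout=True)

# Each layout element has a category, a bounding box and its text
for element in result.layout:
    print(f"[{element.category}] {element.bbox}: {element.text[:50]}...")
```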

Example output:

```text
[title] [50, 20, 500, 60]: Introduction to Machine Learning
[text] [50, 80, 900, 200]: Machine learning is a subset of...
[table] [50, 220, 900, 450]: | Model | Accuracy | Speed |...
[figure] [50, 470, 400, 700]: [Figure caption text]
```
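Because each element carries a bounding box, you can post-process `result.layout` yourself, for example to put blocks in a rough reading order or to pull out only the tables. The sketch below is self-contained and rests on two assumptions not stated on this page: `LayoutElement` is a hypothetical stand-in for the real element type (only the `category`, `bbox` and `text` attributes used above are modeled), and `bbox` is read as `[x1, y1, x2, y2]` in page pixels, which matches the example output but is not confirmed here.

```python
from dataclasses import dataclass


# Hypothetical stand-in for entries of result.layout; not an omnidocs class.
@dataclass
class LayoutElement:
    category: str
    bbox: list[int]  # assumed [x1, y1, x2, y2] in page pixels
    text: str


def reading_order(elements):
    """Sort elements top-to-bottom, then left-to-right, by their top-left corner."""
    return sorted(elements, key=lambda e: (e.bbox[1], e.bbox[0]))


def by_category(elements, category):
    """Keep only elements of one category, e.g. "table"."""
    return [e for e in elements if e.category == category]


def bbox_size(bbox):
    """Return (width, height) of an assumed [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = bbox
    return x2 - x1, y2 - y1


# Values taken from the example output above, deliberately out of order.
elements = [
    LayoutElement("table", [50, 220, 900, 450], "| Model | Accuracy | Speed |"),
    LayoutElement("title", [50, 20, 500, 60], "Introduction to Machine Learning"),
    LayoutElement("text", [50, 80, 900, 200], "Machine learning is a subset of..."),
]

for e in reading_order(elements):
    print(e.category, bbox_size(e.bbox))
```

Sorting by the top edge is only a rough reading order: on multi-column pages it interleaves the columns, so you would need to group elements by column (for example by `x1`) before sorting.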

Backend Configs

PyTorch

```python
from omnidocs.tasks.text_extraction.dotsocr import DotsOCRPyTorchConfig

config = DotsOCRPyTorchConfig(
    device="cuda",
    max_new_tokens=8192,  # Increase for long documents
)
```

vLLM

```python
from omnidocs.tasks.text_extraction.dotsocr import DotsOCRVLLMConfig

config = DotsOCRVLLMConfig(
    tensor_parallel_size=1,  # Number of GPUs to shard the model across
    gpu_memory_utilization=0.9,  # Fraction of GPU memory vLLM may reserve
)
```

Layout Categories

DotsOCR detects 11 element types:

| Category | Description |
|---|---|
| `title` | Document/section headings |
| `text` | Body paragraphs |
| `list` | Bullet/numbered lists |
| `table` | Data tables |
| `figure` | Images, diagrams |
| `caption` | Figure/table captions |
| `formula` | Math equations |
| `footnote` | Footnotes |
| `header` | Page headers |
| `footer` | Page footers |
| `abstract` | Abstract sections |
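The categories make it straightforward to drop page furniture and re-render the rest. The sketch below turns a list of layout elements into Markdown, skipping page headers and footers. `LayoutElement`, `SKIP` and `to_markdown` are hypothetical names, not omnidocs APIs; the per-category formatting (titles as `##` headings, formulas wrapped in `$$`, captions in italics) is an editorial choice, and it assumes formula text already arrives as LaTeX, which this page does not state.

```python
from dataclasses import dataclass


# Hypothetical element type mirroring the category/text attributes
# used in the layout example; not a documented omnidocs class.
@dataclass
class LayoutElement:
    category: str
    text: str


# Page furniture that typically repeats on every page (an editorial choice).
SKIP = {"header", "footer"}


def to_markdown(elements):
    """Render layout elements as Markdown, skipping page headers and footers."""
    parts = []
    for e in elements:
        if e.category in SKIP:
            continue
        if e.category == "title":
            parts.append(f"## {e.text}")
        elif e.category == "formula":
            # Assumes the formula text is LaTeX; wrap it as a display block
            parts.append(f"$$\n{e.text}\n$$")
        elif e.category == "caption":
            parts.append(f"*{e.text}*")
        else:
            # text, list, table, figure, footnote, abstract: pass through as-is
            parts.append(e.text)
    return "\n\n".join(parts)


page = [
    LayoutElement("header", "Journal of Examples"),
    LayoutElement("title", "Introduction"),
    LayoutElement("formula", "E = mc^2"),
    LayoutElement("footer", "Page 3"),
]
print(to_markdown(page))
```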

When to Use DotsOCR vs Qwen

| Use case | Recommended model |
|---|---|
| General text extraction | Qwen |
| Need bounding boxes | DotsOCR |
| Technical documents | DotsOCR |
| Tables with coordinates | DotsOCR |
| Fastest extraction | Qwen |
| MLX / API support | Qwen |

Troubleshooting

Truncated output

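```python
# Raise the token limit so long pages are not cut off, then rebuild the extractor
config = DotsOCRPyTorchConfig(device="cuda", max_new_tokens=16384)
extractor = DotsOCRTextExtractor(backend=config)
```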
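If you do not know in advance how long a page's output will be, one option is to start with a smaller budget (such as the 8192 used in the PyTorch config above) and retry with a larger one only when the result looks cut off. This is a sketch under stated assumptions: `extract_with_retry` is a hypothetical helper, not part of omnidocs; `run` is any callable you supply that builds an extractor with the given `max_new_tokens` and returns its text; and `looks_truncated` is a heuristic you choose yourself (for example, an unclosed table or a missing final sentence).

```python
from typing import Callable


def extract_with_retry(
    run: Callable[[int], str],
    looks_truncated: Callable[[str], bool],
    budgets: tuple[int, ...] = (8192, 16384),
) -> str:
    """Try each max_new_tokens budget in turn until the output stops looking truncated.

    Returns the text from the first budget that passes the check, or the
    last attempt if every budget still looks truncated.
    """
    text = ""
    for budget in budgets:
        text = run(budget)
        if not looks_truncated(text):
            break
    return text
```

Wired to omnidocs, `run` could be `lambda n: DotsOCRTextExtractor(backend=DotsOCRPyTorchConfig(device="cuda", max_new_tokens=n)).extract(image, output_format="markdown").content`. Building a new extractor on each attempt may reload the model weights, so for documents you already know are long it is likely cheaper to start with the larger limit.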

Missing layout elements

```python
# Ensure include_layout=True
result = extractor.extract(image, include_layout=True)
```
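If elements still seem to be missing, tallying the detected categories against the ones you expect on the page shows which kinds were dropped. A minimal sketch, assuming only that each entry in `result.layout` has a `.category` attribute as in the layout loop earlier; `category_counts` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real layout elements.

```python
from collections import Counter
from types import SimpleNamespace


def category_counts(elements):
    """Count detected elements per category (any objects with a .category attribute)."""
    return Counter(e.category for e in elements)


# Stand-ins for result.layout entries; replace with the real list.
layout = [
    SimpleNamespace(category="title"),
    SimpleNamespace(category="text"),
    SimpleNamespace(category="text"),
]

counts = category_counts(layout)
expected = {"title", "text", "table"}  # what you expect on this page
print(dict(counts))
print("missing:", sorted(expected - set(counts)))
```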