OCR

Extract text with precise bounding boxes.

Input / Output

Input: Document image

Output: Text blocks with coordinates and confidence scores

# `ocr` and `image` are created as shown in Quick Start
result = ocr.extract(image)
for block in result.text_blocks:
    print(f"'{block.text}' @ {block.bbox} ({block.confidence:.2f})")

'Invoice' @ BoundingBox(x1=100, y1=50, x2=200, y2=80) (0.98)
'Date: 2024-01-15' @ BoundingBox(x1=100, y1=100, x2=280, y2=125) (0.96)
'Total: $1,234.56' @ BoundingBox(x1=100, y1=400, x2=300, y2=430) (0.97)
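
Downstream code often drops low-confidence blocks and puts the rest in reading order. The sketch below shows one way to do that. `BoundingBox` and `TextBlock` here are stand-in dataclasses that only mirror the fields printed above; they are not the library's own classes. The `confident_blocks` helper, the 0.9 threshold, and the "smudge" block are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:  # stand-in: mirrors the fields shown in the sample output
    x1: int
    y1: int
    x2: int
    y2: int


@dataclass(frozen=True)
class TextBlock:  # stand-in for the library's text block type
    text: str
    bbox: BoundingBox
    confidence: float


def confident_blocks(blocks, min_confidence=0.9):
    """Keep blocks at or above min_confidence, sorted top-to-bottom, then left-to-right."""
    kept = [b for b in blocks if b.confidence >= min_confidence]
    return sorted(kept, key=lambda b: (b.bbox.y1, b.bbox.x1))


blocks = [
    TextBlock("Total: $1,234.56", BoundingBox(100, 400, 300, 430), 0.97),
    TextBlock("Invoice", BoundingBox(100, 50, 200, 80), 0.98),
    TextBlock("smudge", BoundingBox(10, 10, 20, 20), 0.41),  # hypothetical noisy block
    TextBlock("Date: 2024-01-15", BoundingBox(100, 100, 280, 125), 0.96),
]

for block in confident_blocks(blocks):
    print(block.text)
```

Sorting on `(y1, x1)` is a simple heuristic that works for single-column pages; multi-column layouts would likely need Layout Analysis first.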

Quick Start

from omnidocs.tasks.ocr_extraction import TesseractOCR, TesseractConfig
from PIL import Image

image = Image.open("document.png")

ocr = TesseractOCR(
    config=TesseractConfig(languages=["eng"])
)

result = ocr.extract(image)

for block in result.text_blocks:
    print(f"'{block.text}' @ {block.bbox}")
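
Once blocks are available, a common next step is form field extraction: turning "Label: value" blocks into a lookup that keeps each value's location. The following is a minimal sketch, assuming the blocks have first been reduced to plain `(text, (x1, y1, x2, y2))` tuples. `extract_fields` is a hypothetical helper, not part of the library, and the sample data is copied from the output shown earlier.

```python
def extract_fields(blocks):
    """Map 'Label: value' blocks to {label: (value, bbox)}.

    Blocks without a colon, or with nothing after it, are skipped.
    """
    fields = {}
    for text, bbox in blocks:
        # Split on the first colon only, so the value may itself contain colons.
        label, sep, value = text.partition(":")
        if sep and value.strip():
            fields[label.strip()] = (value.strip(), bbox)
    return fields


# (text, bbox) pairs taken from the sample output above
blocks = [
    ("Invoice", (100, 50, 200, 80)),
    ("Date: 2024-01-15", (100, 100, 280, 125)),
    ("Total: $1,234.56", (100, 400, 300, 430)),
]

fields = extract_fields(blocks)
value, bbox = fields["Total"]
print(value, bbox)
```

This only handles a label and its value inside the same block. Forms that place the value in a separate block would need a spatial match on the bounding boxes instead.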

Available Models

Model	Speed	GPU	Languages	Best For
Tesseract	Fast	No	100+	General, multilingual
EasyOCR	Medium	Optional	80+	Higher accuracy
PaddleOCR	Fast	Optional	80+	Asian languages

When to Use

✅ Need word/character coordinates
✅ Building search indexes with positions
✅ Form field extraction
✅ Text location for downstream processing

❌ Just need readable text → Use Text Extraction
❌ Just need structure → Use Layout Analysis
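
To make the "search index with positions" case concrete, here is a minimal sketch of an inverted index that maps each token to the bounding boxes of the blocks containing it. It assumes blocks have been reduced to `(text, (x1, y1, x2, y2))` tuples; `build_position_index` is a hypothetical helper, not a library function, and the tokenizer is a deliberately simple regular expression.

```python
import re
from collections import defaultdict


def build_position_index(blocks):
    """Return {token: [bbox, ...]} for every lowercase word-like token."""
    index = defaultdict(list)
    for text, bbox in blocks:
        for token in re.findall(r"\w+", text.lower()):
            # The bbox is that of the whole block, not of the individual token.
            index[token].append(bbox)
    return dict(index)


# (text, bbox) pairs taken from the sample output earlier on this page
blocks = [
    ("Invoice", (100, 50, 200, 80)),
    ("Date: 2024-01-15", (100, 100, 280, 125)),
    ("Total: $1,234.56", (100, 400, 300, 430)),
]

index = build_position_index(blocks)
print(index.get("invoice", []))
```

A search hit can then be highlighted on the page image by drawing the returned boxes. Word-level highlighting would need word-level boxes from the OCR backend rather than block-level ones.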

Upcoming Models

Model	Description	Status
SuryaOCR	Modern multilingual OCR	🔜 Soon
QwenOCR	VLM-based OCR	🔜 Soon