OCR

Extract text with precise bounding boxes.


Input / Output

Input: Document image

Output: Text blocks with coordinates and confidence scores

result = ocr.extract(image)
for block in result.text_blocks:
    print(f"'{block.text}' @ {block.bbox} ({block.confidence:.2f})")

'Invoice' @ BoundingBox(x1=100, y1=50, x2=200, y2=80) (0.98)
'Date: 2024-01-15' @ BoundingBox(x1=100, y1=100, x2=280, y2=125) (0.96)
'Total: $1,234.56' @ BoundingBox(x1=100, y1=400, x2=300, y2=430) (0.97)

Quick Start

from omnidocs.tasks.ocr_extraction import TesseractOCR, TesseractConfig
from PIL import Image

image = Image.open("document.png")

ocr = TesseractOCR(
    config=TesseractConfig(languages=["eng"])
)

result = ocr.extract(image)

for block in result.text_blocks:
    print(f"'{block.text}' @ {block.bbox}")
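
After extraction, a common next step is to drop low-confidence blocks and put the rest into reading order. The sketch below does not call the library: `BoundingBox` and `TextBlock` are stand-in dataclasses that mirror only the fields visible in the sample output (`text`, `bbox`, `confidence`, `x1` to `y2`), and `filter_and_sort` and its 0.5 threshold are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    # Stand-in mirroring the x1/y1/x2/y2 fields in the sample output; not the library class.
    x1: int
    y1: int
    x2: int
    y2: int


@dataclass(frozen=True)
class TextBlock:
    # Stand-in carrying the three attributes used in the examples on this page.
    text: str
    bbox: BoundingBox
    confidence: float


def filter_and_sort(blocks, min_confidence=0.5):
    """Keep blocks at or above min_confidence, ordered top-to-bottom, then left-to-right."""
    kept = [b for b in blocks if b.confidence >= min_confidence]
    return sorted(kept, key=lambda b: (b.bbox.y1, b.bbox.x1))


blocks = [
    TextBlock("Total: $1,234.56", BoundingBox(100, 400, 300, 430), 0.97),
    TextBlock("Invoice", BoundingBox(100, 50, 200, 80), 0.98),
    TextBlock("smudge", BoundingBox(10, 10, 20, 20), 0.31),  # hypothetical noisy block
    TextBlock("Date: 2024-01-15", BoundingBox(100, 100, 280, 125), 0.96),
]

for block in filter_and_sort(blocks):
    print(f"'{block.text}' @ y={block.bbox.y1}")
```

Sorting on `(y1, x1)` is a simple heuristic that works for single-column pages; multi-column layouts would likely need grouping by column first.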

Available Models

| Model | Speed | GPU | Languages | Best For |
| --- | --- | --- | --- | --- |
| Tesseract | Fast | No | 100+ | General, multilingual |
| EasyOCR | Medium | Optional | 80+ | Higher accuracy |
| PaddleOCR | Fast | Optional | 80+ | Asian languages |

When to Use

✅ Need word/character coordinates
✅ Building search indexes with positions
✅ Form field extraction
✅ Text location for downstream processing

❌ Just need readable text → Use Text Extraction
❌ Just need structure → Use Layout Analysis
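
To illustrate the form field use case, here is a minimal sketch that assigns OCR text to named regions of a page. It is independent of the library: `Box` is a stand-in for the `x1`/`y1`/`x2`/`y2` bounding box shown in the sample output, and `text_in_region`, the `fields` template, and its coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    # Stand-in for the x1/y1/x2/y2 bounding box in the sample output; not the library class.
    x1: int
    y1: int
    x2: int
    y2: int

    def contains_center_of(self, other: "Box") -> bool:
        """True if the center point of `other` lies inside this box."""
        cx = (other.x1 + other.x2) / 2
        cy = (other.y1 + other.y2) / 2
        return self.x1 <= cx <= self.x2 and self.y1 <= cy <= self.y2


def text_in_region(blocks, region):
    """Join the text of blocks whose center lies inside region, in reading order."""
    hits = [(box, text) for text, box in blocks if region.contains_center_of(box)]
    hits.sort(key=lambda h: (h[0].y1, h[0].x1))
    return " ".join(text for _, text in hits)


# (text, box) pairs copied from the sample output
blocks = [
    ("Invoice", Box(100, 50, 200, 80)),
    ("Date: 2024-01-15", Box(100, 100, 280, 125)),
    ("Total: $1,234.56", Box(100, 400, 300, 430)),
]

# Hypothetical form template: field name -> region on the page
fields = {
    "date": Box(90, 90, 300, 130),
    "total": Box(90, 390, 320, 440),
}

extracted = {name: text_in_region(blocks, region) for name, region in fields.items()}
print(extracted)
```

Matching on the block's center rather than requiring full containment tolerates boxes that slightly overhang the template region, at the cost of occasionally picking up a neighbouring block.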


Upcoming Models

| Model | Description | Status |
| --- | --- | --- |
| SuryaOCR | Modern multilingual OCR | 🔜 Soon |
| QwenOCR | VLM-based OCR | 🔜 Soon |