Usage
Everything you need to use OmniDocs in your projects.
Tasks & Models
Text Extraction
Convert documents to Markdown/HTML.
| Model | Speed | Backends |
|---|---|---|
| Qwen | 2-3s/page | PyTorch, vLLM, MLX, API |
| DotsOCR | 3-5s/page | PyTorch, vLLM, API |
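
The per-page timings above can be turned into a rough runtime budget for a whole document set. This is a minimal sketch, assuming strictly sequential processing at the table's quoted rates; the table does not state the hardware behind those numbers, and `SECONDS_PER_PAGE` and `estimate_runtime` are hypothetical helpers, not part of OmniDocs.

```python
# Per-page latency ranges in seconds, copied from the Text Extraction table.
# Hardware is unspecified there, so treat these as rough figures only.
SECONDS_PER_PAGE = {
    "Qwen": (2.0, 3.0),
    "DotsOCR": (3.0, 5.0),
}


def estimate_runtime(model: str, pages: int) -> tuple[float, float]:
    """Return (best, worst) total seconds to process `pages` pages one at a time."""
    low, high = SECONDS_PER_PAGE[model]
    return pages * low, pages * high


if __name__ == "__main__":
    best, worst = estimate_runtime("DotsOCR", 100)
    # Convert to minutes for readability.
    print(f"DotsOCR, 100 pages: {best / 60:.0f}-{worst / 60:.0f} min")
```

For example, 100 pages with DotsOCR comes out to 300-500 seconds, which is the kind of budget that motivates batching or GPU deployment for larger collections.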
Layout Analysis
Detect structure (titles, tables, figures).
| Model | Speed | Labels |
|---|---|---|
| DocLayoutYOLO | 0.1-0.2s/page | Fixed (11 classes) |
| RT-DETR | 0.3-0.5s/page | Fixed (11 classes) |
| Qwen Layout | 2-3s/page | Custom |
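
Layout detectors typically return one labeled box per detected region. The sketch below uses a hypothetical `LayoutBox` record (not OmniDocs' actual output type) to show how such results might be filtered by label and confidence, then ordered down the page. Because the wanted labels are a plain argument, the same helper would apply to Qwen Layout's custom labels as well as to the fixed label sets.

```python
from dataclasses import dataclass


# Hypothetical detection record; OmniDocs' real output type may differ.
@dataclass(frozen=True)
class LayoutBox:
    label: str                                # e.g. "title", "table", "figure"
    bbox: tuple[float, float, float, float]   # (x0, y0, x1, y1) in page pixels
    confidence: float                         # detector score in [0, 1]


def select(boxes: list[LayoutBox], labels: set[str], min_conf: float = 0.5) -> list[LayoutBox]:
    """Keep boxes whose label is in `labels` and whose score is at least
    `min_conf`, ordered top-to-bottom by their upper edge (y0)."""
    kept = [b for b in boxes if b.label in labels and b.confidence >= min_conf]
    return sorted(kept, key=lambda b: b.bbox[1])


if __name__ == "__main__":
    page = [
        LayoutBox("figure", (50, 400, 550, 700), 0.91),
        LayoutBox("title", (50, 40, 550, 90), 0.97),
        LayoutBox("table", (50, 120, 550, 380), 0.42),  # below the 0.5 threshold
    ]
    print([b.label for b in select(page, {"title", "table", "figure"})])
    # → ['title', 'figure']
```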
OCR
Extract text with coordinates.
| Model | Speed | Languages |
|---|---|---|
| Tesseract | 0.5-1s/page | 100+ |
| EasyOCR | 1-2s/page | 80+ |
| PaddleOCR | 0.5-1s/page | 80+ |
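
Since OCR returns text together with coordinates, the recognized words usually have to be reassembled into reading order. A minimal sketch, assuming a hypothetical `OCRWord` record holding the word's top-left corner and box height (not OmniDocs' real result type) and a single-column page:

```python
from dataclasses import dataclass


# Hypothetical OCR word record; real engines report richer boxes.
@dataclass(frozen=True)
class OCRWord:
    text: str
    x: float  # left edge
    y: float  # top edge
    h: float  # box height


def reading_order(words: list[OCRWord], line_tol: float = 0.5) -> list[str]:
    """Group words into lines when a word's top edge is within
    line_tol * height of the first word of the current line, then
    read each line left to right."""
    lines: list[list[OCRWord]] = []
    for w in sorted(words, key=lambda w: w.y):
        first = lines[-1][0] if lines else None
        if first is not None and abs(w.y - first.y) < line_tol * first.h:
            lines[-1].append(w)
        else:
            lines.append([w])
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x)) for line in lines]


if __name__ == "__main__":
    words = [
        OCRWord("world", 120, 12, 20),
        OCRWord("Hello", 10, 10, 20),
        OCRWord("Second", 10, 50, 20),
        OCRWord("line", 95, 52, 20),
    ]
    print(reading_order(words))
    # → ['Hello world', 'Second line']
```

This heuristic ignores columns. On multi-column pages, running layout analysis first and ordering words within each detected region would likely be more robust.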
Workflows
- Batch Processing: process multiple documents
- Deployment: deploy on Modal GPUs
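
As a rough illustration of batch processing, the sketch below fans a per-document function out over a thread pool and keeps successes and failures separate, so one bad file does not abort the whole batch. `process_document` is a hypothetical stand-in for a real OmniDocs call, and `process_batch` is likewise illustrative rather than part of the library.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def process_document(path: Path) -> str:
    """Hypothetical stand-in for a per-document OmniDocs call
    (e.g. text extraction); replace with the real extractor."""
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"unsupported file type: {path.name}")
    return f"# {path.stem}\n"


def process_batch(
    paths: list[Path], max_workers: int = 4
) -> tuple[dict[Path, str], dict[Path, Exception]]:
    """Run process_document over many files concurrently and return
    (results, errors), each keyed by input path."""
    results: dict[Path, str] = {}
    errors: dict[Path, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_document, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception as exc:  # record the failure, keep going
                errors[path] = exc
    return results, errors


if __name__ == "__main__":
    ok, failed = process_batch([Path("a.pdf"), Path("b.PDF"), Path("notes.txt")])
    print(sorted(p.name for p in ok), [p.name for p in failed])
```

Python threads mainly help when each document's work waits on I/O or releases the GIL, such as calls to a remote API backend; for local GPU inference, feeding a single loaded model sequentially or in batches is usually the simpler choice.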
Upcoming
Tasks: Table Extraction, Math Recognition, Chart Understanding
Models: Chandra, LightOnOCR-2, MinerU, SuryaOCR, SuryaLayout
See the Roadmap for full progress tracking.