OmniDocs

Unified Python toolkit for visual document processing


Install

pip install "omnidocs[pytorch]"

Extract Text in 4 Lines

from omnidocs import Document
from omnidocs.tasks.text_extraction import QwenTextExtractor
from omnidocs.tasks.text_extraction.qwen import QwenPyTorchConfig

doc = Document.from_pdf("document.pdf")
extractor = QwenTextExtractor(backend=QwenPyTorchConfig(device="cuda"))
result = extractor.extract(doc.get_page(0), output_format="markdown")
print(result.content)

What OmniDocs Does

Task | What You Get | Example Models
--- | --- | ---
Text Extraction | Markdown/HTML from documents | VLM API, Qwen3-VL, MinerU VL, DotsOCR, Nanonets OCR2
Layout Analysis | Bounding boxes for titles, tables, figures | VLM API, DocLayoutYOLO, RT-DETR, MinerU VL, Qwen Layout
Structured Extraction | Typed Pydantic objects from documents | VLM API (any cloud provider)
OCR | Text + coordinates | Tesseract, EasyOCR, PaddleOCR
Table Extraction | Structured table data (rows, columns, cells) | TableFormer
Reading Order | Logical reading sequence | Rule-based R-tree
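To make the layout-analysis output concrete, here is an illustrative sketch of what a bounding-box result can look like. `LayoutBox` is a hypothetical stand-in dataclass for this example, not an OmniDocs type:

```python
from dataclasses import dataclass

@dataclass
class LayoutBox:
    """One detected region: a label plus pixel coordinates."""
    label: str   # e.g. "title", "table", "figure"
    x0: float    # left edge
    y0: float    # top edge
    x1: float    # right edge
    y1: float    # bottom edge

# A layout result is essentially a list of such typed boxes,
# which you can filter by label:
boxes = [
    LayoutBox("title", 72, 40, 540, 80),
    LayoutBox("table", 72, 120, 540, 400),
    LayoutBox("figure", 72, 430, 540, 700),
]
titles = [b for b in boxes if b.label == "title"]
```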

Core Design

Image → Extractor.extract() → Pydantic Output
  • One API: .extract() for every task
  • Type-Safe: Pydantic configs with IDE autocomplete
  • Multi-Backend: PyTorch, VLLM, MLX, API
  • Stateless: Document loads data, you manage results
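The design above can be sketched in a few lines. The names `TextResult` and `ToyTextExtractor` are hypothetical, and stdlib dataclasses stand in for the Pydantic models OmniDocs actually returns:

```python
from dataclasses import dataclass

@dataclass
class TextResult:
    """Typed output object returned by extract()."""
    content: str
    format: str

class ToyTextExtractor:
    """Stateless: holds config only; all data flows through extract()."""
    def __init__(self, device: str = "cpu"):
        self.device = device

    def extract(self, page_text: str, output_format: str = "markdown") -> TextResult:
        # A real extractor would run a model here; this sketch just wraps the input.
        return TextResult(content=page_text, format=output_format)

# Same shape as the real API: configure once, call .extract(), get a typed result.
result = ToyTextExtractor(device="cpu").extract("# Title", output_format="markdown")
print(result.content)
```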

Choose Your Backend

Backend | Install | Best For
--- | --- | ---
PyTorch | pip install "omnidocs[pytorch]" | Development, single GPU
VLLM | pip install "omnidocs[vllm]" | Production, high throughput
MLX | pip install "omnidocs[mlx]" | Apple Silicon (M1/M2/M3)
API | pip install omnidocs | No GPU, cloud-based (included by default)

What's Available

(✓ = supported, -- = not available)

Model | Task | PyTorch | VLLM | MLX | API
--- | --- | --- | --- | --- | ---
VLM API | Text, Layout, Structured | -- | -- | -- | ✓
Qwen3-VL | Text, Layout | ✓ | ✓ | ✓ | ✓
MinerU VL | Text, Layout | ✓ | ✓ | ✓ | ✓
Granite Docling | Text | ✓ | ✓ | ✓ | ✓
DotsOCR | Text | ✓ | ✓ | -- | ✓
Nanonets OCR2 | Text | ✓ | ✓ | -- | ✓
DocLayoutYOLO | Layout | ✓ | -- | -- | --
RT-DETR | Layout | ✓ | -- | -- | --
TableFormer | Table | ✓ | -- | -- | --
Tesseract | OCR | ✓ | -- | -- | --
EasyOCR | OCR | ✓ | -- | -- | --
PaddleOCR | OCR | ✓ | -- | -- | --
Rule-based | Reading Order | ✓ | -- | -- | --
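The rule-based reading-order entry can be illustrated with a toy heuristic: sort regions top-to-bottom, then left-to-right, grouping boxes that sit on roughly the same line. This is a naive sketch of the general technique, not OmniDocs' R-tree implementation, and it ignores multi-column layouts:

```python
def reading_order(boxes, line_tol=10):
    """Order (x0, y0, x1, y1) boxes row by row.

    Boxes whose top edges fall within the same line_tol bucket are
    treated as one visual line and ordered left-to-right.
    """
    return sorted(boxes, key=lambda b: (round(b[1] / line_tol), b[0]))

boxes = [
    (300, 52, 500, 90),   # right box, first line
    (40, 50, 280, 90),    # left box, first line (slightly higher)
    (40, 120, 280, 160),  # second line
]
ordered = reading_order(boxes)
```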

Coming Soon

Model | Task | Status
--- | --- | ---
Surya | OCR, Layout | 🔜 Planned

See the Roadmap for full tracking.


Documentation

  • Getting Started

    Install, configure, and run your first extraction

  • Concepts

    Architecture, configs, backends, and design decisions

  • Usage

    Tasks, models, batch processing, and deployment


Quick Reference

Single-Backend Model (e.g., DocLayoutYOLO)

from omnidocs.tasks.layout_analysis import DocLayoutYOLO, DocLayoutYOLOConfig

layout = DocLayoutYOLO(config=DocLayoutYOLOConfig(device="cuda"))
result = layout.extract(image)  # image: a page image, e.g. Document.from_pdf("document.pdf").get_page(0)

Multi-Backend Model (e.g., Qwen)

from omnidocs.tasks.text_extraction import QwenTextExtractor
from omnidocs.tasks.text_extraction.qwen import QwenPyTorchConfig  # or VLLMConfig, MLXConfig, APIConfig

extractor = QwenTextExtractor(backend=QwenPyTorchConfig(device="cuda"))
result = extractor.extract(image, output_format="markdown")

VLM API (Any Cloud Provider, No GPU)

from omnidocs.vlm import VLMAPIConfig
from omnidocs.tasks.text_extraction import VLMTextExtractor

# Works with Gemini, OpenRouter, Azure, OpenAI, self-hosted VLLM
config = VLMAPIConfig(model="gemini/gemini-2.5-flash")
extractor = VLMTextExtractor(config=config)
result = extractor.extract(image, output_format="markdown")