Models
All supported models and their configurations.
Available Models
Text Extraction

| Model | Speed | Backends | Status |
|-------|-------|----------|--------|
| MinerU VL | 3-6 s/page | PyTorch, VLLM, MLX, API | ✅ Ready |
| Qwen | 2-3 s/page | PyTorch, VLLM, MLX, API | ✅ Ready |
| DotsOCR | 3-5 s/page | PyTorch, VLLM, API | ✅ Ready |
| Nanonets OCR2 | 2-4 s/page | PyTorch, VLLM, MLX | ✅ Ready |

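The speeds above are given per page, so a rough document-level estimate scales linearly with page count. The worked example below assumes pages are processed sequentially at the listed rate. It estimates the time for a 100-page document with MinerU VL at 3-6 s/page:

$$
T \approx N_{\text{pages}} \cdot t_{\text{page}} = 100 \times (3 \text{ to } 6)\ \text{s} = 300 \text{ to } 600\ \text{s} \approx 5 \text{ to } 10\ \text{min}
$$
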
Layout Analysis

| Model | Speed | Backends | Status |
|-------|-------|----------|--------|
| MinerU VL | 3-6 s/page | PyTorch, VLLM, MLX, API | ✅ Ready |
| DocLayoutYOLO | 0.1-0.2 s/page | PyTorch | ✅ Ready |
| RT-DETR | 0.3-0.5 s/page | PyTorch | ✅ Ready |
| Qwen Layout | 2-3 s/page | PyTorch, VLLM, MLX, API | ✅ Ready |

OCR

| Model | Speed | Backends | Status |
|-------|-------|----------|--------|
| Tesseract | 0.5-1 s/page | CPU | ✅ Ready |
| EasyOCR | 1-2 s/page | PyTorch | ✅ Ready |
| PaddleOCR | 0.5-1 s/page | PaddlePaddle | ✅ Ready |

Table Extraction

| Model | Speed | Backends | Status |
|-------|-------|----------|--------|
| TableFormer | 0.5-1 s/table | PyTorch | ✅ Ready |

Reading Order

| Model | Speed | Backends | Status |
|-------|-------|----------|--------|
| Rule-based | <0.1 s/page | CPU | ✅ Ready |

By Backend

| Backend | Models |
|---------|--------|
| PyTorch | MinerU VL, Qwen, DotsOCR, Nanonets OCR2, DocLayoutYOLO, RT-DETR, EasyOCR, TableFormer |
| VLLM | MinerU VL, Qwen, DotsOCR, Nanonets OCR2 |
| MLX | MinerU VL, Qwen, Nanonets OCR2 |
| API | MinerU VL, Qwen, DotsOCR |
| CPU | Tesseract, PaddleOCR, Rule-based Reading Order |

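Choosing a model usually means filtering by the backend you can run and then comparing speeds. The sketch below encodes the text extraction, layout, and OCR tables from this page as plain Python data. It picks the model with the lowest worst-case time for a given task and backend. The `CATALOG` structure, task keys, and `fastest_model` helper are hypothetical illustrations, not part of the library's API.

```python
from __future__ import annotations

# Hypothetical catalog copied from the tables on this page (not the library API).
# Each entry: (model name, (min, max) seconds per page, supported backends).
CATALOG: dict[str, list[tuple[str, tuple[float, float], set[str]]]] = {
    "text_extraction": [
        ("MinerU VL", (3, 6), {"PyTorch", "VLLM", "MLX", "API"}),
        ("Qwen", (2, 3), {"PyTorch", "VLLM", "MLX", "API"}),
        ("DotsOCR", (3, 5), {"PyTorch", "VLLM", "API"}),
        ("Nanonets OCR2", (2, 4), {"PyTorch", "VLLM", "MLX"}),
    ],
    "layout": [
        ("MinerU VL", (3, 6), {"PyTorch", "VLLM", "MLX", "API"}),
        ("DocLayoutYOLO", (0.1, 0.2), {"PyTorch"}),
        ("RT-DETR", (0.3, 0.5), {"PyTorch"}),
        ("Qwen Layout", (2, 3), {"PyTorch", "VLLM", "MLX", "API"}),
    ],
    "ocr": [
        ("Tesseract", (0.5, 1), {"CPU"}),
        ("EasyOCR", (1, 2), {"PyTorch"}),
        ("PaddleOCR", (0.5, 1), {"PaddlePaddle"}),
    ],
}


def fastest_model(task: str, backend: str) -> str | None:
    """Return the model with the lowest worst-case time supporting `backend`, or None."""
    # Rank by the upper bound of the speed range (a conservative choice);
    # ties fall back to alphabetical order of the model name.
    candidates = [
        (max_s, name)
        for name, (_min_s, max_s), backends in CATALOG[task]
        if backend in backends
    ]
    return min(candidates)[1] if candidates else None


if __name__ == "__main__":
    print(fastest_model("layout", "PyTorch"))
    print(fastest_model("text_extraction", "MLX"))
    print(fastest_model("ocr", "MLX"))
```

Running this prints `DocLayoutYOLO`, `Qwen`, and `None`. No listed OCR model supports MLX, so the last call returns `None`. Ranking by the upper bound favors predictable worst-case latency. Ranking by the lower bound or the midpoint would be an equally valid choice.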
Upcoming Models
Text Extraction

| Model | Parameters | Description | Status |
|-------|------------|-------------|--------|
| Granite Docling | 258M | Edge deployment, fast inference | 🔜 Scripts ready |
| Chandra | 9B | High-accuracy text extraction | 🔜 Planned |

Layout Analysis

| Model | Description | Status |
|-------|-------------|--------|
| SuryaLayout | Modern layout detection | 🔜 Planned |

OCR

| Model | Description | Status |
|-------|-------------|--------|
| SuryaOCR | Modern multilingual OCR | 🔜 Planned |

New Tasks

| Task | Models | Status |
|------|--------|--------|
| Math Recognition | UniMERNet, Qwen | 🔜 Planned |
| Structured Output | VLM (GPT-4V, Gemini) | 🔜 Planned |

See the Roadmap for full progress tracking.