Overview¶
DeepSeek-OCR backend configurations and extractor for text extraction.
Two generations of DeepSeek OCR models from deepseek-ai:

- DeepSeek-OCR (Oct 2025, arXiv:2510.18234) — v1, MIT license, 3B params, ~6.7 GB BF16
- DeepSeek-OCR-2 (Jan 2026, arXiv:2601.20552) — v2, Apache 2.0, 3B params, improved "Visual Causal Flow"
Both share the same inference interface (AutoModel + AutoTokenizer with model.infer()). The default model is DeepSeek-OCR-2 (latest).
Supported prompts

- "<|grounding|>Convert the document to markdown." (structured document output)
Available backends
- DeepSeekOCRTextPyTorchConfig: PyTorch/HuggingFace backend
- DeepSeekOCRTextVLLMConfig: VLLM high-throughput backend (recommended, ~2500 tok/s on A100)
- DeepSeekOCRTextMLXConfig: MLX backend for Apple Silicon
- DeepSeekOCRTextAPIConfig: API backend (Novita AI)
Hugging Face models

- deepseek-ai/DeepSeek-OCR-2 (latest, Apache 2.0)
- deepseek-ai/DeepSeek-OCR (v1, MIT)
GitHub: https://github.com/deepseek-ai/DeepSeek-OCR-2
Example
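A minimal end-to-end sketch assembled from the backend examples below; `"page.png"` is a placeholder input, and the extractor accepts a PIL Image, numpy array, or file path:

```python
from omnidocs.tasks.text_extraction import DeepSeekOCRTextExtractor
from omnidocs.tasks.text_extraction.deepseek import DeepSeekOCRTextVLLMConfig

# VLLM backend with default settings (recommended for production throughput)
extractor = DeepSeekOCRTextExtractor(backend=DeepSeekOCRTextVLLMConfig())

result = extractor.extract("page.png")  # PIL Image, numpy array, or path
print(result.content)                   # Markdown-structured text
```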
DeepSeekOCRTextAPIConfig
¶
Bases: BaseModel
API backend configuration for DeepSeek-OCR / DeepSeek-OCR-2 text extraction.
Uses litellm for provider-agnostic API access. Primary provider: Novita AI (official hosting).
Example
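A minimal sketch of the API backend with its default fields. The environment-variable name is an assumption for illustration — litellm-based providers read credentials from the environment, and the exact variable depends on the provider; check the litellm/Novita AI documentation for the actual name:

```python
import os

from omnidocs.tasks.text_extraction import DeepSeekOCRTextExtractor
from omnidocs.tasks.text_extraction.deepseek import DeepSeekOCRTextAPIConfig

# Assumed credential variable for the Novita AI provider — verify the
# actual name against the litellm provider docs before relying on it.
os.environ.setdefault("NOVITA_API_KEY", "sk-...")

extractor = DeepSeekOCRTextExtractor(backend=DeepSeekOCRTextAPIConfig())
result = extractor.extract("invoice.png")
print(result.content)
```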
DeepSeekOCRTextExtractor
¶
Bases: BaseTextExtractor
DeepSeek-OCR / DeepSeek-OCR-2 text extractor.
High-accuracy OCR model that reads complex real-world documents (PDFs, forms, tables, handwritten/noisy text) and outputs clean Markdown. Uses a hybrid vision encoder + causal text decoder — output is structured by the model itself rather than post-processed from bounding boxes.
DeepSeek-OCR-2 ("Visual Causal Flow") is the default — released Jan 2026.
Supports PyTorch, VLLM (recommended), MLX, and API backends.
Example
```python
from omnidocs.tasks.text_extraction import DeepSeekOCRTextExtractor
from omnidocs.tasks.text_extraction.deepseek import (
    DeepSeekOCRTextPyTorchConfig,
    DeepSeekOCRTextVLLMConfig,
)

# VLLM — ~2500 tokens/s on A100 (recommended for production)
extractor = DeepSeekOCRTextExtractor(
    backend=DeepSeekOCRTextVLLMConfig()
)
result = extractor.extract(image)
print(result.content)

# PyTorch with crop_mode for dense pages
extractor = DeepSeekOCRTextExtractor(
    backend=DeepSeekOCRTextPyTorchConfig(crop_mode=True)
)
```
Initialize DeepSeek-OCR extractor.
| PARAMETER | DESCRIPTION |
|---|---|
| `backend` | Backend config. One of: `DeepSeekOCRTextPyTorchConfig` (local GPU), `DeepSeekOCRTextVLLMConfig` (recommended, high-throughput), `DeepSeekOCRTextMLXConfig` (Apple Silicon), `DeepSeekOCRTextAPIConfig` (Novita AI) |
Source code in omnidocs/tasks/text_extraction/deepseek/extractor.py
extract
¶
```python
extract(
    image: Union[Image, ndarray, str, Path],
    output_format: Literal["html", "markdown"] = "markdown",
) -> TextOutput
```
Extract text from a document image.
DeepSeek-OCR always outputs Markdown-structured text. The output_format parameter is accepted for API compatibility.
| PARAMETER | DESCRIPTION |
|---|---|
| `image` | Input image (PIL Image, numpy array, or file path) |
| `output_format` | Accepted for API compatibility (default: `"markdown"`) |

| RETURNS | DESCRIPTION |
|---|---|
| `TextOutput` | TextOutput with extracted Markdown content |
Source code in omnidocs/tasks/text_extraction/deepseek/extractor.py
DeepSeekOCRTextMLXConfig
¶
Bases: BaseModel
MLX backend configuration for DeepSeek-OCR text extraction.
Apple Silicon only (M1/M2/M3+). Do NOT deploy to Modal/cloud. Uses standard mlx-vlm generate interface.
Note: MLX variants currently available for DeepSeek-OCR v1. Check mlx-community for DeepSeek-OCR-2 variants as they are published.
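A minimal sketch of the MLX backend with default settings, assuming an Apple Silicon machine with mlx-vlm installed and a published DeepSeek-OCR MLX variant available:

```python
from omnidocs.tasks.text_extraction import DeepSeekOCRTextExtractor
from omnidocs.tasks.text_extraction.deepseek import DeepSeekOCRTextMLXConfig

# Apple Silicon only — runs through mlx-vlm, not CUDA.
extractor = DeepSeekOCRTextExtractor(backend=DeepSeekOCRTextMLXConfig())
result = extractor.extract("receipt.jpg")
print(result.content)
```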
DeepSeekOCRTextPyTorchConfig
¶
Bases: BaseModel
PyTorch/HuggingFace backend configuration for DeepSeek-OCR / DeepSeek-OCR-2.
Uses AutoModel + AutoTokenizer. Inference via model.infer() — the model handles tiling and multi-page PDF stitching internally.
Models
- deepseek-ai/DeepSeek-OCR-2 (default, latest — Jan 2026, Apache 2.0)
- deepseek-ai/DeepSeek-OCR (v1 — Oct 2025, MIT)
GPU requirements: L4 / A100 (≥16 GB VRAM recommended).
Example
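A minimal sketch of the PyTorch backend; `crop_mode` is the only field shown elsewhere in this page, so other fields are left at their defaults:

```python
from omnidocs.tasks.text_extraction import DeepSeekOCRTextExtractor
from omnidocs.tasks.text_extraction.deepseek import DeepSeekOCRTextPyTorchConfig

# crop_mode tiles dense pages before inference (useful for small text).
extractor = DeepSeekOCRTextExtractor(
    backend=DeepSeekOCRTextPyTorchConfig(crop_mode=True)
)
result = extractor.extract("dense_table.png")
print(result.content)
```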
DeepSeekOCRTextVLLMConfig
¶
Bases: BaseModel
VLLM backend configuration for DeepSeek-OCR / DeepSeek-OCR-2 text extraction.
DeepSeek-OCR has official upstream VLLM support (~2500 tokens/s on A100). Recommended for high-throughput batch document processing in production.

Requires: vllm>=0.11.1 (or nightly for OCR-2), torch, transformers==4.46.3
Note: Default model is DeepSeek-OCR v1 (not v2) because DeepSeek-OCR-2 VLLM support requires a vllm nightly build. Use PyTorch backend for DeepSeek-OCR-2 until official vllm support is released.
Example
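A minimal sketch of the VLLM backend with default settings (which serve DeepSeek-OCR v1, per the note above). Reusing one extractor instance across pages is what lets vLLM's batched serving pay off; the filenames are placeholders:

```python
from omnidocs.tasks.text_extraction import DeepSeekOCRTextExtractor
from omnidocs.tasks.text_extraction.deepseek import DeepSeekOCRTextVLLMConfig

# One extractor, many pages — amortizes model load and benefits from
# vLLM's high-throughput serving.
extractor = DeepSeekOCRTextExtractor(backend=DeepSeekOCRTextVLLMConfig())
for page in ["p1.png", "p2.png"]:
    print(extractor.extract(page).content)
```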
api
¶
API backend configuration for DeepSeek-OCR text extraction.
Note: DeepSeek-OCR-2 API availability may vary by provider — check novita.ai for updated model slugs as providers onboard the new version.
extractor
¶
DeepSeek-OCR / DeepSeek-OCR-2 text extractor.
- DeepSeek-OCR (Oct 2025, arXiv:2510.18234) — v1, MIT, 3B params
- DeepSeek-OCR-2 (Jan 2026, arXiv:2601.20552) — v2, Apache 2.0, 3B params, "Visual Causal Flow"
Supported backends: PyTorch, VLLM (official upstream support), MLX, API.
GitHub
- v1: https://github.com/deepseek-ai/DeepSeek-OCR
- v2: https://github.com/deepseek-ai/DeepSeek-OCR-2
Example
```python
from omnidocs.tasks.text_extraction import DeepSeekOCRTextExtractor
from omnidocs.tasks.text_extraction.deepseek import DeepSeekOCRTextVLLMConfig

extractor = DeepSeekOCRTextExtractor(
    backend=DeepSeekOCRTextVLLMConfig()  # defaults to DeepSeek-OCR v1 on VLLM
)
result = extractor.extract(image)
print(result.content)
```
mlx
¶
MLX backend configuration for DeepSeek-OCR text extraction.
Available MLX quantized variants (mlx-community):

- mlx-community/DeepSeek-OCR-4bit (4-bit, recommended)
- mlx-community/DeepSeek-OCR-8bit (8-bit, higher fidelity)
Note: DeepSeek-OCR-2 MLX variants may not yet be available — check https://huggingface.co/mlx-community for latest uploads. Fall back to DeepSeek-OCR v1 4bit/8bit for Apple Silicon.
pytorch
¶
PyTorch/HuggingFace backend configuration for DeepSeek-OCR text extraction.
Both DeepSeek-OCR and DeepSeek-OCR-2 use:
- AutoModel (not AutoModelForCausalLM)
- AutoTokenizer (not AutoProcessor)
- model.infer(tokenizer, prompt=..., image_file=...) for inference
Requirements (from official README):

- python==3.12.9, CUDA==11.8
- torch==2.6.0, transformers==4.46.3, tokenizers==0.20.3
- einops, addict, easydict
- flash-attn==2.7.3 (optional, --no-build-isolation)
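The interface described above can be sketched as raw Hugging Face usage. This is what the PyTorch backend wraps, not a verbatim copy of it; the `model.infer()` keyword arguments follow the official README and may differ across model releases:

```python
from transformers import AutoModel, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-OCR"

# AutoModel + AutoTokenizer, not AutoModelForCausalLM/AutoProcessor;
# trust_remote_code is required for the custom infer() method.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
model = model.cuda().eval()

# The model handles tiling internally during infer().
text = model.infer(
    tokenizer,
    prompt="<|grounding|>Convert the document to markdown.",
    image_file="page.png",
)
```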
vllm
¶
VLLM backend configuration for DeepSeek-OCR text extraction.
DeepSeek-OCR has official upstream VLLM support (announced Oct 23 2025). Achieves ~2500 tokens/s on A100-40G — the recommended backend for production.
DeepSeek-OCR-2 VLLM support: refer to https://github.com/deepseek-ai/DeepSeek-OCR-2 for the latest vLLM setup instructions (may require nightly build).