Extractor¶
DeepSeek-OCR / DeepSeek-OCR-2 text extractor.
- DeepSeek-OCR (Oct 2025, arXiv:2510.18234) — v1, MIT license, 3B params
- DeepSeek-OCR-2 (Jan 2026, arXiv:2601.20552) — v2, Apache 2.0 license, 3B params, "Visual Causal Flow"
Supported backends: PyTorch, VLLM (official upstream support), MLX, API.
GitHub
- v1: https://github.com/deepseek-ai/DeepSeek-OCR
- v2: https://github.com/deepseek-ai/DeepSeek-OCR-2
Example
```python
from omnidocs.tasks.text_extraction import DeepSeekOCRTextExtractor
from omnidocs.tasks.text_extraction.deepseek import DeepSeekOCRTextVLLMConfig

extractor = DeepSeekOCRTextExtractor(
    backend=DeepSeekOCRTextVLLMConfig()  # DeepSeek-OCR-2, VLLM
)
result = extractor.extract(image)
print(result.content)
```
DeepSeekOCRTextExtractor¶
Bases: BaseTextExtractor
DeepSeek-OCR / DeepSeek-OCR-2 text extractor.
High-accuracy OCR model that reads complex real-world documents (PDFs, forms, tables, handwritten/noisy text) and outputs clean Markdown. Uses a hybrid vision encoder + causal text decoder — output is structured by the model itself rather than post-processed from bounding boxes.
DeepSeek-OCR-2 ("Visual Causal Flow") is the default — released Jan 2026.
Supports PyTorch, VLLM (recommended), MLX, and API backends.
Example
```python
from omnidocs.tasks.text_extraction import DeepSeekOCRTextExtractor
from omnidocs.tasks.text_extraction.deepseek import (
    DeepSeekOCRTextPyTorchConfig,
    DeepSeekOCRTextVLLMConfig,
)

# VLLM — ~2500 tokens/s on A100 (recommended for production)
extractor = DeepSeekOCRTextExtractor(
    backend=DeepSeekOCRTextVLLMConfig()
)
result = extractor.extract(image)
print(result.content)

# PyTorch with crop_mode for dense pages
extractor = DeepSeekOCRTextExtractor(
    backend=DeepSeekOCRTextPyTorchConfig(crop_mode=True)
)
```
Initialize DeepSeek-OCR extractor.
| PARAMETER | DESCRIPTION |
|---|---|
| `backend` | Backend config. One of:<br>- `DeepSeekOCRTextPyTorchConfig` (local GPU)<br>- `DeepSeekOCRTextVLLMConfig` (recommended, high-throughput)<br>- `DeepSeekOCRTextMLXConfig` (Apple Silicon)<br>- `DeepSeekOCRTextAPIConfig` (Novita AI)<br>**TYPE:** |
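Which runtime the extractor uses follows from the type of the backend config it receives. The sketch below illustrates that dispatch pattern with hypothetical stand-in dataclasses; the real config classes live in `omnidocs.tasks.text_extraction.deepseek` and may carry different fields.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical stand-ins for two of the backend configs listed above.
@dataclass
class PyTorchCfg:
    crop_mode: bool = False  # tile dense pages into crops (assumed field)

@dataclass
class VLLMCfg:
    pass  # high-throughput serving; no extra knobs assumed here

def pick_runtime(backend: Union[PyTorchCfg, VLLMCfg]) -> str:
    """Dispatch on the config type, as the extractor presumably does."""
    if isinstance(backend, VLLMCfg):
        return "vllm"
    return "pytorch"
```

Passing a config object rather than a string keeps backend-specific options (like `crop_mode`) attached to the backend they belong to.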
Source code in omnidocs/tasks/text_extraction/deepseek/extractor.py
extract¶
```python
extract(
    image: Union[Image, ndarray, str, Path],
    output_format: Literal["html", "markdown"] = "markdown",
) -> TextOutput
```
Extract text from a document image.
DeepSeek-OCR always outputs Markdown-structured text; the output_format parameter is accepted only for API compatibility with other extractors.
| PARAMETER | DESCRIPTION |
|---|---|
| `image` | Input image (PIL Image, numpy array, or file path)<br>**TYPE:** |
| `output_format` | Accepted for API compatibility (default: `"markdown"`)<br>**TYPE:** |
| RETURNS | DESCRIPTION |
|---|---|
| `TextOutput` | `TextOutput` with extracted Markdown content |