Overview¶
Dots OCR text extractor and backend configurations.
Available backends:

- PyTorch: `DotsOCRPyTorchConfig` (local GPU inference)
- VLLM: `DotsOCRVLLMConfig` (offline batch inference)
- API: `DotsOCRAPIConfig` (online VLLM server via an OpenAI-compatible API)
DotsOCRAPIConfig¶
Bases: BaseModel
API backend configuration for Dots OCR.
This config is for accessing a deployed VLLM server via an OpenAI-compatible API. Typically used with the modal_dotsocr_vllm_online.py deployment.
Example
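A hedged sketch of pointing this config at a deployed server. The `base_url`, `api_key`, and `model` field names are assumptions, not confirmed by these docs; verify them against the actual `DotsOCRAPIConfig` schema:

```python
from omnidocs.tasks.text_extraction.dotsocr import DotsOCRAPIConfig

# Field names below are hypothetical -- check the real DotsOCRAPIConfig
# schema before relying on them.
config = DotsOCRAPIConfig(
    base_url="https://your-deployment.modal.run/v1",  # OpenAI-compatible endpoint
    api_key="EMPTY",  # many self-hosted VLLM servers accept a dummy key
    model="rednote-hilab/dots.ocr",
)
```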
DotsOCRTextExtractor¶
Bases: BaseTextExtractor
Dots OCR Vision-Language Model text extractor with layout detection.
Extracts text from document images with layout information including:

- 11 layout categories (Caption, Footnote, Formula, List-item, etc.)
- Bounding boxes (normalized to 0-1024)
- Multi-format text (Markdown, LaTeX, HTML)
- Reading order preservation
Supports PyTorch, VLLM, and API backends.
Example
```python
from omnidocs.tasks.text_extraction import DotsOCRTextExtractor
from omnidocs.tasks.text_extraction.dotsocr import DotsOCRPyTorchConfig

# Initialize with PyTorch backend
extractor = DotsOCRTextExtractor(
    backend=DotsOCRPyTorchConfig(model="rednote-hilab/dots.ocr")
)

# Extract with layout
result = extractor.extract(image, include_layout=True)
print(f"Found {result.num_layout_elements} elements")
print(result.content)
```
Initialize Dots OCR text extractor.
| PARAMETER | DESCRIPTION |
|---|---|
| `backend` | Backend configuration. One of: `DotsOCRPyTorchConfig` (PyTorch/HuggingFace backend), `DotsOCRVLLMConfig` (VLLM high-throughput backend), `DotsOCRAPIConfig` (API backend, online VLLM server) |
Source code in omnidocs/tasks/text_extraction/dotsocr/extractor.py
extract¶
```python
extract(
    image: Union[Image, ndarray, str, Path],
    output_format: Literal["markdown", "html", "json"] = "markdown",
    include_layout: bool = False,
    custom_prompt: Optional[str] = None,
    max_tokens: int = 8192,
) -> DotsOCRTextOutput
```
Extract text from image using Dots OCR.
| PARAMETER | DESCRIPTION |
|---|---|
| `image` | Input image (PIL Image, numpy array, or file path). TYPE: `Union[Image, ndarray, str, Path]` |
| `output_format` | Output format (`"markdown"`, `"html"`, or `"json"`). TYPE: `Literal["markdown", "html", "json"]` DEFAULT: `"markdown"` |
| `include_layout` | Include layout bounding boxes in output. TYPE: `bool` DEFAULT: `False` |
| `custom_prompt` | Override the default extraction prompt. TYPE: `Optional[str]` DEFAULT: `None` |
| `max_tokens` | Maximum tokens for generation. TYPE: `int` DEFAULT: `8192` |
| RETURNS | DESCRIPTION |
|---|---|
| `DotsOCRTextOutput` | `DotsOCRTextOutput` with extracted content and optional layout |

| RAISES | DESCRIPTION |
|---|---|
| `RuntimeError` | If model is not loaded or inference fails |
Source code in omnidocs/tasks/text_extraction/dotsocr/extractor.py
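Since layout bounding boxes are normalized to a 0-1024 grid, mapping them back to pixel coordinates is a plain rescale. A minimal sketch; `scale_bbox` is an illustrative helper, not part of the omnidocs API:

```python
def scale_bbox(bbox, img_width, img_height, grid=1024):
    """Map an (x1, y1, x2, y2) box from the 0-1024 grid to pixel coordinates."""
    x1, y1, x2, y2 = bbox
    return (
        round(x1 * img_width / grid),
        round(y1 * img_height / grid),
        round(x2 * img_width / grid),
        round(y2 * img_height / grid),
    )

# A box spanning the full grid maps to the full image.
print(scale_bbox((0, 0, 1024, 1024), 1700, 2200))   # (0, 0, 1700, 2200)
print(scale_bbox((512, 256, 768, 512), 1700, 2200))  # (850, 550, 1275, 1100)
```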
DotsOCRPyTorchConfig¶
Bases: BaseModel
PyTorch/HuggingFace backend configuration for Dots OCR.
Dots OCR provides layout-aware text extraction with 11 predefined layout categories (Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, Title).
Example
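A minimal sketch: the `model` field matches the usage shown in the extractor example; other fields (device placement, dtype, etc.) are not documented here, so none are shown:

```python
from omnidocs.tasks.text_extraction.dotsocr import DotsOCRPyTorchConfig

# `model` is the HuggingFace repo id used throughout these docs.
config = DotsOCRPyTorchConfig(model="rednote-hilab/dots.ocr")
```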
DotsOCRVLLMConfig¶
Bases: BaseModel
VLLM backend configuration for Dots OCR.
VLLM provides high-throughput inference with optimizations like:

- PagedAttention for efficient KV cache management
- Continuous batching for higher throughput
- Optimized CUDA kernels
Example
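A minimal sketch mirroring the PyTorch example. Only the `model` field is attested in these docs; any VLLM tuning knobs (batch size, GPU memory fraction) would need to be checked against the actual schema:

```python
from omnidocs.tasks.text_extraction.dotsocr import DotsOCRVLLMConfig

# `model` mirrors the PyTorch config; tuning fields are omitted because
# their names are not documented here.
config = DotsOCRVLLMConfig(model="rednote-hilab/dots.ocr")
```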
api¶
API backend configuration for Dots OCR (VLLM online server).
extractor¶
Dots OCR text extractor with layout-aware extraction.
A Vision-Language Model optimized for document OCR with structured output containing layout information, bounding boxes, and multi-format text.
Supports PyTorch, VLLM, and API backends.
Example
```python
from omnidocs.tasks.text_extraction import DotsOCRTextExtractor
from omnidocs.tasks.text_extraction.dotsocr import DotsOCRPyTorchConfig

extractor = DotsOCRTextExtractor(
    backend=DotsOCRPyTorchConfig(model="rednote-hilab/dots.ocr")
)

result = extractor.extract(image, include_layout=True)
print(result.content)
for elem in result.layout:
    print(f"{elem.category}: {elem.bbox}")
```
pytorch¶
PyTorch backend configuration for Dots OCR.
vllm¶
VLLM backend configuration for Dots OCR.