Overview
Nanonets OCR2-3B backend configurations and extractor for text extraction.
Available backends
- NanonetsTextPyTorchConfig: PyTorch/HuggingFace backend
- NanonetsTextVLLMConfig: VLLM high-throughput backend
- NanonetsTextMLXConfig: MLX backend for Apple Silicon
Example
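A minimal sketch of selecting a backend, assuming default constructor arguments for each config and a placeholder image path; see NanonetsTextExtractor below for the full API:

```python
from omnidocs.tasks.text_extraction import NanonetsTextExtractor
from omnidocs.tasks.text_extraction.nanonets import (
    NanonetsTextMLXConfig,
    NanonetsTextPyTorchConfig,
    NanonetsTextVLLMConfig,
)

# Pick exactly one backend config (defaults assumed here).
extractor = NanonetsTextExtractor(backend=NanonetsTextPyTorchConfig())
# extractor = NanonetsTextExtractor(backend=NanonetsTextVLLMConfig())  # high-throughput / batch
# extractor = NanonetsTextExtractor(backend=NanonetsTextMLXConfig())   # Apple Silicon only

result = extractor.extract("document.png")  # placeholder path
print(result.content)
```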
NanonetsTextExtractor
Bases: BaseTextExtractor
Nanonets OCR2-3B Vision-Language Model text extractor.
Extracts text from document images with support for:
- Tables (output as HTML)
- Equations (output as LaTeX)
- Image captions (wrapped in tags)
- Watermarks (wrapped in tags)
Supports PyTorch, VLLM, and MLX backends.
Example
```python
from omnidocs.tasks.text_extraction import NanonetsTextExtractor
from omnidocs.tasks.text_extraction.nanonets import NanonetsTextPyTorchConfig

# Initialize with PyTorch backend
extractor = NanonetsTextExtractor(
    backend=NanonetsTextPyTorchConfig()
)

# Extract text
result = extractor.extract(image)
print(result.content)
```
Initialize the Nanonets text extractor.
| PARAMETER | DESCRIPTION |
|---|---|
| `backend` | Backend configuration. One of:<br>- `NanonetsTextPyTorchConfig`: PyTorch/HuggingFace backend<br>- `NanonetsTextVLLMConfig`: VLLM high-throughput backend<br>- `NanonetsTextMLXConfig`: MLX backend for Apple Silicon |
Source code in omnidocs/tasks/text_extraction/nanonets/extractor.py
extract
```python
extract(
    image: Union[Image, ndarray, str, Path],
    output_format: Literal["html", "markdown"] = "markdown",
) -> TextOutput
```
Extract text from an image.
Note: Nanonets OCR2 produces a unified output format that includes tables as HTML and equations as LaTeX inline. The output_format parameter is accepted for API compatibility but does not change the output structure.
| PARAMETER | DESCRIPTION |
|---|---|
| `image` | Input image as:<br>- `PIL.Image.Image`: PIL image object<br>- `np.ndarray`: NumPy array (HWC format, RGB)<br>- `str` or `Path`: path to an image file |
| `output_format` | Accepted for API compatibility (default: `"markdown"`) |
| RETURNS | DESCRIPTION |
|---|---|
| `TextOutput` | TextOutput containing extracted text content |
| RAISES | DESCRIPTION |
|---|---|
| `RuntimeError` | If model is not loaded |
| `ValueError` | If image format is not supported |
Source code in omnidocs/tasks/text_extraction/nanonets/extractor.py
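A usage sketch covering the three documented input types and the documented exceptions; file names are placeholders and only the behaviour stated above is assumed:

```python
from pathlib import Path

import numpy as np
from PIL import Image

from omnidocs.tasks.text_extraction import NanonetsTextExtractor
from omnidocs.tasks.text_extraction.nanonets import NanonetsTextPyTorchConfig

extractor = NanonetsTextExtractor(backend=NanonetsTextPyTorchConfig())

pil_image = Image.open("invoice.png")                # PIL.Image.Image
array_image = np.asarray(pil_image.convert("RGB"))   # np.ndarray, HWC, RGB
path_image = Path("invoice.png")                     # str or Path

for image in (pil_image, array_image, path_image):
    try:
        # output_format does not change the structure: tables stay HTML,
        # equations stay inline LaTeX.
        result = extractor.extract(image, output_format="markdown")
        print(result.content)
    except (RuntimeError, ValueError) as err:
        # RuntimeError: model not loaded; ValueError: unsupported image format.
        print(f"extraction failed: {err}")
```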
NanonetsTextMLXConfig
Bases: BaseModel
MLX backend configuration for Nanonets OCR2-3B text extraction.
This backend uses MLX for Apple Silicon native inference. Best for local development and testing on macOS M1/M2/M3/M4+. Requires: mlx, mlx-vlm
Note: This backend only works on Apple Silicon Macs. Do NOT use for Modal/cloud deployments.
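A sketch of guarding backend selection at runtime, since this config only works on Apple Silicon; the fallback to the PyTorch config is an illustration of one way to handle other hosts, not library behaviour:

```python
import platform

from omnidocs.tasks.text_extraction import NanonetsTextExtractor
from omnidocs.tasks.text_extraction.nanonets import (
    NanonetsTextMLXConfig,
    NanonetsTextPyTorchConfig,
)

# Use MLX only on Apple Silicon macOS; otherwise fall back to the PyTorch backend.
if platform.system() == "Darwin" and platform.machine() == "arm64":
    backend = NanonetsTextMLXConfig()
else:
    backend = NanonetsTextPyTorchConfig()

extractor = NanonetsTextExtractor(backend=backend)
```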
NanonetsTextPyTorchConfig
Bases: BaseModel
PyTorch/HuggingFace backend configuration for Nanonets OCR2-3B text extraction.
This backend uses the transformers library with PyTorch for local GPU inference. Requires: torch, transformers, accelerate
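A minimal sketch assuming the listed requirements (torch, transformers, accelerate) are installed and default config values are used; the CUDA check is only a local sanity check, not part of the config, and the image path is a placeholder:

```python
import torch

from omnidocs.tasks.text_extraction import NanonetsTextExtractor
from omnidocs.tasks.text_extraction.nanonets import NanonetsTextPyTorchConfig

# This backend targets local GPU inference, so confirm a CUDA device is visible.
print("CUDA available:", torch.cuda.is_available())

extractor = NanonetsTextExtractor(backend=NanonetsTextPyTorchConfig())
result = extractor.extract("page_01.png")
print(result.content)
```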
NanonetsTextVLLMConfig
Bases: BaseModel
VLLM backend configuration for Nanonets OCR2-3B text extraction.
This backend uses VLLM for high-throughput inference. Best for batch processing and production deployments. Requires: vllm, torch, transformers, qwen-vl-utils
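A sketch of the batch-processing use case this backend targets. No dedicated batch API is documented here, so the loop simply calls the documented `extract` method per page; the `pages/` directory and `.md` output files are placeholders, and default config values are assumed:

```python
from pathlib import Path

from omnidocs.tasks.text_extraction import NanonetsTextExtractor
from omnidocs.tasks.text_extraction.nanonets import NanonetsTextVLLMConfig

# High-throughput backend; assumes vllm, torch, transformers, qwen-vl-utils are installed.
extractor = NanonetsTextExtractor(backend=NanonetsTextVLLMConfig())

# Process a directory of page images one by one via the documented extract() API.
for page in sorted(Path("pages").glob("*.png")):
    result = extractor.extract(page)
    page.with_suffix(".md").write_text(result.content)
```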
extractor
Nanonets OCR2-3B text extractor.
A Vision-Language Model for extracting text from document images with support for tables (HTML), equations (LaTeX), and image captions.
Supports PyTorch and VLLM backends.
mlx
MLX backend configuration for Nanonets OCR2-3B text extraction.
pytorch
PyTorch/HuggingFace backend configuration for Nanonets OCR2-3B text extraction.
vllm
VLLM backend configuration for Nanonets OCR2-3B text extraction.