Overview¶
Qwen3-VL backend configurations and extractor for text extraction.
Available backends
- QwenTextPyTorchConfig: PyTorch/HuggingFace backend
- QwenTextVLLMConfig: VLLM high-throughput backend
- QwenTextMLXConfig: MLX backend for Apple Silicon
- QwenTextAPIConfig: API backend (OpenRouter, etc.)
Example
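A minimal end-to-end sketch using the PyTorch backend, following the extractor documentation below (the image path is illustrative):

```python
from omnidocs.tasks.text_extraction import QwenTextExtractor
from omnidocs.tasks.text_extraction.qwen import QwenTextPyTorchConfig

# Build an extractor with the PyTorch/HuggingFace backend
extractor = QwenTextExtractor(
    backend=QwenTextPyTorchConfig(model="Qwen/Qwen3-VL-8B-Instruct")
)

result = extractor.extract("invoice.png", output_format="markdown")  # illustrative path
print(result.content)
```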
QwenTextAPIConfig
¶
Bases: BaseModel
API backend configuration for Qwen text extraction.
Uses litellm for provider-agnostic API access. Supports OpenRouter, Gemini, Azure, OpenAI, and any other litellm-compatible provider.
API keys can be passed directly or read from environment variables.
Example
```python
import os

# OpenRouter (reads OPENROUTER_API_KEY from env)
config = QwenTextAPIConfig(
    model="openrouter/qwen/qwen3-vl-8b-instruct",
)

# With explicit key
config = QwenTextAPIConfig(
    model="openrouter/qwen/qwen3-vl-8b-instruct",
    api_key=os.environ["OPENROUTER_API_KEY"],
    api_base="https://openrouter.ai/api/v1",
)
```
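The resulting config is passed to the extractor as its backend; a minimal sketch reusing the `config` built above (the image path is illustrative):

```python
from omnidocs.tasks.text_extraction import QwenTextExtractor

# Any backend config, including QwenTextAPIConfig, is accepted as `backend`
extractor = QwenTextExtractor(backend=config)
result = extractor.extract("scan.png", output_format="markdown")  # illustrative path
print(result.content)
```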
QwenTextExtractor
¶
Bases: BaseTextExtractor
Qwen3-VL Vision-Language Model text extractor.
Extracts text from document images and returns it as structured HTML or Markdown, using Qwen3-VL's built-in document parsing prompts.
Supports PyTorch, VLLM, MLX, and API backends.
Example
```python
from omnidocs.tasks.text_extraction import QwenTextExtractor
from omnidocs.tasks.text_extraction.qwen import QwenTextPyTorchConfig

# Initialize with PyTorch backend
extractor = QwenTextExtractor(
    backend=QwenTextPyTorchConfig(model="Qwen/Qwen3-VL-8B-Instruct")
)

image = "page.png"  # illustrative path; a PIL.Image or numpy array also works

# Extract as Markdown
result = extractor.extract(image, output_format="markdown")
print(result.content)

# Extract as HTML
result = extractor.extract(image, output_format="html")
print(result.content)
```
Initialize Qwen text extractor.
| PARAMETER | DESCRIPTION |
|---|---|
| `backend` | Backend configuration. One of: `QwenTextPyTorchConfig` (PyTorch/HuggingFace backend), `QwenTextVLLMConfig` (VLLM high-throughput backend), `QwenTextMLXConfig` (MLX backend for Apple Silicon), or `QwenTextAPIConfig` (API backend, e.g. OpenRouter). |
Source code in omnidocs/tasks/text_extraction/qwen/extractor.py
extract
¶
```python
extract(
    image: Union[Image, ndarray, str, Path],
    output_format: Literal["html", "markdown"] = "markdown",
) -> TextOutput
```
Extract text from an image.
| PARAMETER | DESCRIPTION |
|---|---|
| `image` | Input image as a `PIL.Image.Image` (PIL image object), an `np.ndarray` (numpy array, HWC format, RGB), or a `str`/`Path` (path to an image file). TYPE: `Union[Image, ndarray, str, Path]` |
| `output_format` | Desired output format: `"html"` (structured HTML with div elements) or `"markdown"` (Markdown format). TYPE: `Literal["html", "markdown"]` DEFAULT: `"markdown"` |

| RETURNS | DESCRIPTION |
|---|---|
| `TextOutput` | `TextOutput` containing the extracted text content. |

| RAISES | DESCRIPTION |
|---|---|
| `RuntimeError` | If the model is not loaded. |
| `ValueError` | If the image format or `output_format` is not supported. |
Source code in omnidocs/tasks/text_extraction/qwen/extractor.py
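A short sketch of the accepted input types and the documented error cases, reusing the `extractor` from the example above (file names are illustrative):

```python
import numpy as np
from PIL import Image

# A file path (str or Path) works directly
result = extractor.extract("page_001.png", output_format="markdown")
print(result.content)

# A PIL image or an HWC RGB numpy array is also accepted
pil_page = Image.open("page_002.png").convert("RGB")
array_page = np.array(pil_page)

try:
    result = extractor.extract(array_page, output_format="html")
except RuntimeError as err:
    print(f"model not loaded: {err}")
except ValueError as err:
    print(f"unsupported image or output_format: {err}")
else:
    print(result.content)
```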
QwenTextMLXConfig
¶
Bases: BaseModel
MLX backend configuration for Qwen text extraction.
This backend uses MLX for Apple Silicon native inference. Best for local development and testing on macOS M1/M2/M3+. Requires: mlx, mlx-vlm
Note: This backend only works on Apple Silicon Macs. Do NOT use for Modal/cloud deployments.
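Example
A minimal sketch, assuming `QwenTextMLXConfig` is importable from the same qwen package as the other configs and exposes a `model` field like them (other fields and their defaults are not documented here; the image path is illustrative):

```python
from omnidocs.tasks.text_extraction import QwenTextExtractor
from omnidocs.tasks.text_extraction.qwen import QwenTextMLXConfig  # assumed import path

# Assumption: `model` is the field name, as with the other backend configs
extractor = QwenTextExtractor(
    backend=QwenTextMLXConfig(model="Qwen/Qwen3-VL-8B-Instruct")
)
result = extractor.extract("receipt.png", output_format="markdown")  # illustrative path
print(result.content)
```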
QwenTextPyTorchConfig
¶
Bases: BaseModel
PyTorch/HuggingFace backend configuration for Qwen text extraction.
This backend uses the transformers library with PyTorch for local GPU inference. Requires: torch, transformers, accelerate, qwen-vl-utils
Example
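A minimal sketch, mirroring the extractor example above; fields beyond `model` (such as device or precision options) are not documented here:

```python
from omnidocs.tasks.text_extraction.qwen import QwenTextPyTorchConfig

# `model` takes a HuggingFace model id, as in the extractor example
config = QwenTextPyTorchConfig(model="Qwen/Qwen3-VL-8B-Instruct")
```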
QwenTextVLLMConfig
¶
Bases: BaseModel
VLLM backend configuration for Qwen text extraction.
This backend uses VLLM for high-throughput inference. Best for batch processing and production deployments. Requires: vllm, torch, transformers, qwen-vl-utils
Example
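A minimal sketch, assuming the config exposes a `model` field like the other backends (other fields, such as vLLM engine options, are not documented here; file names are illustrative):

```python
from omnidocs.tasks.text_extraction import QwenTextExtractor
from omnidocs.tasks.text_extraction.qwen import QwenTextVLLMConfig

# Assumption: `model` is the field name, as with the other backend configs
extractor = QwenTextExtractor(
    backend=QwenTextVLLMConfig(model="Qwen/Qwen3-VL-8B-Instruct")
)

# Batch-style usage: call extract() per page (a dedicated batch API is not documented here)
for page in ["page_001.png", "page_002.png"]:
    print(extractor.extract(page, output_format="markdown").content)
```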
api
¶
API backend configuration for Qwen3-VL text extraction.
Uses litellm for provider-agnostic inference (OpenRouter, Gemini, Azure, etc.).
extractor
¶
Qwen3-VL text extractor.
A Vision-Language Model for extracting text from document images as structured HTML or Markdown.
Supports PyTorch, VLLM, MLX, and API backends.
Example
```python
from omnidocs.tasks.text_extraction import QwenTextExtractor
from omnidocs.tasks.text_extraction.qwen import QwenTextPyTorchConfig

extractor = QwenTextExtractor(
    backend=QwenTextPyTorchConfig(model="Qwen/Qwen3-VL-8B-Instruct")
)
result = extractor.extract(image, output_format="markdown")
print(result.content)
```
mlx
¶
MLX backend configuration for Qwen3-VL text extraction.
pytorch
¶
PyTorch/HuggingFace backend configuration for Qwen3-VL text extraction.
vllm
¶
VLLM backend configuration for Qwen3-VL text extraction.