Overview
Qwen3-VL backend configurations and detector for layout detection.
Available backends
- QwenLayoutPyTorchConfig: PyTorch/HuggingFace backend
- QwenLayoutVLLMConfig: VLLM high-throughput backend
- QwenLayoutMLXConfig: MLX backend for Apple Silicon
- QwenLayoutAPIConfig: API backend (OpenRouter, etc.)
Example
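A minimal end-to-end sketch (the PyTorch backend is shown; `page.png` is a placeholder path):

```python
from PIL import Image

from omnidocs.tasks.layout_extraction import QwenLayoutDetector
from omnidocs.tasks.layout_extraction.qwen import QwenLayoutPyTorchConfig

# Any of the backend configs listed above can be passed as `backend`.
detector = QwenLayoutDetector(
    backend=QwenLayoutPyTorchConfig(model="Qwen/Qwen3-VL-8B-Instruct")
)

# extract() also accepts NumPy arrays and file paths; "page.png" is a placeholder.
result = detector.extract(Image.open("page.png"))
```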
QwenLayoutAPIConfig
Bases: BaseModel
API backend configuration for Qwen layout detection.
Uses litellm for provider-agnostic API access. Supports OpenRouter, Gemini, Azure, OpenAI, and any other litellm-compatible provider.
API keys can be passed directly or read from environment variables.
Example
```python
import os

from omnidocs.tasks.layout_extraction.qwen import QwenLayoutAPIConfig

# OpenRouter (reads OPENROUTER_API_KEY from env)
config = QwenLayoutAPIConfig(
    model="openrouter/qwen/qwen3-vl-8b-instruct",
)

# With explicit key
config = QwenLayoutAPIConfig(
    model="openrouter/qwen/qwen3-vl-8b-instruct",
    api_key=os.environ["OPENROUTER_API_KEY"],
    api_base="https://openrouter.ai/api/v1",
)
```
QwenLayoutDetector
Bases: BaseLayoutExtractor
Qwen3-VL Vision-Language Model layout detector.
A flexible VLM-based layout detector that supports custom labels. Unlike fixed-label models (DocLayoutYOLO, RT-DETR), Qwen can detect any document elements specified at runtime.
Supports PyTorch, VLLM, MLX, and API backends.
Example
```python
from omnidocs.tasks.layout_extraction import QwenLayoutDetector, CustomLabel
from omnidocs.tasks.layout_extraction.qwen import QwenLayoutPyTorchConfig

# Initialize with PyTorch backend
detector = QwenLayoutDetector(
    backend=QwenLayoutPyTorchConfig(model="Qwen/Qwen3-VL-8B-Instruct")
)

# Basic extraction with default labels
result = detector.extract(image)

# With custom labels (strings)
result = detector.extract(image, custom_labels=["code_block", "sidebar"])

# With typed custom labels
labels = [
    CustomLabel(name="code_block", color="#E74C3C"),
    CustomLabel(name="sidebar", description="Side panel content"),
]
result = detector.extract(image, custom_labels=labels)
```
Initialize Qwen layout detector.
| PARAMETER | DESCRIPTION |
|---|---|
| `backend` | Backend configuration. One of:<br>- `QwenLayoutPyTorchConfig`: PyTorch/HuggingFace backend<br>- `QwenLayoutVLLMConfig`: VLLM high-throughput backend<br>- `QwenLayoutMLXConfig`: MLX backend for Apple Silicon<br>- `QwenLayoutAPIConfig`: API backend (OpenRouter, etc.) |
Source code in omnidocs/tasks/layout_extraction/qwen/detector.py
extract

```python
extract(
    image: Union[Image, ndarray, str, Path],
    custom_labels: Optional[List[Union[str, CustomLabel]]] = None,
) -> LayoutOutput
```
Run layout detection on an image.
| PARAMETER | DESCRIPTION |
|---|---|
| `image` | Input image as:<br>- `PIL.Image.Image`: PIL image object<br>- `np.ndarray`: NumPy array (HWC format, RGB)<br>- `str` or `Path`: path to image file |
| `custom_labels` | Optional custom labels to detect. Can be:<br>- `None`: use default labels (title, text, table, figure, etc.)<br>- `List[str]`: simple label names, e.g. `["code_block", "sidebar"]`<br>- `List[CustomLabel]`: typed labels with metadata |
| RETURNS | DESCRIPTION |
|---|---|
| `LayoutOutput` | `LayoutOutput` with detected layout boxes |
| RAISES | DESCRIPTION |
|---|---|
| `RuntimeError` | If model is not loaded |
| `ValueError` | If image format is not supported |
Source code in omnidocs/tasks/layout_extraction/qwen/detector.py
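The three accepted image forms are interchangeable. A minimal sketch, reusing the `detector` from the example above (`page.png` is a placeholder path):

```python
from pathlib import Path

import numpy as np
from PIL import Image

# Each call is equivalent; "page.png" is a placeholder.
result = detector.extract(Image.open("page.png"))                            # PIL.Image.Image
result = detector.extract(np.array(Image.open("page.png").convert("RGB")))   # np.ndarray (HWC, RGB)
result = detector.extract(Path("page.png"))                                  # str or Path
```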
QwenLayoutMLXConfig
Bases: BaseModel
MLX backend configuration for Qwen layout detection.
This backend uses MLX for Apple Silicon native inference. Best for local development and testing on macOS M1/M2/M3+. Requires: mlx, mlx-vlm
Note: This backend only works on Apple Silicon Macs. Do NOT use for Modal/cloud deployments.
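A minimal local-development sketch, assuming `QwenLayoutMLXConfig` is importable from the same `qwen` subpackage and takes a `model` identifier like the other backends (the checkpoint name below is a placeholder; use an MLX-compatible Qwen3-VL checkpoint):

```python
# Apple Silicon only; not for Modal/cloud deployments.
from omnidocs.tasks.layout_extraction import QwenLayoutDetector
from omnidocs.tasks.layout_extraction.qwen import QwenLayoutMLXConfig

# Assumption: a `model` field as in the other backend configs;
# "Qwen/Qwen3-VL-8B-Instruct" is a placeholder checkpoint name.
detector = QwenLayoutDetector(
    backend=QwenLayoutMLXConfig(model="Qwen/Qwen3-VL-8B-Instruct")
)
```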
QwenLayoutPyTorchConfig
Bases: BaseModel
PyTorch/HuggingFace backend configuration for Qwen layout detection.
This backend uses the transformers library with PyTorch for local GPU inference. Requires: torch, transformers, accelerate, qwen-vl-utils
Example
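A minimal sketch mirroring the detector example on this page (only the `model` field is shown, since no other fields are documented here):

```python
from omnidocs.tasks.layout_extraction.qwen import QwenLayoutPyTorchConfig

# Local GPU inference via transformers/PyTorch.
config = QwenLayoutPyTorchConfig(model="Qwen/Qwen3-VL-8B-Instruct")
```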
QwenLayoutVLLMConfig
Bases: BaseModel
VLLM backend configuration for Qwen layout detection.
This backend uses VLLM for high-throughput inference. Best for batch processing and production deployments. Requires: vllm, torch, transformers, qwen-vl-utils
Example
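A minimal sketch, assuming `QwenLayoutVLLMConfig` accepts a `model` identifier like the PyTorch backend (no other fields are documented on this page):

```python
from omnidocs.tasks.layout_extraction.qwen import QwenLayoutVLLMConfig

# High-throughput batch inference; assumes a `model` field like the PyTorch config.
config = QwenLayoutVLLMConfig(model="Qwen/Qwen3-VL-8B-Instruct")
```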
api
API backend configuration for Qwen3-VL layout detection.
Uses litellm for provider-agnostic inference (OpenRouter, Gemini, Azure, etc.).
QwenLayoutAPIConfig
Bases: BaseModel
API backend configuration for Qwen layout detection.
Uses litellm for provider-agnostic API access. Supports OpenRouter, Gemini, Azure, OpenAI, and any other litellm-compatible provider.
API keys can be passed directly or read from environment variables.
Example
```python
import os

from omnidocs.tasks.layout_extraction.qwen import QwenLayoutAPIConfig

# OpenRouter (reads OPENROUTER_API_KEY from env)
config = QwenLayoutAPIConfig(
    model="openrouter/qwen/qwen3-vl-8b-instruct",
)

# With explicit key
config = QwenLayoutAPIConfig(
    model="openrouter/qwen/qwen3-vl-8b-instruct",
    api_key=os.environ["OPENROUTER_API_KEY"],
    api_base="https://openrouter.ai/api/v1",
)
```
detector
Qwen3-VL layout detector.
A Vision-Language Model for flexible layout detection with custom label support. Supports PyTorch, VLLM, MLX, and API backends.
Example
```python
from omnidocs.tasks.layout_extraction import QwenLayoutDetector
from omnidocs.tasks.layout_extraction.qwen import QwenLayoutPyTorchConfig

detector = QwenLayoutDetector(
    backend=QwenLayoutPyTorchConfig(model="Qwen/Qwen3-VL-8B-Instruct")
)
result = detector.extract(image)

# With custom labels
result = detector.extract(image, custom_labels=["code_block", "sidebar"])
```
QwenLayoutDetector
Bases: BaseLayoutExtractor
Qwen3-VL Vision-Language Model layout detector.
A flexible VLM-based layout detector that supports custom labels. Unlike fixed-label models (DocLayoutYOLO, RT-DETR), Qwen can detect any document elements specified at runtime.
Supports PyTorch, VLLM, MLX, and API backends.
Example
```python
from omnidocs.tasks.layout_extraction import QwenLayoutDetector, CustomLabel
from omnidocs.tasks.layout_extraction.qwen import QwenLayoutPyTorchConfig

# Initialize with PyTorch backend
detector = QwenLayoutDetector(
    backend=QwenLayoutPyTorchConfig(model="Qwen/Qwen3-VL-8B-Instruct")
)

# Basic extraction with default labels
result = detector.extract(image)

# With custom labels (strings)
result = detector.extract(image, custom_labels=["code_block", "sidebar"])

# With typed custom labels
labels = [
    CustomLabel(name="code_block", color="#E74C3C"),
    CustomLabel(name="sidebar", description="Side panel content"),
]
result = detector.extract(image, custom_labels=labels)
```
Initialize Qwen layout detector.
| PARAMETER | DESCRIPTION |
|---|---|
| `backend` | Backend configuration. One of:<br>- `QwenLayoutPyTorchConfig`: PyTorch/HuggingFace backend<br>- `QwenLayoutVLLMConfig`: VLLM high-throughput backend<br>- `QwenLayoutMLXConfig`: MLX backend for Apple Silicon<br>- `QwenLayoutAPIConfig`: API backend (OpenRouter, etc.) |
extract

```python
extract(
    image: Union[Image, ndarray, str, Path],
    custom_labels: Optional[List[Union[str, CustomLabel]]] = None,
) -> LayoutOutput
```
Run layout detection on an image.
| PARAMETER | DESCRIPTION |
|---|---|
| `image` | Input image as:<br>- `PIL.Image.Image`: PIL image object<br>- `np.ndarray`: NumPy array (HWC format, RGB)<br>- `str` or `Path`: path to image file |
| `custom_labels` | Optional custom labels to detect. Can be:<br>- `None`: use default labels (title, text, table, figure, etc.)<br>- `List[str]`: simple label names, e.g. `["code_block", "sidebar"]`<br>- `List[CustomLabel]`: typed labels with metadata |
| RETURNS | DESCRIPTION |
|---|---|
| `LayoutOutput` | `LayoutOutput` with detected layout boxes |
| RAISES | DESCRIPTION |
|---|---|
| `RuntimeError` | If model is not loaded |
| `ValueError` | If image format is not supported |
Source code in omnidocs/tasks/layout_extraction/qwen/detector.py
mlx
MLX backend configuration for Qwen3-VL layout detection.
QwenLayoutMLXConfig
Bases: BaseModel
MLX backend configuration for Qwen layout detection.
This backend uses MLX for Apple Silicon native inference. Best for local development and testing on macOS M1/M2/M3+. Requires: mlx, mlx-vlm
Note: This backend only works on Apple Silicon Macs. Do NOT use for Modal/cloud deployments.
pytorch
PyTorch/HuggingFace backend configuration for Qwen3-VL layout detection.
QwenLayoutPyTorchConfig
Bases: BaseModel
PyTorch/HuggingFace backend configuration for Qwen layout detection.
This backend uses the transformers library with PyTorch for local GPU inference. Requires: torch, transformers, accelerate, qwen-vl-utils
Example
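A minimal sketch showing the config wired into the detector (mirrors the detector example above; only the documented `model` field is used):

```python
from omnidocs.tasks.layout_extraction import QwenLayoutDetector
from omnidocs.tasks.layout_extraction.qwen import QwenLayoutPyTorchConfig

config = QwenLayoutPyTorchConfig(model="Qwen/Qwen3-VL-8B-Instruct")
detector = QwenLayoutDetector(backend=config)
```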
vllm
VLLM backend configuration for Qwen3-VL layout detection.
QwenLayoutVLLMConfig
Bases: BaseModel
VLLM backend configuration for Qwen layout detection.
This backend uses VLLM for high-throughput inference. Best for batch processing and production deployments. Requires: vllm, torch, transformers, qwen-vl-utils