Qwen

Vision-language model for text extraction and layout analysis.


Overview

Tasks     Text Extraction, Layout Analysis
Backends  PyTorch, vLLM, MLX, API
Speed     2-3 s/page
Quality   Excellent
VRAM      8-16 GB (8B model)

Text Extraction

from omnidocs import Document
from omnidocs.tasks.text_extraction import QwenTextExtractor
from omnidocs.tasks.text_extraction.qwen import QwenPyTorchConfig

doc = Document.from_pdf("document.pdf")

extractor = QwenTextExtractor(
    backend=QwenPyTorchConfig(device="cuda")
)

result = extractor.extract(doc.get_page(0), output_format="markdown")
print(result.content)
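
To extract a whole document rather than a single page, loop over pages and join the results. A minimal sketch; the page_count attribute is a hypothetical stand-in for however your Document exposes its page total:

pages = []
for i in range(doc.page_count):  # page_count is hypothetical; use your Document's page-count API
    page_result = extractor.extract(doc.get_page(i), output_format="markdown")
    pages.append(page_result.content)

with open("document.md", "w") as f:
    f.write("\n\n".join(pages))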

Layout Analysis

from omnidocs import Document
from omnidocs.tasks.layout_analysis import QwenLayoutDetector
from omnidocs.tasks.layout_analysis.qwen import QwenLayoutPyTorchConfig

doc = Document.from_pdf("document.pdf")
image = doc.get_page(0)

detector = QwenLayoutDetector(
    backend=QwenLayoutPyTorchConfig(device="cuda")
)

result = detector.extract(image)
for elem in result.elements:
    print(f"{elem.label}: {elem.bbox}")

Custom Labels

from omnidocs.tasks.layout_analysis import CustomLabel

custom_labels = [
    CustomLabel(name="code_block", description="Code snippets"),
    CustomLabel(name="sidebar", description="Sidebar content"),
]

result = detector.extract(image, custom_labels=custom_labels)
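
Detections can then be filtered by the label names you registered, assuming elem.label echoes the CustomLabel name:

code_blocks = [e for e in result.elements if e.label == "code_block"]
print(f"Found {len(code_blocks)} code blocks")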

Backend Configs

PyTorch (Local GPU)

from omnidocs.tasks.text_extraction.qwen import QwenPyTorchConfig

config = QwenPyTorchConfig(
    model="Qwen/Qwen3-VL-8B-Instruct",
    device="cuda",              # "cuda", "cpu", "mps"
    torch_dtype="bfloat16",
)
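
Rather than hardcoding device="cuda", you can pick the best available device at runtime. A sketch using standard PyTorch checks:

import torch

if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

config = QwenPyTorchConfig(
    model="Qwen/Qwen3-VL-8B-Instruct",
    device=device,
    torch_dtype="bfloat16" if device != "cpu" else "float32",  # bf16 is typically slow on CPU
)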

vLLM (High Throughput)

from omnidocs.tasks.text_extraction.qwen import QwenVLLMConfig

config = QwenVLLMConfig(
    model="Qwen/Qwen3-VL-8B-Instruct",
    tensor_parallel_size=1,     # number of GPUs to shard the model across
    gpu_memory_utilization=0.9,
)
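
On multi-GPU machines, raising tensor_parallel_size shards the model across cards. A sketch for two GPUs; note that vLLM requires the value to divide the model's attention head count:

config = QwenVLLMConfig(
    model="Qwen/Qwen3-VL-8B-Instruct",
    tensor_parallel_size=2,      # shard across 2 GPUs
    gpu_memory_utilization=0.85, # leave headroom for other processes
)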

MLX (Apple Silicon)

from omnidocs.tasks.text_extraction.qwen import QwenMLXConfig

config = QwenMLXConfig(
    model="Qwen/Qwen3-VL-2B-Instruct",
    quantization="4bit",
)
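
The configs under Backend Configs appear interchangeable, so an MLX config should plug into the same extractor interface as the PyTorch one above. A usage sketch:

from omnidocs import Document
from omnidocs.tasks.text_extraction import QwenTextExtractor

doc = Document.from_pdf("document.pdf")
extractor = QwenTextExtractor(backend=config)  # config from the MLX snippet above
result = extractor.extract(doc.get_page(0), output_format="markdown")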

API (Cloud)

from omnidocs.tasks.text_extraction.qwen import QwenAPIConfig

config = QwenAPIConfig(
    model="qwen3-vl-8b",
    api_key="YOUR_API_KEY",
    base_url="https://api.provider.com/v1",
)
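
Avoid hardcoding credentials; reading the key from the environment is safer. QWEN_API_KEY is just an example variable name:

import os

config = QwenAPIConfig(
    model="qwen3-vl-8b",
    api_key=os.environ["QWEN_API_KEY"],  # example variable name; pick your own
    base_url="https://api.provider.com/v1",
)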

Model Variants

Model                        Parameters  VRAM   Quality      Speed
Qwen/Qwen3-VL-2B-Instruct    2B          4 GB   Good         Fast
Qwen/Qwen3-VL-8B-Instruct    8B          16 GB  Excellent    Medium
Qwen/Qwen3-VL-32B-Instruct   32B         64 GB  Outstanding  Slow

Recommendation: start with the 8B model; it offers the best quality/speed balance.
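
If you want to pick a variant programmatically, a rough heuristic is to key off available VRAM. A sketch assuming a single CUDA GPU, with thresholds taken from the table above:

import torch

vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
if vram_gb >= 64:
    model = "Qwen/Qwen3-VL-32B-Instruct"
elif vram_gb >= 16:
    model = "Qwen/Qwen3-VL-8B-Instruct"
else:
    model = "Qwen/Qwen3-VL-2B-Instruct"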


Troubleshooting

CUDA out of memory

# Use a smaller model (the 2B variant needs ~4 GB)
config = QwenPyTorchConfig(model="Qwen/Qwen3-VL-2B-Instruct")

Slow inference

# Use the vLLM backend
config = QwenVLLMConfig(tensor_parallel_size=1)

No GPU

# Use the API backend (no local GPU required)
config = QwenAPIConfig(api_key="...", base_url="...")