VLLM

vLLM backend configuration for Qwen3-VL text extraction.

QwenTextVLLMConfig

Bases: BaseModel

vLLM backend configuration for Qwen text extraction.

This backend uses vLLM for high-throughput inference and is best suited for batch processing and production deployments. Requires: `vllm`, `torch`, `transformers`, `qwen-vl-utils`.
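As a Pydantic model, the configuration can be sketched roughly as follows. The field names come from the example below; the types and defaults shown here are assumptions, not the library's actual definition:

```python
from pydantic import BaseModel

class QwenTextVLLMConfig(BaseModel):
    # Field names follow the example below; defaults are assumed.
    model: str = "Qwen/Qwen3-VL-8B-Instruct"  # Hugging Face model ID (assumed default)
    tensor_parallel_size: int = 1             # number of GPUs to shard across (assumed default)
    gpu_memory_utilization: float = 0.9       # fraction of GPU memory vLLM may use (assumed default)
```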

Example

```python
config = QwenTextVLLMConfig(
    model="Qwen/Qwen3-VL-8B-Instruct",
    tensor_parallel_size=2,
    gpu_memory_utilization=0.9,
)
```
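The field names mirror vLLM engine arguments, so a config like this would typically be forwarded to vLLM's `LLM` constructor. A minimal sketch, assuming the fields map one-to-one (the forwarding is internal to this library and not shown in the source):

```python
from vllm import LLM

# Hypothetical wiring: forward the config fields to vLLM's engine.
# `config` is the QwenTextVLLMConfig instance from the example above.
llm = LLM(
    model=config.model,
    tensor_parallel_size=config.tensor_parallel_size,      # shard the model across this many GPUs
    gpu_memory_utilization=config.gpu_memory_utilization,  # fraction of each GPU's memory to reserve
)
```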