
vLLM

vLLM backend configuration for DeepSeek-OCR text extraction.

DeepSeek-OCR has official upstream vLLM support (announced Oct 23, 2025). It achieves ~2500 tokens/s on an A100-40G, making vLLM the recommended backend for production.

For DeepSeek-OCR-2 vLLM support, refer to https://github.com/deepseek-ai/DeepSeek-OCR-2 for the latest setup instructions (a nightly vLLM build may be required).

DeepSeekOCRTextVLLMConfig

Bases: BaseModel

vLLM backend configuration for DeepSeek-OCR / DeepSeek-OCR-2 text extraction.

DeepSeek-OCR has official upstream vLLM support (~2500 tokens/s on an A100). Recommended for high-throughput batch document processing in production. Requires: vllm>=0.11.1 (or a nightly build for OCR-2), torch, transformers==4.46.3
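Assuming a standard pip environment, the requirements above can be installed along these lines (the pins are taken directly from this section; verify exact versions against the upstream repo):

```shell
# Version pins as listed in the requirements above.
pip install "vllm>=0.11.1" "transformers==4.46.3" torch
```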

Note: The default model is DeepSeek-OCR v1 (not v2), because DeepSeek-OCR-2 vLLM support requires a nightly vLLM build. Use the PyTorch backend for DeepSeek-OCR-2 until official vLLM support is released.

Example
```python
config = DeepSeekOCRTextVLLMConfig(
    model="deepseek-ai/DeepSeek-OCR-2",
    tensor_parallel_size=1,
    gpu_memory_utilization=0.9,
)
```
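To make the shape of the config concrete, here is a self-contained sketch using a stdlib dataclass in place of pydantic's BaseModel (which the real class subclasses). Field names come from the example above; the defaults and validation bounds are illustrative assumptions, not the library's actual definitions:

```python
from dataclasses import dataclass


@dataclass
class DeepSeekOCRTextVLLMConfigSketch:
    """Hypothetical stand-in for DeepSeekOCRTextVLLMConfig (illustration only)."""

    # v1 is the documented default (vLLM nightly is needed for OCR-2).
    model: str = "deepseek-ai/DeepSeek-OCR"
    # Number of GPUs to shard the model across.
    tensor_parallel_size: int = 1
    # Fraction of GPU memory vLLM may reserve for weights + KV cache.
    gpu_memory_utilization: float = 0.9

    def __post_init__(self) -> None:
        # Minimal checks mirroring constraints a pydantic model might enforce.
        if self.tensor_parallel_size < 1:
            raise ValueError("tensor_parallel_size must be >= 1")
        if not 0.0 < self.gpu_memory_utilization <= 1.0:
            raise ValueError("gpu_memory_utilization must be in (0, 1]")


cfg = DeepSeekOCRTextVLLMConfigSketch(
    model="deepseek-ai/DeepSeek-OCR-2",
    tensor_parallel_size=1,
    gpu_memory_utilization=0.9,
)
```

The `__post_init__` validation stands in for the declarative field constraints pydantic would apply at construction time.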