VLLM

vLLM backend configuration for GLM-OCR text extraction.

GLMOCRVLLMConfig

Bases: BaseModel

vLLM backend configuration for GLM-OCR.

GLM-OCR supports vLLM with MTP (Multi-Token Prediction) speculative decoding
for significantly higher throughput. Requires vllm>=0.17.0 and transformers>=5.3.0.

Example:

```python
config = GLMOCRVLLMConfig(gpu_memory_utilization=0.85)
```