VLLM
vLLM backend configuration for LightOn text extraction.
LightOnTextVLLMConfig
Bases: BaseModel
vLLM backend config for LightOn text extraction.
Uses vLLM for high-throughput GPU inference with:

- PagedAttention for efficient KV-cache management
- Continuous batching of incoming requests
- Optimized CUDA kernels
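As an illustration, a backend config of this shape might look like the sketch below. The field names (`model_name`, `gpu_memory_utilization`, `max_num_seqs`) are assumptions mirroring common vLLM engine arguments, not the documented attributes of `LightOnTextVLLMConfig`; the real class derives from Pydantic's `BaseModel`, while this sketch uses a stdlib dataclass with a manual `validate()` to stay dependency-free.

```python
from dataclasses import dataclass, asdict


@dataclass
class VLLMBackendConfigSketch:
    """Hypothetical stand-in for a vLLM backend config.

    Field names mirror common vLLM engine arguments; the actual
    LightOnTextVLLMConfig fields may differ.
    """

    model_name: str = "my-model"          # placeholder model identifier
    gpu_memory_utilization: float = 0.9   # fraction of GPU memory reserved for the KV cache
    max_num_seqs: int = 256               # continuous-batching concurrency limit

    def validate(self) -> None:
        # Minimal checks in place of Pydantic's automatic validation.
        if not 0.0 < self.gpu_memory_utilization <= 1.0:
            raise ValueError("gpu_memory_utilization must be in (0, 1]")
        if self.max_num_seqs < 1:
            raise ValueError("max_num_seqs must be positive")


cfg = VLLMBackendConfigSketch()
cfg.validate()
print(asdict(cfg))
```

A higher `gpu_memory_utilization` leaves more room for the PagedAttention KV cache (more concurrent sequences), at the cost of headroom for activations; `max_num_seqs` caps how many requests the continuous batcher interleaves at once.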