PyTorch¶
PyTorch/HuggingFace backend configuration for DeepSeek-OCR text extraction.
Both DeepSeek-OCR and DeepSeek-OCR-2 use:
- AutoModel (not AutoModelForCausalLM)
- AutoTokenizer (not AutoProcessor)
- model.infer(tokenizer, prompt=..., image_file=...) for inference
Requirements (from the official README):
- python==3.12.9, CUDA==11.8
- torch==2.6.0
- transformers==4.46.3, tokenizers==0.20.3
- einops, addict, easydict
- flash-attn==2.7.3 (optional, install with --no-build-isolation)
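Assuming a pip-based setup, the version pins above can be captured in a requirements file; flash-attn is left out here because it must be installed separately with `--no-build-isolation` and a CUDA toolchain:

```
torch==2.6.0
transformers==4.46.3
tokenizers==0.20.3
einops
addict
easydict
```

Then, optionally: `pip install flash-attn==2.7.3 --no-build-isolation`.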
DeepSeekOCRTextPyTorchConfig¶
Bases: BaseModel
PyTorch/HuggingFace backend configuration for DeepSeek-OCR / DeepSeek-OCR-2.
Uses AutoModel + AutoTokenizer. Inference runs through model.infer(), which handles image tiling and multi-page PDF stitching internally.
Models:
- deepseek-ai/DeepSeek-OCR-2 (default, latest: Jan 2026, Apache 2.0)
- deepseek-ai/DeepSeek-OCR (v1: Oct 2024, MIT)
GPU requirements: NVIDIA L4 or A100 (≥16 GB VRAM recommended).
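The load-and-infer flow described above can be sketched as follows. The model ids and the AutoModel/AutoTokenizer pairing come from this page; the exact prompt strings and the full infer() keyword set are assumptions based on the official README examples, so check them against your installed model revision:

```python
MODEL_ID = "deepseek-ai/DeepSeek-OCR-2"  # default; v1 is "deepseek-ai/DeepSeek-OCR"


def build_prompt(to_markdown: bool = True) -> str:
    """Build the OCR prompt; the <image> placeholder is consumed by infer().

    Prompt wording follows the README examples (assumption).
    """
    if to_markdown:
        return "<image>\nConvert the document to markdown."
    return "<image>\nFree OCR."


def run_ocr(image_file: str) -> str:
    """Run DeepSeek-OCR on a single image or PDF file (requires a CUDA GPU)."""
    # Deferred import so build_prompt() stays usable without GPU dependencies.
    from transformers import AutoModel, AutoTokenizer

    # Note: AutoModel + AutoTokenizer, not AutoModelForCausalLM / AutoProcessor.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = model.eval().cuda()  # L4 / A100, >=16 GB VRAM recommended

    # infer() handles tiling and multi-page PDF stitching internally.
    return model.infer(tokenizer, prompt=build_prompt(), image_file=image_file)


if __name__ == "__main__":
    print(run_ocr("invoice.png"))
```

The deferred `transformers` import keeps the module importable on machines without the heavy dependencies, which is convenient for testing the prompt-building logic in isolation.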