
PyTorch

PyTorch/HuggingFace backend configuration for DeepSeek-OCR text extraction.

Both DeepSeek-OCR and DeepSeek-OCR-2 use:
  • AutoModel (not AutoModelForCausalLM)
  • AutoTokenizer (not AutoProcessor)
  • model.infer(tokenizer, prompt=..., image_file=...) for inference
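A hedged sketch of that shared loading and inference pattern (the prompt text, image path, and any `infer()` keywords beyond those listed above are assumptions; check the model card before relying on them):

```python
# Hedged sketch of the shared DeepSeek-OCR loading/inference pattern.
# Prompt string and image path below are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-OCR"  # or "deepseek-ai/DeepSeek-OCR-2"

# AutoModel + AutoTokenizer, not AutoModelForCausalLM / AutoProcessor.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
model = model.eval().cuda().to(torch.bfloat16)

# Inference goes through model.infer(tokenizer, prompt=..., image_file=...).
result = model.infer(
    tokenizer,
    prompt="<image>\nFree OCR.",  # assumed prompt format
    image_file="page.png",        # assumed local image path
)
```

The `trust_remote_code=True` flag is required because both models ship their `infer()` implementation as custom code on the Hub rather than as a class built into `transformers`.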

Requirements (from the official README):
  • python==3.12.9, CUDA==11.8
  • torch==2.6.0
  • transformers==4.46.3, tokenizers==0.20.3
  • einops, addict, easydict
  • flash-attn==2.7.3 (optional; install with --no-build-isolation)
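A minimal environment setup following the pinned versions above, as a sketch; the package pins come from the README, while the CUDA 11.8 wheel index URL is an assumption about how that torch build is obtained:

```shell
# Pinned versions from the DeepSeek-OCR README (CUDA 11.8 build of torch).
pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu118
pip install transformers==4.46.3 tokenizers==0.20.3 einops addict easydict

# Optional: flash attention. Needs a local CUDA toolchain to compile;
# the README specifies --no-build-isolation for this package.
pip install flash-attn==2.7.3 --no-build-isolation
```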

DeepSeekOCRTextPyTorchConfig

Bases: BaseModel

PyTorch/HuggingFace backend configuration for DeepSeek-OCR / DeepSeek-OCR-2.

Uses AutoModel + AutoTokenizer. Inference via model.infer() — the model handles tiling and multi-page PDF stitching internally.

Models

  • deepseek-ai/DeepSeek-OCR-2 (default; latest, Jan 2026, Apache 2.0)
  • deepseek-ai/DeepSeek-OCR (v1, Oct 2024, MIT)

GPU requirements: L4 / A100 (≥16 GB VRAM recommended).

Example
config = DeepSeekOCRTextPyTorchConfig(
    model="deepseek-ai/DeepSeek-OCR-2",
    use_flash_attention=True,  # requires flash-attn==2.7.3
)