Extractor

GLM-OCR text extractor.

GLM-OCR from zai-org (Feb 2026) is a 0.9B-parameter OCR-specialist model. Architecture: CogViT visual encoder (0.4B) + GLM decoder (0.5B). It scores #1 on OmniDocBench V1.5 (94.62).

Key differences from GLM-V
  • Uses AutoModelForImageTextToText (NOT Glm4vForConditionalGeneration)
  • Uses AutoProcessor with direct image input (no chat template URL trick)
  • Much smaller (0.9B vs 9B) — faster, lower VRAM
  • Requires transformers>=5.3.0
  • No wrapper tokens, no <|begin_of_box|> markers — clean output
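The last bullet is worth unpacking: GLM-V responses can arrive wrapped in `<|begin_of_box|>`/`<|end_of_box|>` markers that callers must strip before use, while GLM-OCR emits the text directly. A toy illustration (the `strip_box` helper is hypothetical, not part of omnidocs):

```python
# Hypothetical helper showing the post-processing GLM-V output needs
# and GLM-OCR does not: remove <|begin_of_box|>/<|end_of_box|> markers.
def strip_box(text: str) -> str:
    for marker in ("<|begin_of_box|>", "<|end_of_box|>"):
        text = text.replace(marker, "")
    return text.strip()

glm_v_style = "<|begin_of_box|>Invoice #1234<|end_of_box|>"
glm_ocr_style = "Invoice #1234"

assert strip_box(glm_v_style) == glm_ocr_style  # GLM-V needs stripping
assert strip_box(glm_ocr_style) == glm_ocr_style  # GLM-OCR is already clean
```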

GLMOCRTextExtractor

GLMOCRTextExtractor(backend: GLMOCRBackendConfig)

Bases: BaseTextExtractor

GLM-OCR text extractor (zai-org/GLM-OCR, 0.9B, Feb 2026).

Purpose-built OCR model, #1 on OmniDocBench V1.5.
Faster and cheaper than GLM-V for pure document OCR tasks.

Example:

```python
from omnidocs.tasks.text_extraction import GLMOCRTextExtractor
from omnidocs.tasks.text_extraction.glmocr import GLMOCRPyTorchConfig

extractor = GLMOCRTextExtractor(backend=GLMOCRPyTorchConfig())
result = extractor.extract(image)
print(result.content)
```

Source code in omnidocs/tasks/text_extraction/glmocr/extractor.py
def __init__(self, backend: GLMOCRBackendConfig):
    self.backend_config = backend
    # Backend handles; populated by _load_model().
    self._backend: Any = None
    self._processor: Any = None
    self._loaded = False
    # Backend-specific hooks (e.g. vLLM sampling params, MLX config);
    # stay None on backends that do not use them.
    self._sampling_params_class: Any = None
    self._mlx_config: Any = None
    self._apply_chat_template: Any = None
    self._generate: Any = None
    self._load_model()
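
The constructor stores the backend config, initializes its handles to `None`, and loads the model eagerly. A minimal runnable sketch of the same pattern, using hypothetical `MiniBackendConfig`/`MiniExtractor` stand-ins rather than the real omnidocs classes:

```python
# Toy sketch of the construction pattern above: store the backend config,
# null out the backend handles, then eagerly load the model.
# MiniBackendConfig / MiniExtractor are hypothetical stand-ins.
from dataclasses import dataclass
from typing import Any

@dataclass
class MiniBackendConfig:
    model_id: str = "zai-org/GLM-OCR"

class MiniExtractor:
    def __init__(self, backend: MiniBackendConfig):
        self.backend_config = backend
        self._backend: Any = None   # set by _load_model()
        self._loaded = False
        self._load_model()          # eager load, as in the real __init__

    def _load_model(self) -> None:
        # Stand-in for loading the processor and model weights.
        self._backend = f"loaded:{self.backend_config.model_id}"
        self._loaded = True

ex = MiniExtractor(MiniBackendConfig())
```

Eager loading means construction fails fast if the weights are unavailable, so `extract()` never has to handle a half-initialized backend.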