Extractor¶
GLM-OCR text extractor.
GLM-OCR from zai-org (Feb 2026) — 0.9B OCR-specialist model. Architecture: CogViT visual encoder (0.4B) + GLM decoder (0.5B). Scores #1 on OmniDocBench V1.5 (94.62).
Key differences from GLM-V
- Uses AutoModelForImageTextToText (NOT Glm4vForConditionalGeneration)
- Uses AutoProcessor with direct image input (no chat template URL trick)
- Much smaller (0.9B vs 9B) — faster, lower VRAM
- Requires transformers>=5.3.0
- No reasoning/thinking tokens, no <code>&lt;|begin_of_box|&gt;</code> wrappers — clean output
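The loading flow described above can be sketched directly against the Hugging Face `transformers` API. This is a minimal illustration, not the library's internal implementation: the Hub id `zai-org/GLM-OCR` comes from the class docstring below, while the prompt string and generation parameters are assumptions — check the model card for the exact prompt format.

```python
# Hedged sketch: loading GLM-OCR with plain transformers (>=5.3.0).
# Heavy imports live inside the function so the module imports without
# torch/transformers installed.
def run_glm_ocr(image_path: str, prompt: str = "Extract the text from this image.") -> str:
    import torch
    from PIL import Image
    from transformers import AutoModelForImageTextToText, AutoProcessor

    model_id = "zai-org/GLM-OCR"  # Hub id from the docstring; verify on the Hub
    processor = AutoProcessor.from_pretrained(model_id)
    # AutoModelForImageTextToText, NOT Glm4vForConditionalGeneration (see list above)
    model = AutoModelForImageTextToText.from_pretrained(
        model_id, dtype=torch.bfloat16, device_map="auto"
    )
    # Direct image input -- no chat-template URL trick needed, unlike GLM-V.
    inputs = processor(
        images=Image.open(image_path), text=prompt, return_tensors="pt"
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=4096)
    # Decode only the newly generated tokens; output needs no token stripping.
    return processor.batch_decode(
        out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )[0]
```

At 0.9B parameters the model fits comfortably on a single consumer GPU in bfloat16, which is what makes it cheaper to run than the 9B GLM-V for pure OCR workloads.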
GLMOCRTextExtractor¶
Bases: BaseTextExtractor
GLM-OCR text extractor (zai-org/GLM-OCR, 0.9B, Feb 2026).
Purpose-built OCR model, #1 on OmniDocBench V1.5.
Faster and cheaper than GLM-V for pure document OCR tasks.
Example:

```python
from omnidocs.tasks.text_extraction import GLMOCRTextExtractor
from omnidocs.tasks.text_extraction.glmocr import GLMOCRPyTorchConfig
from PIL import Image

extractor = GLMOCRTextExtractor(backend=GLMOCRPyTorchConfig())
image = Image.open("page.png")  # any document page image
result = extractor.extract(image)
print(result.content)
```