Overview¶
GLM-OCR backend configurations and extractor for text extraction.
GLM-OCR from zai-org (Feb 2026) — 0.9B OCR-specialist model. Architecture: CogViT visual encoder (0.4B) + GLM decoder (0.5B). Scores #1 on OmniDocBench V1.5 (94.62), beating models 10x its size.
Unlike GLM-V (which is a general VLM), GLM-OCR is purpose-built for document OCR. Uses AutoModelForImageTextToText + AutoProcessor (NOT Glm4vForConditionalGeneration). Requires transformers>=5.3.0.
Available backends
- GLMOCRPyTorchConfig: PyTorch/HuggingFace backend
- GLMOCRVLLMConfig: vLLM high-throughput backend (with MTP speculative decoding)
- GLMOCRMLXConfig: MLX backend for Apple Silicon (mlx-vlm)
- GLMOCRAPIConfig: API backend

HuggingFace: zai-org/GLM-OCR
License: Apache 2.0
GLMOCRAPIConfig
¶
Bases: BaseModel
API backend configuration for GLM-OCR.
Primary provider: ZhipuAI / BigModel (official) — get key at open.bigmodel.cn.
Example:

```python
# Self-hosted vLLM server
config = GLMOCRAPIConfig(
    model="zai-org/GLM-OCR",
    api_base="http://localhost:8000/v1",
    api_key="token-abc",
)
```
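A self-hosted vLLM endpoint like the one above is OpenAI-compatible, so the request the backend ultimately sends can be sketched as a plain chat-completions payload with an inline base64 image. This is a hedged sketch of the wire format, not omnidocs' actual request code; the prompt text is an illustrative assumption, while the model name and endpoint come from the example above.

```python
import base64
import json

def build_ocr_request(image_bytes: bytes, model: str = "zai-org/GLM-OCR") -> dict:
    """Build an OpenAI-compatible chat-completions payload with an inline image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Image travels as a data: URL in the standard vision format.
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                    # Prompt wording is an assumption, not the extractor's exact prompt.
                    {"type": "text", "text": "Extract the text from this document."},
                ],
            }
        ],
    }

payload = build_ocr_request(b"\x89PNG placeholder")  # placeholder bytes, not a real image
print(json.dumps(payload)[:60])
```

POSTing this to `{api_base}/chat/completions` with the `api_key` as a Bearer token is all the API backend fundamentally needs to do.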
GLMOCRTextExtractor
¶
Bases: BaseTextExtractor
GLM-OCR text extractor (zai-org/GLM-OCR, 0.9B, Feb 2026).
Purpose-built OCR model, #1 on OmniDocBench V1.5.
Faster and cheaper than GLM-V for pure document OCR tasks.
Example:

```python
from omnidocs.tasks.text_extraction import GLMOCRTextExtractor
from omnidocs.tasks.text_extraction.glmocr import GLMOCRPyTorchConfig

extractor = GLMOCRTextExtractor(backend=GLMOCRPyTorchConfig())
result = extractor.extract(image)
print(result.content)
```
Source code in omnidocs/tasks/text_extraction/glmocr/extractor.py
GLMOCRMLXConfig
¶
Bases: BaseModel
MLX backend configuration for GLM-OCR.
Uses mlx-vlm for Apple Silicon native inference.
GLM-OCR at 0.9B runs comfortably on any M-series Mac with 8GB+ unified memory.
Requires: mlx, mlx-vlm>=0.3.11
Note: Only works on Apple Silicon Macs. Do NOT use for Modal/cloud deployments.
Available models:
- mlx-community/GLM-OCR-bf16 (default — full precision, 2.21 GB)
- mlx-community/GLM-OCR-6bit (quantized, smaller)

Example:

```python
config = GLMOCRMLXConfig()  # bf16, default
config = GLMOCRMLXConfig(model="mlx-community/GLM-OCR-6bit")  # quantized
```
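Wiring the MLX config into the extractor presumably follows the same pattern as the PyTorch example elsewhere on this page; a sketch under that assumption (requires an Apple Silicon Mac with mlx and mlx-vlm installed):

```python
from omnidocs.tasks.text_extraction import GLMOCRTextExtractor
from omnidocs.tasks.text_extraction.glmocr import GLMOCRMLXConfig

# Assumption: the extractor accepts the MLX config exactly like the PyTorch one.
extractor = GLMOCRTextExtractor(backend=GLMOCRMLXConfig())
result = extractor.extract(image)  # image as in the extractor example above
print(result.content)
```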
GLMOCRPyTorchConfig
¶
Bases: BaseModel
PyTorch/HuggingFace backend configuration for GLM-OCR.
GLM-OCR uses AutoModelForImageTextToText + AutoProcessor.
Requires transformers>=5.3.0.
Example:

```python
config = GLMOCRPyTorchConfig()  # zai-org/GLM-OCR, default
config = GLMOCRPyTorchConfig(device="mps")  # Apple Silicon
```
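A small helper for picking the `device` string to pass in. It takes the availability flags as arguments (in practice they would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`) so the sketch itself has no torch dependency; the preference order is a common convention, not something this config enforces.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a device string for GLMOCRPyTorchConfig(device=...).

    Preference order: CUDA GPU, then Apple Silicon (MPS), then CPU.
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

print(pick_device(False, True))  # → mps (e.g. on an M-series Mac)
```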
GLMOCRVLLMConfig
¶
Bases: BaseModel
vLLM backend configuration for GLM-OCR.
GLM-OCR supports vLLM with MTP (Multi-Token Prediction) speculative decoding
for significantly higher throughput. Requires vllm>=0.17.0 and transformers>=5.3.0.
Example:

```python
config = GLMOCRVLLMConfig(gpu_memory_utilization=0.85)
```
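Pairing this config with the extractor presumably mirrors the PyTorch example on this page; a sketch under that assumption (requires a CUDA GPU with vllm>=0.17.0 installed):

```python
from omnidocs.tasks.text_extraction import GLMOCRTextExtractor
from omnidocs.tasks.text_extraction.glmocr import GLMOCRVLLMConfig

# gpu_memory_utilization as in the example above; other knobs left at defaults.
extractor = GLMOCRTextExtractor(backend=GLMOCRVLLMConfig(gpu_memory_utilization=0.85))
result = extractor.extract(image)
print(result.content)
```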
api
¶
API backend configuration for GLM-OCR text extraction.
GLMOCRAPIConfig
¶
Bases: BaseModel
API backend configuration for GLM-OCR.
Primary provider: ZhipuAI / BigModel (official) — get key at open.bigmodel.cn.
Example:

```python
# Self-hosted vLLM server
config = GLMOCRAPIConfig(
    model="zai-org/GLM-OCR",
    api_base="http://localhost:8000/v1",
    api_key="token-abc",
)
```
extractor
¶
GLM-OCR text extractor.
GLM-OCR from zai-org (Feb 2026) — 0.9B OCR-specialist model. Architecture: CogViT visual encoder (0.4B) + GLM decoder (0.5B). Scores #1 on OmniDocBench V1.5 (94.62).
Key differences from GLM-V
- Uses AutoModelForImageTextToText (NOT Glm4vForConditionalGeneration)
- Uses AutoProcessor with direct image input (no chat-template URL trick)
- Much smaller (0.9B vs 9B) — faster, lower VRAM
- Requires transformers>=5.3.0
- No thinking tokens, no <|begin_of_box|> — clean output
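The loading pattern described above can be sketched with plain transformers (>=5.3.0, per the requirement stated here). This is a hedged outline, not the extractor's actual implementation: the class names come from the notes above, while the prompt text, file name, and generation length are illustrative assumptions, and processor call patterns can vary by transformers version.

```python
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

model_id = "zai-org/GLM-OCR"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id)

image = Image.open("page.png")  # illustrative file name
# Direct image input via the processor — no chat-template URL trick.
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Extract the text from this document."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens after the prompt.
print(processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```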
GLMOCRTextExtractor
¶
Bases: BaseTextExtractor
GLM-OCR text extractor (zai-org/GLM-OCR, 0.9B, Feb 2026).
Purpose-built OCR model, #1 on OmniDocBench V1.5.
Faster and cheaper than GLM-V for pure document OCR tasks.
Example:

```python
from omnidocs.tasks.text_extraction import GLMOCRTextExtractor
from omnidocs.tasks.text_extraction.glmocr import GLMOCRPyTorchConfig

extractor = GLMOCRTextExtractor(backend=GLMOCRPyTorchConfig())
result = extractor.extract(image)
print(result.content)
```
Source code in omnidocs/tasks/text_extraction/glmocr/extractor.py
mlx
¶
MLX backend configuration for GLM-OCR text extraction.
GLMOCRMLXConfig
¶
Bases: BaseModel
MLX backend configuration for GLM-OCR.
Uses mlx-vlm for Apple Silicon native inference.
GLM-OCR at 0.9B runs comfortably on any M-series Mac with 8GB+ unified memory.
Requires: mlx, mlx-vlm>=0.3.11
Note: Only works on Apple Silicon Macs. Do NOT use for Modal/cloud deployments.
Available models:
- mlx-community/GLM-OCR-bf16 (default — full precision, 2.21 GB)
- mlx-community/GLM-OCR-6bit (quantized, smaller)

Example:

```python
config = GLMOCRMLXConfig()  # bf16, default
config = GLMOCRMLXConfig(model="mlx-community/GLM-OCR-6bit")  # quantized
```
pytorch
¶
PyTorch backend configuration for GLM-OCR text extraction.
GLMOCRPyTorchConfig
¶
Bases: BaseModel
PyTorch/HuggingFace backend configuration for GLM-OCR.
GLM-OCR uses AutoModelForImageTextToText + AutoProcessor.
Requires transformers>=5.3.0.
Example:

```python
config = GLMOCRPyTorchConfig()  # zai-org/GLM-OCR, default
config = GLMOCRPyTorchConfig(device="mps")  # Apple Silicon
```
vllm
¶
VLLM backend configuration for GLM-OCR text extraction.
GLMOCRVLLMConfig
¶
Bases: BaseModel
vLLM backend configuration for GLM-OCR.
GLM-OCR supports vLLM with MTP (Multi-Token Prediction) speculative decoding
for significantly higher throughput. Requires vllm>=0.17.0 and transformers>=5.3.0.
Example:

```python
config = GLMOCRVLLMConfig(gpu_memory_utilization=0.85)
```