Vlm¶
VLM structured extractor.
A provider-agnostic Vision-Language Model structured extractor using litellm. Extracts structured data matching a Pydantic schema from document images.
Example
from pydantic import BaseModel
from omnidocs.vlm import VLMAPIConfig
from omnidocs.tasks.structured_extraction import VLMStructuredExtractor
class Invoice(BaseModel):
vendor: str
total: float
items: list[str]
date: str
config = VLMAPIConfig(model="gemini/gemini-2.5-flash")
extractor = VLMStructuredExtractor(config=config)
result = extractor.extract(
image="invoice.png",
schema=Invoice,
prompt="Extract invoice details from this document.",
)
print(result.data.vendor, result.data.total)
VLMStructuredExtractor
¶
Bases: BaseStructuredExtractor
Provider-agnostic VLM structured extractor using litellm.
Extracts structured data from document images using any cloud VLM API. Uses litellm's native response_format support to send Pydantic schemas to providers that support structured output (OpenAI, Gemini, etc.).
Example
from pydantic import BaseModel
from omnidocs.vlm import VLMAPIConfig
from omnidocs.tasks.structured_extraction import VLMStructuredExtractor
class Invoice(BaseModel):
vendor: str
total: float
items: list[str]
config = VLMAPIConfig(model="gemini/gemini-2.5-flash")
extractor = VLMStructuredExtractor(config=config)
result = extractor.extract("invoice.png", schema=Invoice, prompt="Extract invoice fields")
print(result.data.vendor)
Initialize VLM structured extractor.
| PARAMETER | DESCRIPTION |
|---|---|
config
|
VLM API configuration with model and provider details.
TYPE:
|
Source code in omnidocs/tasks/structured_extraction/vlm.py
extract
¶
extract(
image: Union[Image, ndarray, str, Path],
schema: type[BaseModel],
prompt: str,
) -> StructuredOutput
Extract structured data from an image.
| PARAMETER | DESCRIPTION |
|---|---|
image
|
Input image (PIL Image, numpy array, or file path).
TYPE:
|
schema
|
Pydantic model class defining the expected output structure.
TYPE:
|
prompt
|
Extraction prompt describing what to extract.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
StructuredOutput
|
StructuredOutput containing the validated data. |