Client
VLM completion utilities using litellm for provider-agnostic inference.
vlm_completion
Send image + prompt to any VLM via litellm. Returns raw text.
| PARAMETER | DESCRIPTION |
|---|---|
| `config` | VLM API configuration. TYPE: `VLMAPIConfig` |
| `prompt` | Text prompt to send with the image. TYPE: `str` |
| `image` | PIL Image to send. TYPE: `Image` |

| RETURNS | DESCRIPTION |
|---|---|
| `str` | Raw text response from the model. |
Source code in omnidocs/vlm/client.py
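A minimal usage sketch. The `VLMAPIConfig` constructor fields shown here (`model`, `api_key`) are assumptions for illustration; check the actual class for its real attributes.

```python
from PIL import Image

# Import path for VLMAPIConfig is assumed; adjust to where it actually lives.
from omnidocs.vlm.client import VLMAPIConfig, vlm_completion

# Hypothetical config: the field names below are illustrative only.
config = VLMAPIConfig(model="gpt-4o-mini", api_key="...")

image = Image.open("page.png")
text = vlm_completion(config, "Transcribe all text in this image.", image)
print(text)
```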
vlm_structured_completion

```python
vlm_structured_completion(
    config: VLMAPIConfig,
    prompt: str,
    image: Image,
    response_schema: type[BaseModel],
) -> BaseModel
```
Send image + prompt, get structured Pydantic output.
Tries two strategies:

1. litellm's native `response_format` (works with OpenAI, Gemini, etc.)
2. Fallback: prompt-based JSON extraction for providers that don't support `response_format` (OpenRouter, some open-source models)
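For illustration, a minimal sketch of what strategy 2 can look like. The helper names and exact prompt wording are hypothetical, not the library's actual implementation; the sketch only assumes Pydantic v2 (`model_json_schema`, `model_validate_json`).

```python
import json

from pydantic import BaseModel


def build_json_prompt(prompt: str, response_schema: type[BaseModel]) -> str:
    """Hypothetical helper: embed the JSON schema in the prompt so providers
    without response_format support still emit parseable JSON."""
    schema = json.dumps(response_schema.model_json_schema(), indent=2)
    return f"{prompt}\n\nRespond ONLY with JSON matching this schema:\n{schema}"


def parse_json_response(raw_text: str, response_schema: type[BaseModel]) -> BaseModel:
    """Hypothetical helper: extract the outermost JSON object from the reply
    (models often wrap it in markdown fences or prose) and validate it."""
    start = raw_text.find("{")
    end = raw_text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("No JSON object found in model response")
    return response_schema.model_validate_json(raw_text[start : end + 1])
```

Prompt-based extraction is inherently best-effort: validation can still fail if the model ignores the schema instruction, so callers on this path should expect a possible `ValidationError`.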
| PARAMETER | DESCRIPTION |
|---|---|
| `config` | VLM API configuration. TYPE: `VLMAPIConfig` |
| `prompt` | Text prompt to send with the image. TYPE: `str` |
| `image` | PIL Image to send. TYPE: `Image` |
| `response_schema` | Pydantic model class for structured output. TYPE: `type[BaseModel]` |

| RETURNS | DESCRIPTION |
|---|---|
| `BaseModel` | Validated instance of `response_schema`. |
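A usage sketch with a simple schema. As above, the `VLMAPIConfig` fields are assumptions for illustration.

```python
from PIL import Image
from pydantic import BaseModel

# Import path for VLMAPIConfig is assumed; adjust to where it actually lives.
from omnidocs.vlm.client import VLMAPIConfig, vlm_structured_completion


class PageSummary(BaseModel):
    title: str
    bullet_points: list[str]


# Hypothetical config fields, as in the vlm_completion example.
config = VLMAPIConfig(model="gemini/gemini-1.5-flash", api_key="...")
image = Image.open("page.png")

result = vlm_structured_completion(
    config,
    "Summarize this page as a title plus bullet points.",
    image,
    response_schema=PageSummary,
)
print(result.title, result.bullet_points)
```

Whichever strategy succeeds, the return value is a validated instance of `response_schema` (here `PageSummary`), so downstream code can rely on typed fields.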