
Overview

Granite Docling text extraction with multi-backend support.

GraniteDoclingTextAPIConfig

Bases: BaseModel

Configuration for Granite Docling text extraction via API.

Uses litellm for provider-agnostic API access. Supports OpenRouter, Gemini, Azure, OpenAI, and any other litellm-compatible provider.

API keys can be passed directly or read from environment variables.

Example
# OpenRouter
config = GraniteDoclingTextAPIConfig(
    model="openrouter/ibm-granite/granite-docling-258M",
)
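When no key is passed directly, litellm resolves provider credentials from its standard environment variables. A minimal sketch for OpenRouter (the variable name follows litellm's convention; the key value is a placeholder):

```python
import os

# litellm reads provider credentials from standard environment variables;
# OPENROUTER_API_KEY is the conventional variable for OpenRouter models.
os.environ.setdefault("OPENROUTER_API_KEY", "sk-or-...")  # placeholder key

config = GraniteDoclingTextAPIConfig(
    model="openrouter/ibm-granite/granite-docling-258M",
)
```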

GraniteDoclingTextExtractor

GraniteDoclingTextExtractor(
    backend: GraniteDoclingTextBackendConfig,
)

Bases: BaseTextExtractor

Granite Docling text extractor supporting PyTorch, VLLM, MLX, and API backends.

Granite Docling is IBM's compact vision-language model optimized for document conversion. It outputs the DocTags format, which is converted to Markdown using the docling_core library.
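The DocTags-to-Markdown step can be sketched with docling_core's document types, following the usage shown on the Granite Docling model card. Here `doctags` stands for the raw model output and `image` for the PIL page image it was generated from:

```python
from docling_core.types.doc import DoclingDocument
from docling_core.types.doc.document import DocTagsDocument

# Pair the raw DocTags string with its source page image, build a
# DoclingDocument from the pair, and serialize it to Markdown.
doctags_doc = DocTagsDocument.from_doctags_and_image_pairs([doctags], [image])
doc = DoclingDocument.load_from_doctags(doctags_doc, document_name="page")
markdown = doc.export_to_markdown()
```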

Example

>>> from omnidocs.tasks.text_extraction.granitedocling import (
...     GraniteDoclingTextExtractor,
...     GraniteDoclingTextPyTorchConfig,
... )
>>> config = GraniteDoclingTextPyTorchConfig(device="cuda")
>>> extractor = GraniteDoclingTextExtractor(backend=config)
>>> result = extractor.extract(image, output_format="markdown")
>>> print(result.content)

Initialize Granite Docling extractor with backend configuration.

PARAMETER DESCRIPTION
backend

Backend configuration (PyTorch, VLLM, MLX, or API config)

TYPE: GraniteDoclingTextBackendConfig

Source code in omnidocs/tasks/text_extraction/granitedocling/extractor.py
def __init__(self, backend: GraniteDoclingTextBackendConfig):
    """
    Initialize Granite Docling extractor with backend configuration.

    Args:
        backend: Backend configuration (PyTorch, VLLM, MLX, or API config)
    """
    self.backend_config = backend
    self._backend: Any = None
    self._processor: Any = None
    self._loaded: bool = False

    # Backend-specific helpers
    self._mlx_config: Any = None
    self._apply_chat_template: Any = None
    self._generate: Any = None
    self._sampling_params_class: Any = None
    self._device: str = "cpu"

    self._load_model()

extract

extract(
    image: Union[Image, ndarray, str, Path],
    output_format: Literal["html", "markdown"] = "markdown",
) -> TextOutput

Extract text from an image using Granite Docling.

PARAMETER DESCRIPTION
image

Input image (PIL Image, numpy array, or file path)

TYPE: Union[Image, ndarray, str, Path]

output_format

Output format ("markdown" or "html")

TYPE: Literal['html', 'markdown'] DEFAULT: 'markdown'

RETURNS DESCRIPTION
TextOutput

TextOutput with extracted content

Source code in omnidocs/tasks/text_extraction/granitedocling/extractor.py
def extract(
    self,
    image: Union[Image.Image, np.ndarray, str, Path],
    output_format: Literal["html", "markdown"] = "markdown",
) -> TextOutput:
    """
    Extract text from an image using Granite Docling.

    Args:
        image: Input image (PIL Image, numpy array, or file path)
        output_format: Output format ("markdown" or "html")

    Returns:
        TextOutput with extracted content
    """
    if not self._loaded:
        raise RuntimeError("Model not loaded")

    if output_format not in ("html", "markdown"):
        raise ValueError(f"Invalid output_format: {output_format}")

    pil_image = self._prepare_image(image)
    width, height = pil_image.size

    # Dispatch to backend-specific inference
    config_type = type(self.backend_config).__name__

    if config_type == "GraniteDoclingTextPyTorchConfig":
        raw_output = self._infer_pytorch(pil_image)
    elif config_type == "GraniteDoclingTextVLLMConfig":
        raw_output = self._infer_vllm(pil_image)
    elif config_type == "GraniteDoclingTextMLXConfig":
        raw_output = self._infer_mlx(pil_image)
    elif config_type == "GraniteDoclingTextAPIConfig":
        raw_output = self._infer_api(pil_image)
    else:
        raise RuntimeError(f"Unknown backend: {config_type}")

    # Convert DocTags to Markdown
    markdown_output = self._convert_doctags_to_markdown(raw_output, pil_image)

    # For HTML output, wrap in basic HTML structure
    if output_format == "html":
        content = f"<html><body>\n{markdown_output}\n</body></html>"
    else:
        content = markdown_output

    return TextOutput(
        content=content,
        format=OutputFormat(output_format),
        raw_output=raw_output,
        plain_text=self._extract_plain_text(markdown_output),
        image_width=width,
        image_height=height,
        model_name=f"Granite-Docling-258M ({config_type.replace('Config', '')})",
    )
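The backend dispatch above keys on the config's class name. The same pattern can be sketched in isolation; the stand-in config classes and handler bodies below are hypothetical, standing in for the four GraniteDoclingText*Config classes documented on this page:

```python
# Stand-in config classes for illustration only.
class PyTorchConfig:
    pass

class APIConfig:
    pass

def infer(config, image):
    # Map the config's class name to a backend-specific inference routine,
    # raising on unknown backends just as extract() does.
    handlers = {
        "PyTorchConfig": lambda img: f"pytorch:{img}",
        "APIConfig": lambda img: f"api:{img}",
    }
    name = type(config).__name__
    if name not in handlers:
        raise RuntimeError(f"Unknown backend: {name}")
    return handlers[name](image)
```

Dispatching on the class name (rather than isinstance checks) keeps the extractor decoupled from backend imports that may not be installed.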

GraniteDoclingTextMLXConfig

Bases: BaseModel

Configuration for Granite Docling text extraction with MLX backend.

This backend is optimized for Apple Silicon Macs (M1/M2/M3/M4) and uses the MLX-optimized variant of the model.
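A minimal usage sketch on an Apple Silicon Mac, assuming the config's default values; the input path is a placeholder:

```python
config = GraniteDoclingTextMLXConfig()
extractor = GraniteDoclingTextExtractor(backend=config)
result = extractor.extract("page.png")  # Markdown by default
```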

GraniteDoclingTextPyTorchConfig

Bases: BaseModel

Configuration for Granite Docling text extraction with PyTorch backend.

GraniteDoclingTextVLLMConfig

Bases: BaseModel

Configuration for Granite Docling text extraction with VLLM backend.

IMPORTANT: This config uses revision="untied" by default. The untied-weights revision is required because VLLM cannot load Granite Docling's tied weights.
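Since "untied" is already the default, no extra arguments are needed; passing the revision explicitly only documents the intent. A sketch, assuming only the documented revision field:

```python
# revision="untied" is already the default; shown explicitly to document
# why this VLLM config differs from the other backends.
config = GraniteDoclingTextVLLMConfig(revision="untied")
extractor = GraniteDoclingTextExtractor(backend=config)
```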

api

API backend configuration for Granite Docling text extraction.

Uses litellm for provider-agnostic inference (OpenRouter, Gemini, Azure, etc.).

See GraniteDoclingTextAPIConfig above.

extractor

Granite Docling text extractor with multi-backend support.

See GraniteDoclingTextExtractor above.

mlx

MLX backend configuration for Granite Docling text extraction (Apple Silicon).

See GraniteDoclingTextMLXConfig above.

pytorch

PyTorch backend configuration for Granite Docling text extraction.

See GraniteDoclingTextPyTorchConfig above.

vllm

VLLM backend configuration for Granite Docling text extraction.

See GraniteDoclingTextVLLMConfig above.