
Overview

Granite Docling text extraction with multi-backend support.

GraniteDoclingTextAPIConfig

Bases: BaseModel

Configuration for Granite Docling text extraction via API.

Uses litellm for provider-agnostic API access. Supports OpenRouter, Gemini, Azure, OpenAI, and any other litellm-compatible provider.

API keys can be passed directly or read from environment variables.

Example
# OpenRouter
config = GraniteDoclingTextAPIConfig(
    model="openrouter/ibm-granite/granite-docling-258M",
)
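When no key is passed directly, litellm resolves provider credentials from its standard environment variables. A minimal sketch for OpenRouter (the variable name follows litellm's convention; the key value is a placeholder):

```python
import os

# litellm reads provider credentials from standard environment variables;
# OPENROUTER_API_KEY is the conventional variable for OpenRouter models.
os.environ.setdefault("OPENROUTER_API_KEY", "sk-or-...")  # placeholder key

config = GraniteDoclingTextAPIConfig(
    model="openrouter/ibm-granite/granite-docling-258M",
)
```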

GraniteDoclingTextExtractor

GraniteDoclingTextExtractor(
    backend: GraniteDoclingTextBackendConfig,
)

Bases: BaseTextExtractor

Granite Docling text extractor supporting PyTorch, VLLM, MLX, and API backends.

Granite Docling is IBM's compact vision-language model optimized for document conversion. It outputs the DocTags format, which is converted to Markdown using the docling_core library.
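The DocTags-to-Markdown step can be sketched with docling_core's document types, following the usage shown on the Granite Docling model card. Here `doctags` stands for the raw model output and `image` for the PIL page image it was generated from:

```python
from docling_core.types.doc import DoclingDocument
from docling_core.types.doc.document import DocTagsDocument

# Pair the raw DocTags string with its source page image, build a
# DoclingDocument from the pair, and serialize it to Markdown.
doctags_doc = DocTagsDocument.from_doctags_and_image_pairs([doctags], [image])
doc = DoclingDocument.load_from_doctags(doctags_doc, document_name="page")
markdown = doc.export_to_markdown()
```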

Example

>>> from omnidocs.tasks.text_extraction.granitedocling import (
...     GraniteDoclingTextExtractor,
...     GraniteDoclingTextPyTorchConfig,
... )
>>> config = GraniteDoclingTextPyTorchConfig(device="cuda")
>>> extractor = GraniteDoclingTextExtractor(backend=config)
>>> result = extractor.extract(image, output_format="markdown")
>>> print(result.content)

Initialize Granite Docling extractor with backend configuration.

PARAMETER DESCRIPTION
backend

Backend configuration (PyTorch, VLLM, MLX, or API config)

TYPE: GraniteDoclingTextBackendConfig

Source code in omnidocs/tasks/text_extraction/granitedocling/extractor.py
def __init__(self, backend: GraniteDoclingTextBackendConfig):
    """
    Initialize Granite Docling extractor with backend configuration.

    Args:
        backend: Backend configuration (PyTorch, VLLM, MLX, or API config)
    """
    self.backend_config = backend
    self._backend: Any = None
    self._processor: Any = None
    self._loaded: bool = False

    # Backend-specific helpers
    self._mlx_config: Any = None
    self._apply_chat_template: Any = None
    self._generate: Any = None
    self._sampling_params_class: Any = None
    self._device: str = "cpu"

    self._load_model()

extract

extract(
    image: Union[Image, ndarray, str, Path],
    output_format: Literal["html", "markdown"] = "markdown",
) -> TextOutput

Extract text from an image using Granite Docling.

PARAMETER DESCRIPTION
image

Input image (PIL Image, numpy array, or file path)

TYPE: Union[Image, ndarray, str, Path]

output_format

Output format ("markdown" or "html")

TYPE: Literal['html', 'markdown'] DEFAULT: 'markdown'

RETURNS DESCRIPTION
TextOutput

TextOutput with extracted content

Source code in omnidocs/tasks/text_extraction/granitedocling/extractor.py
def extract(
    self,
    image: Union[Image.Image, np.ndarray, str, Path],
    output_format: Literal["html", "markdown"] = "markdown",
) -> TextOutput:
    """
    Extract text from an image using Granite Docling.

    Args:
        image: Input image (PIL Image, numpy array, or file path)
        output_format: Output format ("markdown" or "html")

    Returns:
        TextOutput with extracted content
    """
    if not self._loaded:
        raise RuntimeError("Model not loaded")

    if output_format not in ("html", "markdown"):
        raise ValueError(f"Invalid output_format: {output_format}")

    pil_image = self._prepare_image(image)
    width, height = pil_image.size

    # Dispatch to backend-specific inference
    config_type = type(self.backend_config).__name__

    if config_type == "GraniteDoclingTextPyTorchConfig":
        raw_output = self._infer_pytorch(pil_image)
    elif config_type == "GraniteDoclingTextVLLMConfig":
        raw_output = self._infer_vllm(pil_image)
    elif config_type == "GraniteDoclingTextMLXConfig":
        raw_output = self._infer_mlx(pil_image)
    elif config_type == "GraniteDoclingTextAPIConfig":
        raw_output = self._infer_api(pil_image)
    else:
        raise RuntimeError(f"Unknown backend: {config_type}")

    # Convert DocTags to Markdown
    markdown_output = self._convert_doctags_to_markdown(raw_output, pil_image)

    # For HTML output, wrap in basic HTML structure
    if output_format == "html":
        content = f"<html><body>\n{markdown_output}\n</body></html>"
    else:
        content = markdown_output

    return TextOutput(
        content=content,
        format=OutputFormat(output_format),
        raw_output=raw_output,
        plain_text=self._extract_plain_text(markdown_output),
        image_width=width,
        image_height=height,
        model_name=f"Granite-Docling-258M ({config_type.replace('Config', '')})",
    )
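The backend dispatch above keys on the config's class name. The same pattern can be sketched in isolation; the stand-in config classes and handler bodies below are hypothetical, standing in for the four GraniteDoclingText*Config classes documented on this page:

```python
# Stand-in config classes for illustration only.
class PyTorchConfig:
    pass

class APIConfig:
    pass

def infer(config, image):
    # Map the config's class name to a backend-specific inference routine,
    # raising on unknown backends just as extract() does.
    handlers = {
        "PyTorchConfig": lambda img: f"pytorch:{img}",
        "APIConfig": lambda img: f"api:{img}",
    }
    name = type(config).__name__
    if name not in handlers:
        raise RuntimeError(f"Unknown backend: {name}")
    return handlers[name](image)
```

Dispatching on the class name (rather than isinstance checks) keeps the extractor decoupled from backend imports that may not be installed.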

GraniteDoclingTextMLXConfig

Bases: BaseModel

Configuration for Granite Docling text extraction with MLX backend.

This backend is optimized for Apple Silicon Macs (M1/M2/M3/M4) and uses the MLX-optimized variant of the model.
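A minimal usage sketch on an Apple Silicon Mac, assuming the config's default values; the input path is a placeholder:

```python
config = GraniteDoclingTextMLXConfig()
extractor = GraniteDoclingTextExtractor(backend=config)
result = extractor.extract("page.png")  # Markdown by default
```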

GraniteDoclingTextPyTorchConfig

Bases: BaseModel

Configuration for Granite Docling text extraction with PyTorch backend.

GraniteDoclingTextVLLMConfig

Bases: BaseModel

Configuration for Granite Docling text extraction with VLLM backend.

IMPORTANT: This config uses revision="untied" by default. The untied-weights revision is required because VLLM cannot load Granite Docling's tied weights.
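Since "untied" is already the default, no extra arguments are needed; passing the revision explicitly only documents the intent. A sketch, assuming only the documented revision field:

```python
# revision="untied" is already the default; shown explicitly to document
# why this VLLM config differs from the other backends.
config = GraniteDoclingTextVLLMConfig(revision="untied")
extractor = GraniteDoclingTextExtractor(backend=config)
```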

api

API backend configuration for Granite Docling text extraction.

Uses litellm for provider-agnostic inference (OpenRouter, Gemini, Azure, etc.).

See GraniteDoclingTextAPIConfig above.

extractor

Granite Docling text extractor with multi-backend support.

See GraniteDoclingTextExtractor above.

mlx

MLX backend configuration for Granite Docling text extraction (Apple Silicon).

See GraniteDoclingTextMLXConfig above.

pytorch

PyTorch backend configuration for Granite Docling text extraction.

See GraniteDoclingTextPyTorchConfig above.

vllm

VLLM backend configuration for Granite Docling text extraction.

See GraniteDoclingTextVLLMConfig above.