Models¶

Pydantic models for layout extraction outputs.

Defines standardized output types and label enums for layout detection.

Coordinate Systems

Absolute (default): Coordinates in pixels relative to original image size
Normalized (0-1024): Coordinates scaled to 0-1024 range (virtual 1024x1024 canvas)

Use bbox.to_normalized(width, height) or output.get_normalized_bboxes() to convert to normalized coordinates.

Example

result = extractor.extract(image)  # Returns absolute pixel coordinates
normalized = result.get_normalized_bboxes()  # Returns 0-1024 normalized coords

LayoutLabel ¶

Bases: str, Enum

Standardized layout labels used across all layout extractors.

These provide a consistent vocabulary regardless of which model is used.
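Because LayoutLabel derives from both str and Enum, its members should behave like plain strings in comparisons and serialization. The sketch below illustrates that behavior with a stand-in enum. The class name `DemoLabel` and its member names and values are assumptions for illustration, not the library's actual label list.

```python
import json
from enum import Enum


class DemoLabel(str, Enum):
    """Stand-in for LayoutLabel; member names and values are hypothetical."""

    TEXT = "text"
    TITLE = "title"
    UNKNOWN = "unknown"


# Members of a str-based Enum compare equal to their string values.
is_text = DemoLabel.TEXT == "text"

# Looking up by value returns the corresponding member.
parsed = DemoLabel("title")

# .value yields the plain string, which is what the to_dict() methods emit.
payload = json.dumps({"label": DemoLabel.UNKNOWN.value})
```

This is why downstream code can compare a label against either the enum member or its string value.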

CustomLabel ¶

Bases: BaseModel

Type-safe custom layout label definition for VLM-based models.

VLMs such as Qwen3-VL support flexible custom labels beyond the standard LayoutLabel enum. Use this class to define custom labels with validation.

Example

from omnidocs.tasks.layout_extraction import CustomLabel

# Simple custom label
code_block = CustomLabel(name="code_block")

# With metadata
sidebar = CustomLabel(
    name="sidebar",
    description="Secondary content panel",
    color="#9B59B6",
)

# Use with QwenLayoutDetector
result = detector.extract(image, custom_labels=[code_block, sidebar])

LabelMapping ¶

LabelMapping(mapping: Dict[str, LayoutLabel])

Base class for model-specific label mappings.

Each model maps its native labels to standardized LayoutLabel values.

Initialize label mapping.

PARAMETER	DESCRIPTION
`mapping`	Dict mapping model-specific labels to LayoutLabel enum values TYPE: `Dict[str, LayoutLabel]`

Source code in omnidocs/tasks/layout_extraction/models.py

def __init__(self, mapping: Dict[str, LayoutLabel]):
    """
    Initialize label mapping.

    Args:
        mapping: Dict mapping model-specific labels to LayoutLabel enum values
    """
    self._mapping = {k.lower(): v for k, v in mapping.items()}
    self._reverse_mapping = {v: k for k, v in mapping.items()}

supported_labels `property` ¶

supported_labels: List[str]

Get list of supported model-specific labels.

standard_labels `property` ¶

standard_labels: List[LayoutLabel]

Get list of standard labels this mapping produces.

to_standard ¶

to_standard(model_label: str) -> LayoutLabel

Convert model-specific label to standardized LayoutLabel.

Source code in omnidocs/tasks/layout_extraction/models.py

def to_standard(self, model_label: str) -> LayoutLabel:
    """Convert model-specific label to standardized LayoutLabel."""
    return self._mapping.get(model_label.lower(), LayoutLabel.UNKNOWN)

from_standard ¶

from_standard(standard_label: LayoutLabel) -> Optional[str]

Convert standardized LayoutLabel to model-specific label.

Source code in omnidocs/tasks/layout_extraction/models.py

def from_standard(self, standard_label: LayoutLabel) -> Optional[str]:
    """Convert standardized LayoutLabel to model-specific label."""
    return self._reverse_mapping.get(standard_label)
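The lookup behavior of the three methods above can be tried in isolation. This is a minimal sketch that copies the constructor and lookups from the source listings; the stand-in `LayoutLabel` members other than UNKNOWN and the native label strings ("Plain-Text", "Picture") are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional


class LayoutLabel(str, Enum):
    """Stand-in enum; only UNKNOWN is confirmed by the source above."""

    TEXT = "text"
    FIGURE = "figure"
    UNKNOWN = "unknown"


class LabelMapping:
    """Copy of the constructor and lookups shown in the source listings."""

    def __init__(self, mapping: Dict[str, LayoutLabel]):
        self._mapping = {k.lower(): v for k, v in mapping.items()}
        self._reverse_mapping = {v: k for k, v in mapping.items()}

    def to_standard(self, model_label: str) -> LayoutLabel:
        return self._mapping.get(model_label.lower(), LayoutLabel.UNKNOWN)

    def from_standard(self, standard_label: LayoutLabel) -> Optional[str]:
        return self._reverse_mapping.get(standard_label)


# Hypothetical native labels for an imaginary detector.
mapping = LabelMapping({"Plain-Text": LayoutLabel.TEXT, "Picture": LayoutLabel.FIGURE})

upper = mapping.to_standard("PLAIN-TEXT")            # lookup ignores case
missing = mapping.to_standard("barcode")             # unmapped labels fall back to UNKNOWN
native = mapping.from_standard(LayoutLabel.FIGURE)   # original casing is preserved
absent = mapping.from_standard(LayoutLabel.UNKNOWN)  # nothing maps to it, so None
```

Note that the forward mapping lowercases its keys while the reverse mapping keeps the original casing, so from_standard returns the label exactly as it was passed to the constructor.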

BoundingBox ¶

Bases: BaseModel

Bounding box coordinates, in pixel space by default (to_normalized returns a box in the 0-1024 range instead).

Coordinates follow the convention: (x1, y1) is top-left, (x2, y2) is bottom-right.

width `property` ¶

width: float

Width of the bounding box.

height `property` ¶

height: float

Height of the bounding box.

area `property` ¶

area: float

Area of the bounding box.

center `property` ¶

center: Tuple[float, float]

Center point of the bounding box.
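The four properties above are not shown with source, so the following is only a sketch of the conventional definitions for an (x1, y1, x2, y2) box. The class name `DemoBox` and the exact formulas are assumptions; the library's implementation may differ in detail.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DemoBox:
    """Hypothetical stand-in for BoundingBox using stdlib dataclasses."""

    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def width(self) -> float:
        # Assumed definition: horizontal extent.
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        # Assumed definition: vertical extent.
        return self.y2 - self.y1

    @property
    def area(self) -> float:
        return self.width * self.height

    @property
    def center(self) -> Tuple[float, float]:
        # Midpoint of the two corners.
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


box = DemoBox(x1=100, y1=50, x2=500, y2=300)
```

With these definitions, the example box is 400 wide and 250 tall, with its center at (300.0, 175.0).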

to_list ¶

to_list() -> List[float]

Convert to [x1, y1, x2, y2] list.

Source code in omnidocs/tasks/layout_extraction/models.py

def to_list(self) -> List[float]:
    """Convert to [x1, y1, x2, y2] list."""
    return [self.x1, self.y1, self.x2, self.y2]

to_xyxy ¶

to_xyxy() -> Tuple[float, float, float, float]

Convert to (x1, y1, x2, y2) tuple.

Source code in omnidocs/tasks/layout_extraction/models.py

def to_xyxy(self) -> Tuple[float, float, float, float]:
    """Convert to (x1, y1, x2, y2) tuple."""
    return (self.x1, self.y1, self.x2, self.y2)

to_xywh ¶

to_xywh() -> Tuple[float, float, float, float]

Convert to (x, y, width, height) format.

Source code in omnidocs/tasks/layout_extraction/models.py

def to_xywh(self) -> Tuple[float, float, float, float]:
    """Convert to (x, y, width, height) format."""
    return (self.x1, self.y1, self.width, self.height)
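The difference between the corner format (x1, y1, x2, y2) and the origin-plus-size format (x, y, width, height) is easy to mix up. The sketch below shows both directions with plain tuples; it assumes width and height are x2 - x1 and y2 - y1, and the inverse helper `xywh_to_xyxy` is hypothetical (the class above only documents the forward conversion).

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def xyxy_to_xywh(box: Box) -> Box:
    """Corner format to (x, y, width, height), mirroring to_xywh."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


def xywh_to_xyxy(box: Box) -> Box:
    """Hypothetical inverse: (x, y, width, height) back to corner format."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


xyxy = (100.0, 50.0, 500.0, 300.0)
xywh = xyxy_to_xywh(xyxy)
restored = xywh_to_xyxy(xywh)
```

The result of a round trip like this can be passed to from_list to rebuild a BoundingBox.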

from_list `classmethod` ¶

from_list(coords: List[float]) -> BoundingBox

Create from [x1, y1, x2, y2] list.

Source code in omnidocs/tasks/layout_extraction/models.py

@classmethod
def from_list(cls, coords: List[float]) -> "BoundingBox":
    """Create from [x1, y1, x2, y2] list."""
    if len(coords) != 4:
        raise ValueError(f"Expected 4 coordinates, got {len(coords)}")
    return cls(x1=coords[0], y1=coords[1], x2=coords[2], y2=coords[3])

to_normalized ¶

to_normalized(
    image_width: int, image_height: int
) -> BoundingBox

Convert to normalized coordinates (0-1024 range).

Scales coordinates from absolute pixel values to a virtual 1024x1024 canvas. This provides consistent coordinates regardless of original image size.

PARAMETER	DESCRIPTION
`image_width`	Original image width in pixels TYPE: `int`
`image_height`	Original image height in pixels TYPE: `int`

RETURNS	DESCRIPTION
`BoundingBox`	New BoundingBox with coordinates in 0-1024 range

Example

bbox = BoundingBox(x1=100, y1=50, x2=500, y2=300)
normalized = bbox.to_normalized(1000, 800)
# x: 100/1000*1024 = 102.4, y: 50/800*1024 = 64

Source code in omnidocs/tasks/layout_extraction/models.py

def to_normalized(self, image_width: int, image_height: int) -> "BoundingBox":
    """
    Convert to normalized coordinates (0-1024 range).

    Scales coordinates from absolute pixel values to a virtual 1024x1024 canvas.
    This provides consistent coordinates regardless of original image size.

    Args:
        image_width: Original image width in pixels
        image_height: Original image height in pixels

    Returns:
        New BoundingBox with coordinates in 0-1024 range

    Example:
        ```python
        bbox = BoundingBox(x1=100, y1=50, x2=500, y2=300)
        normalized = bbox.to_normalized(1000, 800)
        # x: 100/1000*1024 = 102.4, y: 50/800*1024 = 64
        ```
    """
    return BoundingBox(
        x1=self.x1 / image_width * NORMALIZED_SIZE,
        y1=self.y1 / image_height * NORMALIZED_SIZE,
        x2=self.x2 / image_width * NORMALIZED_SIZE,
        y2=self.y2 / image_height * NORMALIZED_SIZE,
    )

to_absolute ¶

to_absolute(
    image_width: int, image_height: int
) -> BoundingBox

Convert from normalized (0-1024) to absolute pixel coordinates.

PARAMETER	DESCRIPTION
`image_width`	Target image width in pixels TYPE: `int`
`image_height`	Target image height in pixels TYPE: `int`

RETURNS	DESCRIPTION
`BoundingBox`	New BoundingBox with absolute pixel coordinates

Source code in omnidocs/tasks/layout_extraction/models.py

def to_absolute(self, image_width: int, image_height: int) -> "BoundingBox":
    """
    Convert from normalized (0-1024) to absolute pixel coordinates.

    Args:
        image_width: Target image width in pixels
        image_height: Target image height in pixels

    Returns:
        New BoundingBox with absolute pixel coordinates
    """
    return BoundingBox(
        x1=self.x1 / NORMALIZED_SIZE * image_width,
        y1=self.y1 / NORMALIZED_SIZE * image_height,
        x2=self.x2 / NORMALIZED_SIZE * image_width,
        y2=self.y2 / NORMALIZED_SIZE * image_height,
    )
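to_normalized and to_absolute are inverses when called with the same image size. The sketch below reproduces the arithmetic from the two source listings on plain lists; it assumes the module constant NORMALIZED_SIZE equals 1024, as implied by the 0-1024 range described above.

```python
import math
from typing import List

NORMALIZED_SIZE = 1024  # assumed value of the module constant


def to_normalized(coords: List[float], width: int, height: int) -> List[float]:
    """Scale [x1, y1, x2, y2] from pixels to the virtual 1024x1024 canvas."""
    x1, y1, x2, y2 = coords
    return [
        x1 / width * NORMALIZED_SIZE,
        y1 / height * NORMALIZED_SIZE,
        x2 / width * NORMALIZED_SIZE,
        y2 / height * NORMALIZED_SIZE,
    ]


def to_absolute(coords: List[float], width: int, height: int) -> List[float]:
    """Scale [x1, y1, x2, y2] from the 1024 canvas back to pixels."""
    x1, y1, x2, y2 = coords
    return [
        x1 / NORMALIZED_SIZE * width,
        y1 / NORMALIZED_SIZE * height,
        x2 / NORMALIZED_SIZE * width,
        y2 / NORMALIZED_SIZE * height,
    ]


pixels = [100.0, 50.0, 500.0, 300.0]
normalized = to_normalized(pixels, 1000, 800)
restored = to_absolute(normalized, 1000, 800)

# Compare with a tolerance, since the round trip goes through floating point.
round_trip_ok = all(math.isclose(a, b) for a, b in zip(pixels, restored))
```

Because x and y are scaled independently, a normalized box does not preserve the aspect ratio of the original unless the image is square.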

LayoutBox ¶

Bases: BaseModel

Single detected layout element with label, bounding box, and confidence.

to_dict ¶

to_dict() -> Dict

Convert to dictionary representation.

Source code in omnidocs/tasks/layout_extraction/models.py

def to_dict(self) -> Dict:
    """Convert to dictionary representation."""
    return {
        "label": self.label.value,
        "bbox": self.bbox.to_list(),
        "confidence": self.confidence,
        "class_id": self.class_id,
        "original_label": self.original_label,
    }

get_normalized_bbox ¶

get_normalized_bbox(
    image_width: int, image_height: int
) -> BoundingBox

Get bounding box in normalized (0-1024) coordinates.

PARAMETER	DESCRIPTION
`image_width`	Original image width TYPE: `int`
`image_height`	Original image height TYPE: `int`

RETURNS	DESCRIPTION
`BoundingBox`	BoundingBox with normalized coordinates

Source code in omnidocs/tasks/layout_extraction/models.py

def get_normalized_bbox(self, image_width: int, image_height: int) -> BoundingBox:
    """
    Get bounding box in normalized (0-1024) coordinates.

    Args:
        image_width: Original image width
        image_height: Original image height

    Returns:
        BoundingBox with normalized coordinates
    """
    return self.bbox.to_normalized(image_width, image_height)

LayoutOutput ¶

Bases: BaseModel

Complete layout extraction results for a single image.

element_count `property` ¶

element_count: int

Number of detected elements.

labels_found `property` ¶

labels_found: List[str]

Unique labels found in detections.

filter_by_label ¶

filter_by_label(label: LayoutLabel) -> List[LayoutBox]

Filter boxes by label.

Source code in omnidocs/tasks/layout_extraction/models.py

def filter_by_label(self, label: LayoutLabel) -> List[LayoutBox]:
    """Filter boxes by label."""
    return [box for box in self.bboxes if box.label == label]

filter_by_confidence ¶

filter_by_confidence(
    min_confidence: float,
) -> List[LayoutBox]

Filter boxes by minimum confidence.

Source code in omnidocs/tasks/layout_extraction/models.py

def filter_by_confidence(self, min_confidence: float) -> List[LayoutBox]:
    """Filter boxes by minimum confidence."""
    return [box for box in self.bboxes if box.confidence >= min_confidence]
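Both filters return a plain list of boxes rather than a new LayoutOutput, so they cannot be chained as methods. A minimal sketch of combining them, using stand-in records (the `DemoLayoutBox` class, its string labels, and the free functions are hypothetical):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DemoLayoutBox:
    """Hypothetical stand-in for LayoutBox with only the fields used here."""

    label: str
    confidence: float


def filter_by_label(items: List[DemoLayoutBox], label: str) -> List[DemoLayoutBox]:
    # Same comparison as the method above: exact label equality.
    return [b for b in items if b.label == label]


def filter_by_confidence(items: List[DemoLayoutBox], min_confidence: float) -> List[DemoLayoutBox]:
    # The threshold is inclusive (>=), as in the method above.
    return [b for b in items if b.confidence >= min_confidence]


boxes = [
    DemoLayoutBox("title", 0.95),
    DemoLayoutBox("text", 0.40),
    DemoLayoutBox("text", 0.88),
    DemoLayoutBox("figure", 0.72),
]

confident_text = filter_by_confidence(filter_by_label(boxes, "text"), 0.5)
at_threshold = filter_by_confidence(boxes, 0.72)
```

With the real classes, the equivalent is a single comprehension over `output.bboxes` that checks both the label and the confidence.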

to_dict ¶

to_dict() -> Dict

Convert to dictionary representation.

Source code in omnidocs/tasks/layout_extraction/models.py

def to_dict(self) -> Dict:
    """Convert to dictionary representation."""
    return {
        "bboxes": [box.to_dict() for box in self.bboxes],
        "image_width": self.image_width,
        "image_height": self.image_height,
        "model_name": self.model_name,
        "element_count": self.element_count,
        "labels_found": self.labels_found,
    }

sort_by_position ¶

sort_by_position(
    top_to_bottom: bool = True,
) -> LayoutOutput

Return a new LayoutOutput with boxes sorted by position.

PARAMETER	DESCRIPTION
`top_to_bottom`	If True, sort ascending by (y1, x1): top to bottom, then left to right. If False, reverse that order. TYPE: `bool` DEFAULT: `True`

Source code in omnidocs/tasks/layout_extraction/models.py

def sort_by_position(self, top_to_bottom: bool = True) -> "LayoutOutput":
    """
    Return a new LayoutOutput with boxes sorted by position.

    Args:
        top_to_bottom: If True, sort by y-coordinate (reading order)
    """
    sorted_boxes = sorted(self.bboxes, key=lambda b: (b.bbox.y1, b.bbox.x1), reverse=not top_to_bottom)
    return LayoutOutput(
        bboxes=sorted_boxes,
        image_width=self.image_width,
        image_height=self.image_height,
        model_name=self.model_name,
    )
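The sort key used above is the tuple (y1, x1), so ties on the top edge are broken by the left edge, and `reverse=not top_to_bottom` flips both criteria at once. A small sketch with plain tuples (the element names and coordinates are made up for illustration):

```python
from typing import List, Tuple

# (name, x1, y1) for a hypothetical two-column page.
Item = Tuple[str, float, float]

items: List[Item] = [
    ("footer", 40.0, 900.0),
    ("right column", 520.0, 120.0),
    ("title", 40.0, 30.0),
    ("left column", 40.0, 120.0),
]


def sort_by_position(boxes: List[Item], top_to_bottom: bool = True) -> List[Item]:
    # Same key and reverse flag as the method above.
    return sorted(boxes, key=lambda b: (b[2], b[1]), reverse=not top_to_bottom)


reading_order = [name for name, _, _ in sort_by_position(items)]
reversed_order = [name for name, _, _ in sort_by_position(items, top_to_bottom=False)]
```

Since the comparison on y1 is exact, two boxes on the same visual row are ordered left to right only if their y1 values are identical; a small vertical offset is enough to put the lower box second regardless of x1.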

get_normalized_bboxes ¶

get_normalized_bboxes() -> List[Dict]

Get all bounding boxes in normalized (0-1024) coordinates.

RETURNS	DESCRIPTION
`List[Dict]`	List of dicts with normalized bbox coordinates and metadata.

Example

result = extractor.extract(image)
normalized = result.get_normalized_bboxes()
for box in normalized:
    print(f"{box['label']}: {box['bbox']}")  # coords in 0-1024 range

Source code in omnidocs/tasks/layout_extraction/models.py

def get_normalized_bboxes(self) -> List[Dict]:
    """
    Get all bounding boxes in normalized (0-1024) coordinates.

    Returns:
        List of dicts with normalized bbox coordinates and metadata.

    Example:
        ```python
        result = extractor.extract(image)
        normalized = result.get_normalized_bboxes()
        for box in normalized:
            print(f"{box['label']}: {box['bbox']}")  # coords in 0-1024 range
        ```
    """
    normalized = []
    for box in self.bboxes:
        norm_bbox = box.bbox.to_normalized(self.image_width, self.image_height)
        normalized.append(
            {
                "label": box.label.value,
                "bbox": norm_bbox.to_list(),
                "confidence": box.confidence,
                "class_id": box.class_id,
                "original_label": box.original_label,
            }
        )
    return normalized

visualize ¶

visualize(
    image: Image,
    output_path: Optional[Union[str, Path]] = None,
    show_labels: bool = True,
    show_confidence: bool = True,
    line_width: int = 3,
    font_size: int = 12,
) -> Image.Image

Visualize layout detection results on the image.

Draws bounding boxes with labels and confidence scores on the image. Each layout category has a distinct color for easy identification.

PARAMETER	DESCRIPTION
`image`	PIL Image to draw on (will be copied, not modified) TYPE: `Image`
`output_path`	Optional path to save the visualization TYPE: `Optional[Union[str, Path]]` DEFAULT: `None`
`show_labels`	Whether to show label text TYPE: `bool` DEFAULT: `True`
`show_confidence`	Whether to show confidence scores TYPE: `bool` DEFAULT: `True`
`line_width`	Width of bounding box lines TYPE: `int` DEFAULT: `3`
`font_size`	Intended size of label text (note: labels are drawn with the default font, so this value currently has no effect) TYPE: `int` DEFAULT: `12`

RETURNS	DESCRIPTION
`Image`	PIL Image with visualizations drawn

Example

result = extractor.extract(image)
viz = result.visualize(image, output_path="layout_viz.png")
viz.show()  # Display in notebook/viewer

Source code in omnidocs/tasks/layout_extraction/models.py

def visualize(
    self,
    image: "Image.Image",
    output_path: Optional[Union[str, Path]] = None,
    show_labels: bool = True,
    show_confidence: bool = True,
    line_width: int = 3,
    font_size: int = 12,
) -> "Image.Image":
    """
    Visualize layout detection results on the image.

    Draws bounding boxes with labels and confidence scores on the image.
    Each layout category has a distinct color for easy identification.

    Args:
        image: PIL Image to draw on (will be copied, not modified)
        output_path: Optional path to save the visualization
        show_labels: Whether to show label text
        show_confidence: Whether to show confidence scores
        line_width: Width of bounding box lines
        font_size: Size of label text (note: uses default font)

    Returns:
        PIL Image with visualizations drawn

    Example:
        ```python
        result = extractor.extract(image)
        viz = result.visualize(image, output_path="layout_viz.png")
        viz.show()  # Display in notebook/viewer
        ```
    """
    from PIL import ImageDraw

    # Copy image to avoid modifying original
    viz_image = image.copy().convert("RGB")
    draw = ImageDraw.Draw(viz_image)

    for box in self.bboxes:
        # Get color for this label
        color = LABEL_COLORS.get(box.label, "#95A5A6")

        # Draw bounding box
        coords = box.bbox.to_xyxy()
        draw.rectangle(coords, outline=color, width=line_width)

        # Build label text
        if show_labels or show_confidence:
            label_parts = []
            if show_labels:
                label_parts.append(box.label.value)
            if show_confidence:
                label_parts.append(f"{box.confidence:.2f}")
            label_text = " ".join(label_parts)

            # Draw label background
            text_bbox = draw.textbbox((coords[0], coords[1] - 20), label_text)
            draw.rectangle(text_bbox, fill=color)

            # Draw label text
            draw.text(
                (coords[0], coords[1] - 20),
                label_text,
                fill="white",
            )

    # Save if path provided
    if output_path:
        output_path = Path(output_path)
        output_path.parent.mkdir(parents=True, exist_ok=True)
        viz_image.save(output_path)

    return viz_image

load_json `classmethod` ¶

load_json(file_path: Union[str, Path]) -> LayoutOutput

Load a LayoutOutput instance from a JSON file.

Reads a JSON file and deserializes its contents into a LayoutOutput object. Uses Pydantic's model_validate_json for proper handling of nested objects.

PARAMETER	DESCRIPTION
`file_path`	Path to JSON file containing serialized LayoutOutput data. Can be string or pathlib.Path object. TYPE: `Union[str, Path]`

RETURNS	DESCRIPTION
`LayoutOutput`	Deserialized layout output instance from file. TYPE: `LayoutOutput`

RAISES	DESCRIPTION
`FileNotFoundError`	If the specified file does not exist.
`UnicodeDecodeError`	If file cannot be decoded as UTF-8.
`ValueError`	If file contents are not valid JSON.
`ValidationError`	If JSON data doesn't match LayoutOutput schema.

Example

output = LayoutOutput.load_json('layout_results.json')
print(f"Found {output.element_count} elements")

Found 5 elements

Source code in omnidocs/tasks/layout_extraction/models.py

@classmethod
def load_json(cls, file_path: Union[str, Path]) -> "LayoutOutput":
    """
    Load a LayoutOutput instance from a JSON file.

    Reads a JSON file and deserializes its contents into a LayoutOutput object.
    Uses Pydantic's model_validate_json for proper handling of nested objects.

    Args:
        file_path: Path to JSON file containing serialized LayoutOutput data.
                  Can be string or pathlib.Path object.

    Returns:
        LayoutOutput: Deserialized layout output instance from file.

    Raises:
        FileNotFoundError: If the specified file does not exist.
        UnicodeDecodeError: If file cannot be decoded as UTF-8.
        ValueError: If file contents are not valid JSON.
        ValidationError: If JSON data doesn't match LayoutOutput schema.

    Example:
        ```python
        output = LayoutOutput.load_json('layout_results.json')
        print(f"Found {output.element_count} elements")
        ```
        Found 5 elements
    """
    path = Path(file_path)
    return cls.model_validate_json(path.read_text(encoding="utf-8"))

save_json ¶

save_json(file_path: Union[str, Path]) -> None

Save LayoutOutput instance to a JSON file.

Serializes the LayoutOutput object to JSON and writes it to a file. Automatically creates parent directories if they don't exist. Uses UTF-8 encoding for compatibility and proper handling of special characters.

PARAMETER	DESCRIPTION
`file_path`	Path where JSON file should be saved. Can be string or pathlib.Path object. Parent directories will be created if they don't exist. TYPE: `Union[str, Path]`

RETURNS	DESCRIPTION
`None`	None

RAISES	DESCRIPTION
`OSError`	If file cannot be written due to permission or disk errors.
`TypeError`	If file_path is not a string or Path object.

Example

output = LayoutOutput(bboxes=[], image_width=800, image_height=600)
output.save_json('results/layout_output.json')
# File is created at results/layout_output.json
# Parent 'results' directory is created if it didn't exist

Source code in omnidocs/tasks/layout_extraction/models.py

def save_json(self, file_path: Union[str, Path]) -> None:
    """
    Save LayoutOutput instance to a JSON file.

    Serializes the LayoutOutput object to JSON and writes it to a file.
    Automatically creates parent directories if they don't exist. Uses UTF-8
    encoding for compatibility and proper handling of special characters.

    Args:
        file_path: Path where JSON file should be saved. Can be string or
                  pathlib.Path object. Parent directories will be created
                  if they don't exist.

    Returns:
        None

    Raises:
        OSError: If file cannot be written due to permission or disk errors.
        TypeError: If file_path is not a string or Path object.

    Example:
        ```python
        output = LayoutOutput(bboxes=[], image_width=800, image_height=600)
        output.save_json('results/layout_output.json')
        # File is created at results/layout_output.json
        # Parent 'results' directory is created if it didn't exist
        ```
    """
    path = Path(file_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(self.model_dump_json(), encoding="utf-8")
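The file handling in save_json and load_json follows a common pathlib pattern: create parent directories, write UTF-8 text, and read it back. The sketch below shows that pattern with the standard library only. Here `json.dumps` and `json.loads` stand in for Pydantic's `model_dump_json` and `model_validate_json`, and the free functions and payload are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Union


def save_json(data: dict, file_path: Union[str, Path]) -> None:
    """Write data as UTF-8 JSON, creating parent directories as needed."""
    path = Path(file_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data), encoding="utf-8")


def load_json(file_path: Union[str, Path]) -> dict:
    """Read UTF-8 JSON back into a dictionary."""
    return json.loads(Path(file_path).read_text(encoding="utf-8"))


payload = {"bboxes": [], "image_width": 800, "image_height": 600}

with tempfile.TemporaryDirectory() as tmp:
    # The 'results' directory does not exist yet; save_json creates it.
    target = Path(tmp) / "results" / "layout_output.json"
    save_json(payload, target)
    created = target.exists()
    restored = load_json(target)
```

With the real class, the same round trip is `output.save_json(path)` followed by `LayoutOutput.load_json(path)`, which additionally validates the data against the model schema.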
