📐 Layout Analysis

This section documents the API for layout analysis tasks: the available extractors for detecting and analyzing document structure, plus the shared output, base, and label-mapping classes they build on.

Overview

Layout analysis in OmniDocs focuses on identifying and categorizing different regions within a document, such as text blocks, images, tables, and figures. This is crucial for understanding the document's overall structure and reading order.

Available Extractors

DocLayoutYOLOExtractor

A layout detection model based on YOLO-v10, designed for diverse document types.

omnidocs.tasks.layout_analysis.extractors.doc_layout_yolo.YOLOLayoutDetector

YOLOLayoutDetector(device: Optional[str] = None, show_log: bool = False, model_path: Optional[Union[str, Path]] = None)

Bases: BaseLayoutDetector

YOLO-based layout detection implementation.

Initialize YOLO Layout Detector.

Usage Example

from omnidocs.tasks.layout_analysis.extractors.doc_layout_yolo import YOLOLayoutDetector

extractor = YOLOLayoutDetector()
result = extractor.extract("document.pdf")
print(f"Detected {len(result.layouts)} layout elements.")

FlorenceLayoutExtractor

A Florence-based model fine-tuned for document layout analysis, aimed at more accurate bounding boxes in document images.

omnidocs.tasks.layout_analysis.extractors.florence.FlorenceLayoutDetector

FlorenceLayoutDetector(device: Optional[str] = None, show_log: bool = False, trust_remote_code: bool = True, **kwargs)

Bases: BaseLayoutDetector

Florence-based layout detection implementation.

Initialize Florence Layout Detector.

Usage Example

from omnidocs.tasks.layout_analysis.extractors.florence import FlorenceLayoutDetector

extractor = FlorenceLayoutDetector()
result = extractor.extract("image.png")
print(f"Detected {len(result.layouts)} layout elements.")

PaddleLayoutExtractor

A detector built on PaddleOCR, an OCR toolkit that supports multiple languages and also provides layout detection.

omnidocs.tasks.layout_analysis.extractors.paddle.PaddleLayoutDetector

PaddleLayoutDetector(device: Optional[str] = None, show_log: bool = False, **kwargs)

Bases: BaseLayoutDetector

PaddleOCR-based layout detection implementation.

Initialize PaddleOCR Layout Detector.

Usage Example

from omnidocs.tasks.layout_analysis.extractors.paddle import PaddleLayoutDetector

extractor = PaddleLayoutDetector()
result = extractor.extract("image.png")
print(f"Detected {len(result.layouts)} layout elements.")

RTDETRLayoutExtractor

An implementation of RT-DETR, a real-time detection transformer for object detection, used here to detect layout regions.

omnidocs.tasks.layout_analysis.extractors.rtdetr.RTDETRLayoutDetector

RTDETRLayoutDetector(device: Optional[str] = None, show_log: bool = False, model_path: Optional[Union[str, Path]] = None, num_threads: Optional[int] = 4, use_cpu_only: bool = True)

Bases: BaseLayoutDetector

RT-DETR-based layout detection implementation.

Initialize RT-DETR Layout Detector with careful device handling.

Usage Example

from omnidocs.tasks.layout_analysis.extractors.rtdetr import RTDETRLayoutDetector

extractor = RTDETRLayoutDetector()
result = extractor.extract("image.png")
print(f"Detected {len(result.layouts)} layout elements.")

SuryaLayoutExtractor

A detector built on Surya, an OCR and layout analysis tool that supports 90+ languages and includes reading-order and table recognition.

omnidocs.tasks.layout_analysis.extractors.surya.SuryaLayoutDetector

SuryaLayoutDetector(device: Optional[str] = None, show_log: bool = False, **kwargs)

Bases: BaseLayoutDetector

Surya-based layout detection implementation.

Initialize Surya Layout Detector.

Usage Example

from omnidocs.tasks.layout_analysis.extractors.surya import SuryaLayoutDetector

extractor = SuryaLayoutDetector()
result = extractor.extract("document.pdf")
print(f"Detected {len(result.layouts)} layout elements.")

LayoutOutput

The standardized output format for layout analysis results.

omnidocs.tasks.layout_analysis.base.LayoutOutput

Bases: BaseModel

Container for all detected layout boxes in an image.

Attributes:

  • bboxes (List[LayoutBox]): List of detected LayoutBox objects.
  • page_number (Optional[int]): Optional page number for multi-page documents.
  • image_size (Optional[Tuple[int, int]]): Optional tuple of (width, height) of the processed image.

save_json

save_json(output_path: Union[str, Path]) -> None

Save layout output to JSON file.

to_dict

to_dict() -> Dict

Convert to dictionary representation.

Key Properties

  • layouts (List[LayoutElement]): List of detected layout elements.
  • source_file (str): Path to the processed file.
  • source_img_size (Tuple[int, int]): Dimensions of the source image.

Key Methods

  • save_json(output_path): Save results to a JSON file.
  • visualize(image_path, output_path): Visualize layout elements on the source image.
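
To make the container shape above concrete, here is a minimal sketch that mirrors the documented attributes and the to_dict / save_json pair. It uses standard-library dataclasses as stand-ins, not the real Pydantic models. The names BoxSketch and LayoutOutputSketch are hypothetical, and the fields of BoxSketch (label, bbox, confidence) are assumptions, since LayoutBox is not documented on this page.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class BoxSketch:
    """Hypothetical stand-in for LayoutBox; these fields are assumptions."""
    label: str
    bbox: List[float]  # [x1, y1, x2, y2]
    confidence: Optional[float] = None


@dataclass
class LayoutOutputSketch:
    """Stand-in that mirrors the documented LayoutOutput attributes."""
    bboxes: List[BoxSketch] = field(default_factory=list)
    page_number: Optional[int] = None
    image_size: Optional[Tuple[int, int]] = None  # (width, height)

    def to_dict(self) -> Dict:
        # asdict recurses into the nested BoxSketch dataclasses.
        return asdict(self)

    def save_json(self, output_path: Union[str, Path]) -> None:
        # Serialize the dictionary form; JSON stores the size tuple as a list.
        Path(output_path).write_text(
            json.dumps(self.to_dict(), indent=2), encoding="utf-8"
        )


output = LayoutOutputSketch(
    bboxes=[BoxSketch("title", [40.0, 30.0, 560.0, 80.0], 0.97)],
    page_number=1,
    image_size=(600, 800),
)

# Write to a temporary directory and read the file back to inspect it.
with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "layout.json"
    output.save_json(json_path)
    loaded = json.loads(json_path.read_text(encoding="utf-8"))
```

The real class may serialize differently (for example, through Pydantic's own export methods); the sketch only shows the round trip from container to dictionary to JSON file.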

LayoutElement

Represents a single detected layout element.

Attributes

  • type (str): Type of the element (e.g., 'text', 'title', 'table', 'figure').
  • bbox (List[float]): Bounding box coordinates [x1, y1, x2, y2].
  • text_content (Optional[str]): Text content if applicable.
  • confidence (Optional[float]): Confidence score of the detection.
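
As an illustration of how these attributes are typically consumed, the sketch below filters elements by confidence and computes box areas from the [x1, y1, x2, y2] convention. ElementSketch is a hypothetical stand-in that copies the attribute names listed above, and box_area and filter_confident are illustrative helpers, not library functions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ElementSketch:
    """Hypothetical stand-in with the attribute names listed above."""
    type: str
    bbox: List[float]  # [x1, y1, x2, y2]
    text_content: Optional[str] = None
    confidence: Optional[float] = None


def box_area(bbox: List[float]) -> float:
    """Area of an [x1, y1, x2, y2] box; degenerate boxes give 0."""
    x1, y1, x2, y2 = bbox
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def filter_confident(
    elements: List[ElementSketch], min_confidence: float = 0.5
) -> List[ElementSketch]:
    """Keep elements at or above the threshold.

    Elements without a score are kept, since confidence is optional.
    """
    return [
        e for e in elements
        if e.confidence is None or e.confidence >= min_confidence
    ]


elements = [
    ElementSketch("title", [40.0, 30.0, 560.0, 80.0], "Report", 0.97),
    ElementSketch("figure", [40.0, 100.0, 300.0, 400.0], None, 0.31),
    ElementSketch("table", [40.0, 420.0, 560.0, 700.0], None, None),
]

confident = filter_confident(elements)
areas = {e.type: box_area(e.bbox) for e in confident}
```

Keeping unscored elements is a design choice for this sketch; a stricter pipeline could drop them instead.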

BaseLayoutExtractor

The abstract base class for all layout analysis extractors.

omnidocs.tasks.layout_analysis.base.BaseLayoutDetector

BaseLayoutDetector(show_log: bool = False)

Bases: ABC

Base class for all layout detection models.

preprocess_input

preprocess_input(input_path: Union[str, Path]) -> List[np.ndarray]

Convert input to processable format.

Parameters:

  • input_path (Union[str, Path], required): Path to input image or PDF.

Returns:

  • List[ndarray]: List of preprocessed images as numpy arrays.

visualize

visualize(detection_result: Tuple[Image, LayoutOutput], output_path: Union[str, Path]) -> None

Save annotated image to file.

Parameters:

  • detection_result (Tuple[Image, LayoutOutput], required): Tuple containing (PIL Image, LayoutOutput).
  • output_path (Union[str, Path], required): Path to save visualization.

LayoutMapper

Handles mapping of layout labels and normalization of bounding boxes.

omnidocs.tasks.layout_analysis.base.BaseLayoutMapper

BaseLayoutMapper()

Base class for layout label mapping.

from_standard

from_standard(layout_label: LayoutLabel) -> Optional[str]

Convert standardized LayoutLabel to model-specific label.

to_standard

to_standard(model_label: str) -> Optional[LayoutLabel]

Convert model-specific label to standardized LayoutLabel.
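
The two-way mapping described above can be sketched as a pair of dictionary lookups. This is a minimal illustration under stated assumptions: LayoutLabelSketch stands in for the library's LayoutLabel (its members here are invented), ExampleLayoutMapper is a hypothetical class that does not inherit from BaseLayoutMapper, and the model-specific label strings are made up.

```python
from enum import Enum
from typing import Dict, Optional


class LayoutLabelSketch(str, Enum):
    """Stand-in for LayoutLabel; these members are assumptions."""
    TEXT = "text"
    TITLE = "title"
    TABLE = "table"
    IMAGE = "image"


class ExampleLayoutMapper:
    """Hypothetical mapper exposing the two documented methods."""

    def __init__(self) -> None:
        # Model-specific label -> standardized label (invented label strings).
        self._to_standard: Dict[str, LayoutLabelSketch] = {
            "plain text": LayoutLabelSketch.TEXT,
            "title": LayoutLabelSketch.TITLE,
            "table": LayoutLabelSketch.TABLE,
            "figure": LayoutLabelSketch.IMAGE,
        }
        # Reverse lookup, built once from the forward mapping.
        self._from_standard: Dict[LayoutLabelSketch, str] = {
            standard: model for model, standard in self._to_standard.items()
        }

    def to_standard(self, model_label: str) -> Optional[LayoutLabelSketch]:
        # Unknown labels return None, matching the Optional return type.
        return self._to_standard.get(model_label.lower())

    def from_standard(self, layout_label: LayoutLabelSketch) -> Optional[str]:
        return self._from_standard.get(layout_label)


mapper = ExampleLayoutMapper()
standard = mapper.to_standard("Figure")
model_label = mapper.from_standard(LayoutLabelSketch.TEXT)
```

Returning None for unmapped labels lets a caller decide whether to skip the detection or fall back to a default category.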