📐 Layout Analysis

This section documents the layout analysis API, including the extractors that detect and analyze document structure.

Overview

Layout analysis in OmniDocs focuses on identifying and categorizing different regions within a document, such as text blocks, images, tables, and figures. This is crucial for understanding the document's overall structure and reading order.

Available Extractors

DocLayoutYOLOExtractor

A layout detection model based on YOLOv10, designed for diverse document types.

omnidocs.tasks.layout_analysis.extractors.doc_layout_yolo.YOLOLayoutDetector

YOLOLayoutDetector(device: Optional[str] = None, show_log: bool = False, model_path: Optional[Union[str, Path]] = None)

Bases: BaseLayoutDetector

YOLO-based layout detection implementation.

Initialize YOLO Layout Detector.

Usage Example

from omnidocs.tasks.layout_analysis.extractors.doc_layout_yolo import YOLOLayoutDetector

extractor = YOLOLayoutDetector()
result = extractor.extract("document.pdf")
print(f"Detected {len(result.layouts)} layout elements.")

FlorenceLayoutExtractor

A fine-tuned Florence model for document layout analysis, aimed at more accurate bounding boxes in document images.

omnidocs.tasks.layout_analysis.extractors.florence.FlorenceLayoutDetector

FlorenceLayoutDetector(device: Optional[str] = None, show_log: bool = False, trust_remote_code: bool = True, **kwargs)

Bases: BaseLayoutDetector

Florence-based layout detection implementation.

Initialize Florence Layout Detector.

Usage Example

from omnidocs.tasks.layout_analysis.extractors.florence import FlorenceLayoutDetector

extractor = FlorenceLayoutDetector()
result = extractor.extract("image.png")
print(f"Detected {len(result.layouts)} layout elements.")

PaddleLayoutExtractor

A detector built on PaddleOCR, a multilingual OCR toolkit that also provides layout detection.

omnidocs.tasks.layout_analysis.extractors.paddle.PaddleLayoutDetector

PaddleLayoutDetector(device: Optional[str] = None, show_log: bool = False, **kwargs)

Bases: BaseLayoutDetector

PaddleOCR-based layout detection implementation.

Initialize PaddleOCR Layout Detector.

Usage Example

from omnidocs.tasks.layout_analysis.extractors.paddle import PaddleLayoutDetector

extractor = PaddleLayoutDetector()
result = extractor.extract("image.png")
print(f"Detected {len(result.layouts)} layout elements.")

RTDETRLayoutExtractor

An implementation of RT-DETR (Real-Time Detection Transformer), an object detection model applied here to document layout detection.

omnidocs.tasks.layout_analysis.extractors.rtdetr.RTDETRLayoutDetector

RTDETRLayoutDetector(device: Optional[str] = None, show_log: bool = False, model_path: Optional[Union[str, Path]] = None, num_threads: Optional[int] = 4, use_cpu_only: bool = True)

Bases: BaseLayoutDetector

RT-DETR-based layout detection implementation.

Initialize RT-DETR Layout Detector with careful device handling.

Usage Example

from omnidocs.tasks.layout_analysis.extractors.rtdetr import RTDETRLayoutDetector

extractor = RTDETRLayoutDetector()
result = extractor.extract("image.png")
print(f"Detected {len(result.layouts)} layout elements.")

SuryaLayoutExtractor

A detector built on Surya, an OCR and layout analysis toolkit that supports 90+ languages and includes reading-order detection and table recognition.

omnidocs.tasks.layout_analysis.extractors.surya.SuryaLayoutDetector

SuryaLayoutDetector(device: Optional[str] = None, show_log: bool = False, **kwargs)

Bases: BaseLayoutDetector

Surya-based layout detection implementation.

Initialize Surya Layout Detector.

Usage Example

from omnidocs.tasks.layout_analysis.extractors.surya import SuryaLayoutDetector

extractor = SuryaLayoutDetector()
result = extractor.extract("document.pdf")
print(f"Detected {len(result.layouts)} layout elements.")

LayoutOutput

The standardized output format for layout analysis results.

omnidocs.tasks.layout_analysis.base.LayoutOutput

Bases: BaseModel

Container for all detected layout boxes in an image.

Attributes:

Name	Type	Description
`bboxes`	`List[LayoutBox]`	List of detected LayoutBox objects
`page_number`	`Optional[int]`	Optional page number for multi-page documents
`image_size`	`Optional[Tuple[int, int]]`	Optional tuple of (width, height) of the processed image

save_json

save_json(output_path: Union[str, Path]) -> None

Save layout output to JSON file.

to_dict

to_dict() -> Dict

Convert to dictionary representation.
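
To make the shape of this container concrete, here is a minimal sketch that mirrors the three documented attributes using standard-library dataclasses. `LayoutBoxSketch` and `LayoutOutputSketch` are hypothetical stand-ins, not the OmniDocs classes (which derive from `BaseModel`), and the `label`, `bbox`, and `confidence` fields on the box are assumptions. Only `bboxes`, `page_number`, `image_size`, `to_dict`, and `save_json` are taken from the reference above.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class LayoutBoxSketch:
    # Hypothetical fields: the real LayoutBox schema is not shown in this section.
    label: str
    bbox: List[float]  # assumed [x1, y1, x2, y2]
    confidence: Optional[float] = None


@dataclass
class LayoutOutputSketch:
    # These three attribute names follow the documented attribute table.
    bboxes: List[LayoutBoxSketch] = field(default_factory=list)
    page_number: Optional[int] = None
    image_size: Optional[Tuple[int, int]] = None  # (width, height)

    def to_dict(self) -> Dict:
        # Recursively convert the container and its boxes to plain dicts.
        return asdict(self)

    def save_json(self, output_path: Union[str, Path]) -> None:
        # Serialize the dictionary form to a UTF-8 JSON file.
        Path(output_path).write_text(
            json.dumps(self.to_dict(), indent=2), encoding="utf-8"
        )


output = LayoutOutputSketch(
    bboxes=[LayoutBoxSketch(label="title", bbox=[40.0, 32.0, 560.0, 80.0], confidence=0.97)],
    page_number=1,
    image_size=(612, 792),
)
as_dict = output.to_dict()
```

Keeping `to_dict` as the single conversion point means `save_json` never needs to know about the box schema, which is one plausible reason to expose both methods.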

Key Properties

layouts (List[LayoutElement]): List of detected layout elements.
source_file (str): Path to the processed file.
source_img_size (Tuple[int, int]): Dimensions of the source image.

Key Methods

save_json(output_path): Save results to a JSON file.
visualize(image_path, output_path): Visualize layout elements on the source image.

LayoutElement

Represents a single detected layout element.

Attributes

type (str): Type of the element (e.g., 'text', 'title', 'table', 'figure').
bbox (List[float]): Bounding box coordinates [x1, y1, x2, y2].
text_content (Optional[str]): Text content if applicable.
confidence (Optional[float]): Confidence score of the detection.
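
The attributes above can be exercised with a small sketch. `ElementSketch`, `bbox_area`, and `filter_by_confidence` are hypothetical names introduced for illustration, and the code assumes `(x1, y1)` is the top-left corner and `(x2, y2)` the bottom-right corner, which this section does not state explicitly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ElementSketch:
    # Field names follow the attribute list; the class itself is a stand-in.
    type: str
    bbox: List[float]  # [x1, y1, x2, y2]
    text_content: Optional[str] = None
    confidence: Optional[float] = None


def bbox_area(bbox: List[float]) -> float:
    # Area of an [x1, y1, x2, y2] box; degenerate boxes yield 0.
    x1, y1, x2, y2 = bbox
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def filter_by_confidence(elements: List[ElementSketch], threshold: float) -> List[ElementSketch]:
    # Keep elements scored at or above the threshold; unscored elements are dropped.
    return [e for e in elements if e.confidence is not None and e.confidence >= threshold]


elements = [
    ElementSketch(type="title", bbox=[50.0, 40.0, 550.0, 90.0], confidence=0.98),
    ElementSketch(type="table", bbox=[50.0, 120.0, 550.0, 400.0], confidence=0.42),
    ElementSketch(type="figure", bbox=[50.0, 420.0, 300.0, 600.0]),
]
confident = filter_by_confidence(elements, threshold=0.5)
title_area = bbox_area(confident[0].bbox)
```

Because `confidence` is optional, any filtering code has to decide what to do with unscored elements; this sketch drops them, but keeping them is an equally defensible choice.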

BaseLayoutExtractor

The abstract base class for all layout analysis extractors.

omnidocs.tasks.layout_analysis.base.BaseLayoutDetector

BaseLayoutDetector(show_log: bool = False)

Bases: ABC

Base class for all layout detection models.

preprocess_input

preprocess_input(input_path: Union[str, Path]) -> List[np.ndarray]

Convert input to processable format.

Parameters:

Name	Type	Description	Default
`input_path`	`Union[str, Path]`	Path to input image or PDF	required

Returns:

Type	Description
`List[ndarray]`	List of preprocessed images as numpy arrays
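
Since `preprocess_input` accepts either an image or a PDF, some form of input-type dispatch has to happen before any pixels are loaded. The sketch below shows only that first step, using the file suffix. `classify_input` and `IMAGE_SUFFIXES` are hypothetical and not part of OmniDocs, and the list of accepted image suffixes is an assumption.

```python
from pathlib import Path
from typing import Union

# Assumed set of image suffixes; the formats OmniDocs actually accepts are not listed here.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def classify_input(input_path: Union[str, Path]) -> str:
    # Return "pdf" or "image" based on the (case-insensitive) file suffix.
    suffix = Path(input_path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    raise ValueError(f"Unsupported input type: {suffix or '<none>'}")


kind_pdf = classify_input("document.pdf")
kind_img = classify_input(Path("scan.PNG"))
```

A PDF would then be rendered page by page and a single image wrapped in a one-element list, which is consistent with the documented `List[ndarray]` return type.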

visualize

visualize(detection_result: Tuple[Image, LayoutOutput], output_path: Union[str, Path]) -> None

Save annotated image to file.

Parameters:

Name	Type	Description	Default
`detection_result`	`Tuple[Image, LayoutOutput]`	Tuple containing (PIL Image, LayoutOutput)	required
`output_path`	`Union[str, Path]`	Path to save visualization	required

LayoutMapper

Handles mapping of layout labels and normalization of bounding boxes.

omnidocs.tasks.layout_analysis.base.BaseLayoutMapper

BaseLayoutMapper()

Base class for layout label mapping.

from_standard

from_standard(layout_label: LayoutLabel) -> Optional[str]

Convert standardized LayoutLabel to model-specific label.

to_standard

to_standard(model_label: str) -> Optional[LayoutLabel]

Convert model-specific label to standardized LayoutLabel.
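
The two conversions can be illustrated with a dictionary-backed mapper. This is a sketch under stated assumptions: `LayoutLabelSketch` and `LabelMapperSketch` are hypothetical stand-ins for `LayoutLabel` and a `BaseLayoutMapper` subclass, and the label values and model labels are invented for the example. Only the method names and their `Optional` return types come from the reference above.

```python
from enum import Enum
from typing import Dict, Optional


class LayoutLabelSketch(str, Enum):
    # Assumed label set; the real LayoutLabel members are not listed in this section.
    TEXT = "text"
    TITLE = "title"
    TABLE = "table"
    FIGURE = "figure"


class LabelMapperSketch:
    def __init__(self, mapping: Dict[str, LayoutLabelSketch]) -> None:
        # Model label -> standard label, matched case-insensitively.
        self._to_standard = {k.lower(): v for k, v in mapping.items()}
        # Standard label -> model label; the first model label listed wins.
        self._from_standard: Dict[LayoutLabelSketch, str] = {}
        for model_label, standard in mapping.items():
            self._from_standard.setdefault(standard, model_label)

    def to_standard(self, model_label: str) -> Optional[LayoutLabelSketch]:
        # Unknown model labels map to None rather than raising.
        return self._to_standard.get(model_label.lower())

    def from_standard(self, layout_label: LayoutLabelSketch) -> Optional[str]:
        # Standard labels the model does not emit map to None.
        return self._from_standard.get(layout_label)


mapper = LabelMapperSketch(
    {
        "plain text": LayoutLabelSketch.TEXT,
        "paragraph": LayoutLabelSketch.TEXT,
        "title": LayoutLabelSketch.TITLE,
        "table": LayoutLabelSketch.TABLE,
    }
)
```

Several model labels may collapse into one standard label, so the reverse direction is not unique; this sketch resolves that by returning the first model label registered for each standard label.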
