📐 Layout Analysis
This section documents the API for layout analysis tasks, including various extractors for detecting and analyzing document structure.
Overview
Layout analysis in OmniDocs focuses on identifying and categorizing different regions within a document, such as text blocks, images, tables, and figures. This is crucial for understanding the document's overall structure and reading order.
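As a rough illustration of how detected regions relate to reading order, the sketch below sorts already-detected regions top to bottom, then left to right. The `Region` class and `reading_order` helper are hypothetical and not part of OmniDocs. This is a naive single-column heuristic; multi-column pages need more than this.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Region:
    """Hypothetical stand-in for a detected layout region (not an OmniDocs class)."""
    label: str
    bbox: Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def reading_order(regions: List[Region], row_tolerance: float = 10.0) -> List[Region]:
    """Order regions top to bottom, then left to right.

    Regions whose top edges fall into the same band of height
    `row_tolerance` are treated as one row and sorted by their left edge.
    """
    return sorted(
        regions,
        key=lambda r: (int(r.bbox[1] // row_tolerance), r.bbox[0]),
    )


regions = [
    Region("text", (50, 200, 500, 400)),
    Region("title", (50, 40, 500, 80)),
    Region("figure", (300, 100, 500, 180)),
    Region("text", (50, 102, 280, 180)),
]
ordered = [r.label for r in reading_order(regions)]
print(ordered)
```

The band-based key keeps side-by-side regions (here the second text block and the figure) on the same row even when their top edges differ by a few pixels.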
Available Extractors
DocLayoutYOLOExtractor
A layout detection model based on YOLOv10, designed for diverse document types.
omnidocs.tasks.layout_analysis.extractors.doc_layout_yolo.YOLOLayoutDetector
YOLOLayoutDetector(device: Optional[str] = None, show_log: bool = False, model_path: Optional[Union[str, Path]] = None)
Bases: BaseLayoutDetector
YOLO-based layout detection implementation.
Initialize YOLO Layout Detector.
Usage Example
from omnidocs.tasks.layout_analysis.extractors.doc_layout_yolo import YOLOLayoutDetector
extractor = YOLOLayoutDetector()
result = extractor.extract("document.pdf")
print(f"Detected {len(result.layouts)} layout elements.")
FlorenceLayoutExtractor
A fine-tuned model for document layout analysis, improving bounding box accuracy in document images.
omnidocs.tasks.layout_analysis.extractors.florence.FlorenceLayoutDetector
FlorenceLayoutDetector(device: Optional[str] = None, show_log: bool = False, trust_remote_code: bool = True, **kwargs)
Bases: BaseLayoutDetector
Florence-based layout detection implementation.
Initialize Florence Layout Detector.
Usage Example
from omnidocs.tasks.layout_analysis.extractors.florence import FlorenceLayoutDetector
extractor = FlorenceLayoutDetector()
result = extractor.extract("image.png")
print(f"Detected {len(result.layouts)} layout elements.")
PaddleLayoutExtractor
A layout detector built on PaddleOCR, a multilingual OCR toolkit that also provides layout detection.
omnidocs.tasks.layout_analysis.extractors.paddle.PaddleLayoutDetector
Bases: BaseLayoutDetector
PaddleOCR-based layout detection implementation.
Initialize PaddleOCR Layout Detector.
Usage Example
from omnidocs.tasks.layout_analysis.extractors.paddle import PaddleLayoutDetector
extractor = PaddleLayoutDetector()
result = extractor.extract("image.png")
print(f"Detected {len(result.layouts)} layout elements.")
RTDETRLayoutExtractor
A layout detector built on RT-DETR (Real-Time Detection Transformer), a transformer-based object detection model.
omnidocs.tasks.layout_analysis.extractors.rtdetr.RTDETRLayoutDetector
RTDETRLayoutDetector(device: Optional[str] = None, show_log: bool = False, model_path: Optional[Union[str, Path]] = None, num_threads: Optional[int] = 4, use_cpu_only: bool = True)
Bases: BaseLayoutDetector
RT-DETR-based layout detection implementation.
Initialize RT-DETR Layout Detector with careful device handling.
Usage Example
from omnidocs.tasks.layout_analysis.extractors.rtdetr import RTDETRLayoutDetector
extractor = RTDETRLayoutDetector()
result = extractor.extract("image.png")
print(f"Detected {len(result.layouts)} layout elements.")
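The constructor signature above defaults to `use_cpu_only=True` and accepts an optional `device`. How the detector reconciles the two is not documented here, so the following is only a guess at what such device handling might look like. `resolve_device` and its `cuda_available` argument are hypothetical and not part of OmniDocs.

```python
from typing import Optional


def resolve_device(device: Optional[str], use_cpu_only: bool, cuda_available: bool) -> str:
    """Hypothetical device resolution; the real detector may behave differently.

    Assumed rules:
    - `use_cpu_only` always wins and forces the CPU.
    - With no explicit device, prefer CUDA when it is available.
    - An explicit CUDA device falls back to the CPU when CUDA is unavailable.
    """
    if use_cpu_only:
        return "cpu"
    if device is None:
        return "cuda" if cuda_available else "cpu"
    if device.startswith("cuda") and not cuda_available:
        return "cpu"
    return device


# With the documented defaults (device=None, use_cpu_only=True) this picks the CPU.
default_choice = resolve_device(None, use_cpu_only=True, cuda_available=True)
gpu_choice = resolve_device("cuda:0", use_cpu_only=False, cuda_available=True)
fallback_choice = resolve_device("cuda:0", use_cpu_only=False, cuda_available=False)
print(default_choice, gpu_choice, fallback_choice)
```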
SuryaLayoutExtractor
A layout detector built on Surya, an OCR and layout analysis toolkit that supports 90+ languages and includes reading order and table recognition.
omnidocs.tasks.layout_analysis.extractors.surya.SuryaLayoutDetector
Bases: BaseLayoutDetector
Surya-based layout detection implementation.
Initialize Surya Layout Detector.
Usage Example
from omnidocs.tasks.layout_analysis.extractors.surya import SuryaLayoutDetector
extractor = SuryaLayoutDetector()
result = extractor.extract("document.pdf")
print(f"Detected {len(result.layouts)} layout elements.")
LayoutOutput
The standardized output format for layout analysis results.
omnidocs.tasks.layout_analysis.base.LayoutOutput
Bases: BaseModel
Container for all detected layout boxes in an image.
Attributes:
Name | Type | Description |
---|---|---|
bboxes | List[LayoutBox] | List of detected LayoutBox objects |
page_number | Optional[int] | Optional page number for multi-page documents |
image_size | Optional[Tuple[int, int]] | Optional tuple of (width, height) of the processed image |
Key Properties
layouts (List[LayoutElement]): List of detected layout elements.
source_file (str): Path to the processed file.
source_img_size (Tuple[int, int]): Dimensions of the source image.
Key Methods
save_json(output_path): Save results to a JSON file.
visualize(image_path, output_path): Visualize layout elements on the source image.
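The JSON schema written by `save_json` is not specified in this section. As a minimal sketch of what saving layout results to JSON could look like, the example below uses only the standard library. `save_layout_json` and the `layouts` key are assumptions for illustration, not the OmniDocs implementation.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def save_layout_json(elements: List[Dict[str, Any]], output_path: Path) -> None:
    """Hypothetical helper: write layout elements to a JSON file."""
    payload = {"layouts": elements}
    output_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


# Example elements using the attribute names documented for layout elements.
elements = [
    {"type": "title", "bbox": [50.0, 40.0, 500.0, 80.0], "confidence": 0.98},
    {"type": "table", "bbox": [50.0, 120.0, 500.0, 400.0], "confidence": 0.91},
]

with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / "layout.json"
    save_layout_json(elements, out_file)
    # Read the file back to confirm it round-trips.
    loaded = json.loads(out_file.read_text(encoding="utf-8"))

print(len(loaded["layouts"]))
```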
LayoutElement
Represents a single detected layout element.
omnidocs.tasks.layout_analysis.base.BaseLayoutMapper
Attributes
type (str): Type of the element (e.g., 'text', 'title', 'table', 'figure').
bbox (List[float]): Bounding box coordinates [x1, y1, x2, y2].
text_content (Optional[str]): Text content if applicable.
confidence (Optional[float]): Confidence score of the detection.
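To make the attribute list concrete, here is a small sketch that mirrors these four attributes in a plain dataclass and filters elements by confidence. `ElementSketch`, `filter_by_confidence`, and `bbox_area` are hypothetical names used only for illustration; they are not the OmniDocs classes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ElementSketch:
    """Hypothetical mirror of the documented attributes (not the OmniDocs class)."""
    type: str
    bbox: List[float]  # [x1, y1, x2, y2]
    text_content: Optional[str] = None
    confidence: Optional[float] = None


def filter_by_confidence(elements: List[ElementSketch], min_confidence: float = 0.5) -> List[ElementSketch]:
    """Keep elements scoring at least `min_confidence`; unscored elements are dropped."""
    return [
        e for e in elements
        if e.confidence is not None and e.confidence >= min_confidence
    ]


def bbox_area(bbox: List[float]) -> float:
    """Area of an [x1, y1, x2, y2] box; degenerate boxes have zero area."""
    x1, y1, x2, y2 = bbox
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


elements = [
    ElementSketch("title", [50.0, 40.0, 500.0, 80.0], "Introduction", 0.98),
    ElementSketch("figure", [60.0, 100.0, 200.0, 180.0], None, 0.32),
    ElementSketch("table", [50.0, 200.0, 450.0, 400.0], None, 0.91),
    ElementSketch("text", [50.0, 420.0, 450.0, 600.0], "Body text", None),
]

kept = filter_by_confidence(elements, min_confidence=0.5)
kept_types = [e.type for e in kept]
table_area = bbox_area(elements[2].bbox)
print(kept_types, table_area)
```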
BaseLayoutExtractor
The abstract base class for all layout analysis extractors.
omnidocs.tasks.layout_analysis.base.BaseLayoutDetector
Bases: ABC
Base class for all layout detection models.
preprocess_input
Convert input to processable format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_path | Union[str, Path] | Path to input image or PDF | required |
Returns:
Type | Description |
---|---|
List[ndarray] | List of preprocessed images as numpy arrays |
visualize
Save annotated image to file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
detection_result | Tuple[Image, LayoutOutput] | Tuple containing (PIL Image, LayoutOutput) | required |
output_path | Union[str, Path] | Path to save visualization | required |
LayoutMapper
Handles mapping of layout labels and normalization of bounding boxes.
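The mapper's interface is not documented in this section, so the following is only a sketch of the two ideas it names. It assumes that label mapping means translating model-specific labels into a shared vocabulary, and that normalization means scaling pixel coordinates into the range [0, 1] relative to the image size. `LABEL_MAP`, `map_label`, and `normalize_bbox` are hypothetical names, and the label pairs are made up for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical mapping from model-specific labels to a shared vocabulary.
LABEL_MAP: Dict[str, str] = {
    "plain text": "text",
    "paragraph": "text",
    "picture": "figure",
    "section-header": "title",
}


def map_label(raw_label: str) -> str:
    """Map a raw model label to the shared vocabulary; unknown labels pass through lowercased."""
    key = raw_label.strip().lower()
    return LABEL_MAP.get(key, key)


def normalize_bbox(bbox: Sequence[float], image_size: Tuple[int, int]) -> List[float]:
    """Scale an [x1, y1, x2, y2] pixel box into [0, 1] relative to (width, height).

    Corners are reordered so x1 <= x2 and y1 <= y2, and coordinates are
    clamped to the image bounds before scaling.
    """
    width, height = image_size
    if width <= 0 or height <= 0:
        raise ValueError("image_size must contain positive width and height")
    x1, y1, x2, y2 = bbox
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))

    def clamp(value: float, upper: float) -> float:
        return min(max(value, 0.0), upper)

    return [
        clamp(left, width) / width,
        clamp(top, height) / height,
        clamp(right, width) / width,
        clamp(bottom, height) / height,
    ]


label = map_label("  Picture ")
box = normalize_bbox([100, 50, 300, 250], (400, 500))
print(label, box)
```

Normalized coordinates make boxes comparable across detectors that resize their input differently, which is one plausible reason to centralize this step.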