
Layout Analysis in OmniDocs

Layout analysis is the process of detecting and classifying regions (text, tables, images, etc.) in documents or images. OmniDocs provides a unified interface to several state-of-the-art layout detection backends, so you can experiment with them, compare them, and integrate them into your workflows.

What Is Layout Analysis?

Layout analysis breaks a document page into its logical components (such as paragraphs, tables, figures, and headers) by predicting bounding boxes and labels for each region. This is a crucial first step for downstream tasks like OCR, table extraction, and document understanding.

Supported Layout Detectors

OmniDocs supports multiple layout detection engines, each with its own strengths:

| Detector | Backend/Model | Highlights |
| --- | --- | --- |
| Paddle | PaddleOCR Layout | Fast, robust, easy to use, good for scanned docs |
| RTDETR | RT-DETR | Real-time, transformer-based, accurate |
| Surya | Surya Layout | Modern, high-accuracy, Indian docs friendly |
| YOLO | YOLOv8/YOLOv5 | Fast, customizable, works on many layouts |
| Florence | Florence Layout | (If available) Large foundation model, generalizes well |

Tip: You can easily switch between detectors by changing a single import/class name.

How to Use

All layout detectors follow the same API pattern:

from omnidocs.tasks.layout_analysis.extractors.paddle import PaddleLayoutDetector

detector = PaddleLayoutDetector(device='cpu')
image_path = "path/to/your/document.png"
annotated_image, layout_output = detector.detect(image_path)

# Visualize results
detector.visualize((annotated_image, layout_output), "output.png")
  • Change PaddleLayoutDetector to any other detector (e.g., RTDETRLayoutDetector, SuryaLayoutDetector, YOLOLayoutDetector) to use a different backend.
  • All detectors return both the annotated image and a structured output with bounding boxes and labels.
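
Because every detector exposes the same detect call, comparing backends on one page can be reduced to a small loop. The following is a minimal sketch, not part of OmniDocs: run_detectors is a hypothetical helper name, and it assumes only that each detector has a detect(image_path) method returning (annotated_image, layout_output), as in the example above.

```python
from typing import Any, Dict


def run_detectors(detectors: Dict[str, Any], image_path: str) -> Dict[str, Any]:
    """Run several detectors on the same image and collect their outputs.

    Hypothetical helper: each detector only needs a detect(image_path)
    method that returns (annotated_image, layout_output).
    """
    results: Dict[str, Any] = {}
    for name, detector in detectors.items():
        # The annotated image is ignored here; only the structured output is kept.
        _annotated_image, layout_output = detector.detect(image_path)
        results[name] = layout_output
    return results


# Usage (assumes the Paddle backend is installed):
# from omnidocs.tasks.layout_analysis.extractors.paddle import PaddleLayoutDetector
# outputs = run_detectors(
#     {"paddle": PaddleLayoutDetector(device="cpu")},
#     "path/to/your/document.png",
# )
```

Keying the results by a name you choose keeps the comparison independent of each backend's class name.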

Example Notebooks

See the tutorial notebooks for hands-on examples:

  • Paddle Layout Analysis
  • YOLO Layout Analysis
  • RTDETR Layout Analysis
  • Surya Layout Analysis

Each notebook demonstrates:

  • How to initialize and use the detector
  • How to visualize results
  • How to interpret the output

Advanced Tips

  • Device Selection: Most detectors support device='cpu' or device='cuda' for GPU acceleration.
  • Custom Models: For YOLO and Surya, you can plug in your own trained weights.
  • Batch Processing: Use Python loops or scripts to process folders of images.
  • Output Structure: All detectors return bounding boxes, class labels, and (optionally) confidence scores.
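
The batch-processing tip can be sketched as a plain loop over a folder. This is an illustrative helper under stated assumptions, not an OmniDocs API: process_folder, IMAGE_SUFFIXES, and the "_layout.png" output naming are hypothetical choices, and the detector is assumed to provide the detect and visualize methods shown in the usage example.

```python
from pathlib import Path
from typing import Any, Dict

# Hypothetical list of extensions to treat as images; adjust to your data.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def process_folder(detector: Any, input_dir: str, output_dir: str) -> Dict[str, Any]:
    """Run one detector over every image in input_dir.

    Saves an annotated copy of each page to output_dir and returns the
    structured layout output keyed by the source file name.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    results: Dict[str, Any] = {}
    for image_path in sorted(Path(input_dir).iterdir()):
        if not image_path.is_file() or image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip sub-folders and non-image files
        try:
            annotated_image, layout_output = detector.detect(str(image_path))
        except Exception as exc:  # keep going if a single page fails
            print(f"Skipping {image_path.name}: {exc}")
            continue
        target = out_dir / f"{image_path.stem}_layout.png"
        detector.visualize((annotated_image, layout_output), str(target))
        results[image_path.name] = layout_output
    return results
```

Creating the detector once and reusing it for every page avoids reloading model weights on each iteration.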

Next Steps

  • Try out the notebooks with your own documents.
  • Read the API Reference for advanced usage and customization.
  • Explore downstream tasks like OCR and table extraction using the detected layouts.
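
As a bridge to downstream tasks, one common step is to keep only the regions of interest (for example, tables) and order them roughly as they appear on the page. The sketch below is an assumption-heavy illustration: Region is a hypothetical stand-in, not the OmniDocs output type, and the (x_min, y_min, x_max, y_max) box format and the 0.5 threshold are assumed for the example. Check the API Reference for the actual structure of layout_output before adapting it.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Region:
    """Hypothetical stand-in for one detected region (not the OmniDocs type)."""

    label: str
    bbox: Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)
    confidence: Optional[float] = None  # may be missing for some detectors


def select_regions(
    regions: Iterable[Region],
    labels: Iterable[str],
    min_confidence: float = 0.5,
) -> List[Region]:
    """Keep regions with a wanted label and order them top-to-bottom, left-to-right.

    Regions without a confidence score are kept, since the score is optional.
    """
    wanted = set(labels)
    kept = [
        r
        for r in regions
        if r.label in wanted
        and (r.confidence is None or r.confidence >= min_confidence)
    ]
    # Sort by the top edge first, then by the left edge.
    return sorted(kept, key=lambda r: (r.bbox[1], r.bbox[0]))
```

The selected boxes can then be cropped from the page image and passed to an OCR or table extraction step.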

OmniDocs makes layout analysis accessible, reproducible, and extensible, no matter which backend you choose.