Layout Analysis¶

Detect document structure and element boundaries.

Input / Output¶

Input: Document image

Output: List of bounding boxes with labels and confidence scores

result = detector.extract(image)
for elem in result.elements:
    print(f"{elem.label}: {elem.bbox} ({elem.confidence:.2f})")

title: [50, 20, 500, 60] (0.98)
text: [50, 80, 900, 300] (0.95)
table: [50, 320, 900, 600] (0.92)
figure: [50, 620, 400, 900] (0.89)

Quick Start¶

from omnidocs.tasks.layout_analysis import DocLayoutYOLO, DocLayoutYOLOConfig
from PIL import Image

image = Image.open("document.png")

detector = DocLayoutYOLO(
    config=DocLayoutYOLOConfig(device="cuda")
)

result = detector.extract(image)

for elem in result.elements:
    print(f"{elem.label}: {elem.bbox}")

Available Models¶

Model	Speed	Labels	Best For
DocLayoutYOLO	0.1-0.2s/page	Fixed (11)	Speed
RT-DETR	0.3-0.5s/page	Fixed (11)	Accuracy
Qwen Layout	2-3s/page	Custom	Flexibility
VLM API	Varies	Custom	No GPU, any cloud provider

Fixed Labels¶

Models like DocLayoutYOLO and RT-DETR detect these predefined labels:

Label	Description
`title`	Document/section headings
`text`	Body paragraphs
`list`	Bullet or numbered lists
`table`	Data tables
`figure`	Images, diagrams, charts
`caption`	Figure/table captions
`formula`	Math equations
`footnote`	Footnotes
`page_header`	Running headers
`page_footer`	Running footers

Custom Labels (Qwen Layout)¶

Qwen Layout can detect any custom elements you define.

Simple String Labels¶

from omnidocs.tasks.layout_analysis import QwenLayoutDetector
from omnidocs.tasks.layout_analysis.qwen import QwenLayoutPyTorchConfig

detector = QwenLayoutDetector(
    backend=QwenLayoutPyTorchConfig(device="cuda")
)

# Detect custom elements
result = detector.extract(
    image,
    custom_labels=["code_block", "sidebar", "pull_quote", "diagram"]
)

for elem in result.elements:
    print(f"{elem.label}: {elem.bbox}")

Structured Labels with Metadata¶

For advanced use cases, use CustomLabel with descriptions:

from omnidocs.tasks.layout_analysis import QwenLayoutDetector, CustomLabel
from omnidocs.tasks.layout_analysis.qwen import QwenLayoutPyTorchConfig

detector = QwenLayoutDetector(
    backend=QwenLayoutPyTorchConfig(device="cuda")
)

# Structured labels with metadata
result = detector.extract(
    image,
    custom_labels=[
        CustomLabel(
            name="code_block",
            description="Programming source code areas",
            detection_prompt="Regions with monospace text and syntax highlighting",
            color="#2ecc71",
        ),
        CustomLabel(
            name="sidebar",
            description="Sidebar or callout content",
            detection_prompt="Boxed regions with supplementary information",
            color="#3498db",
        ),
        CustomLabel(
            name="warning_box",
            description="Warning or alert boxes",
            detection_prompt="Highlighted boxes with warning icons or red/yellow colors",
            color="#e74c3c",
        ),
    ]
)

for elem in result.elements:
    print(f"{elem.label}: {elem.bbox}")

Reusable Label Sets¶

Create reusable label collections for your domain:

from omnidocs.tasks.layout_analysis import CustomLabel

class TechnicalDocLabels:
    """Labels for technical documentation."""

    CODE_BLOCK = CustomLabel(
        name="code_block",
        description="Source code listings",
        color="#2ecc71"
    )

    API_REFERENCE = CustomLabel(
        name="api_reference",
        description="API documentation tables",
        color="#3498db"
    )

    DIAGRAM = CustomLabel(
        name="diagram",
        description="Architecture diagrams",
        color="#9b59b6"
    )

    @classmethod
    def all(cls):
        return [cls.CODE_BLOCK, cls.API_REFERENCE, cls.DIAGRAM]

# Use across projects
result = detector.extract(image, custom_labels=TechnicalDocLabels.all())

Fixed vs Custom Labels¶

Feature	Fixed (YOLO, RT-DETR)	Custom (Qwen)
Speed	0.1-0.5s/page	2-3s/page
Labels	11 predefined	Unlimited custom
Accuracy	High on standard docs	Good on any doc
Use case	Standard documents	Domain-specific

Choose Fixed Labels when: - Processing standard documents - Speed is critical - Standard elements are sufficient

Choose Custom Labels when: - Need domain-specific elements (code, sidebars, etc.) - Processing non-standard documents - Flexibility is more important than speed

Filtering Results¶

# By label
tables = [e for e in result.elements if e.label == "table"]
figures = [e for e in result.elements if e.label == "figure"]

# By confidence
confident = [e for e in result.elements if e.confidence >= 0.8]

# Exclude headers/footers
content = [e for e in result.elements
           if e.label not in ["page_header", "page_footer"]]

When to Use¶

✅ Document structure analysis ✅ Finding tables and figures ✅ Building multi-stage pipelines ✅ Filtering unwanted elements ✅ Domain-specific element detection (custom labels)

❌ Need readable text → Use Text Extraction ❌ Need word positions → Use OCR

Upcoming Models¶

Model	Description	Status
SuryaLayout	Modern layout detection	🔜 Soon