Layout Analysis
Detect document structure and element boundaries.
Input / Output
Input: Document image
Output: List of bounding boxes with labels and confidence scores

```python
# `detector` and `image` are created as shown in Quick Start
result = detector.extract(image)
for elem in result.elements:
    print(f"{elem.label}: {elem.bbox} ({elem.confidence:.2f})")
```

Example output:

```
title: [50, 20, 500, 60] (0.98)
text: [50, 80, 900, 300] (0.95)
table: [50, 320, 900, 600] (0.92)
figure: [50, 620, 400, 900] (0.89)
```
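Each element in the output pairs a label with a box and a score. The stand-in class below models one such record and derives a box's size from its corners. It is a minimal sketch that assumes `bbox` holds pixel coordinates in `[x1, y1, x2, y2]` order (top-left corner, then bottom-right), which fits the sample output but is not stated on this page; `LayoutElement` is a hypothetical name, not the omnidocs element type.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayoutElement:
    """Hypothetical stand-in for one detected element (not the omnidocs class)."""
    label: str
    bbox: List[float]  # assumed [x1, y1, x2, y2] in pixels
    confidence: float

    @property
    def width(self) -> float:
        return self.bbox[2] - self.bbox[0]

    @property
    def height(self) -> float:
        return self.bbox[3] - self.bbox[1]

    @property
    def area(self) -> float:
        return self.width * self.height


# The sample output above, expressed as stand-in records
elements = [
    LayoutElement("title", [50, 20, 500, 60], 0.98),
    LayoutElement("text", [50, 80, 900, 300], 0.95),
    LayoutElement("table", [50, 320, 900, 600], 0.92),
    LayoutElement("figure", [50, 620, 400, 900], 0.89),
]

for elem in elements:
    print(f"{elem.label}: {elem.width}x{elem.height} px, area={elem.area}")
```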
Quick Start

```python
from omnidocs.tasks.layout_analysis import DocLayoutYOLO, DocLayoutYOLOConfig
from PIL import Image

# Load the page image to analyze
image = Image.open("document.png")

# Create the detector on the GPU
detector = DocLayoutYOLO(
    config=DocLayoutYOLOConfig(device="cuda")
)

# Run layout detection and list what was found
result = detector.extract(image)
for elem in result.elements:
    print(f"{elem.label}: {elem.bbox}")
```
Available Models
| Model | Speed | Labels | Best For |
|---|---|---|---|
| DocLayoutYOLO | 0.1-0.2s/page | Fixed (11) | Speed |
| RT-DETR | 0.3-0.5s/page | Fixed (11) | Accuracy |
| Qwen Layout | 2-3s/page | Custom | Flexibility |
| VLM API | Varies | Custom | No GPU, any cloud provider |
Fixed Labels
Models like DocLayoutYOLO and RT-DETR detect these predefined labels:

| Label | Description |
|---|---|
| title | Document/section headings |
| text | Body paragraphs |
| list | Bullet or numbered lists |
| table | Data tables |
| figure | Images, diagrams, charts |
| caption | Figure/table captions |
| formula | Math equations |
| footnote | Footnotes |
| page_header | Running headers |
| page_footer | Running footers |
Custom Labels (Qwen Layout)
Qwen Layout can detect element types you define yourself, passed to `extract()` through the `custom_labels` argument.
Simple String Labels

```python
from PIL import Image
from omnidocs.tasks.layout_analysis import QwenLayoutDetector
from omnidocs.tasks.layout_analysis.qwen import QwenLayoutPyTorchConfig

image = Image.open("document.png")

detector = QwenLayoutDetector(
    backend=QwenLayoutPyTorchConfig(device="cuda")
)

# Detect custom elements by name
result = detector.extract(
    image,
    custom_labels=["code_block", "sidebar", "pull_quote", "diagram"],
)

for elem in result.elements:
    print(f"{elem.label}: {elem.bbox}")
```
Structured Labels with Metadata
For finer control, pass `CustomLabel` objects instead of strings. Each one carries a `name`, a human-readable `description`, a `detection_prompt` with visual cues for finding the region, and a hex `color`:

```python
from PIL import Image
from omnidocs.tasks.layout_analysis import QwenLayoutDetector, CustomLabel
from omnidocs.tasks.layout_analysis.qwen import QwenLayoutPyTorchConfig

image = Image.open("document.png")

detector = QwenLayoutDetector(
    backend=QwenLayoutPyTorchConfig(device="cuda")
)

# Structured labels with metadata
result = detector.extract(
    image,
    custom_labels=[
        CustomLabel(
            name="code_block",
            description="Programming source code areas",
            detection_prompt="Regions with monospace text and syntax highlighting",
            color="#2ecc71",
        ),
        CustomLabel(
            name="sidebar",
            description="Sidebar or callout content",
            detection_prompt="Boxed regions with supplementary information",
            color="#3498db",
        ),
        CustomLabel(
            name="warning_box",
            description="Warning or alert boxes",
            detection_prompt="Highlighted boxes with warning icons or red/yellow colors",
            color="#e74c3c",
        ),
    ],
)

for elem in result.elements:
    print(f"{elem.label}: {elem.bbox}")
```
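The `color` values are hex strings, which suggests they are meant for drawing boxes or legends. If you render results yourself, a small helper can turn those strings into RGB tuples. This is a sketch that assumes every color is in `#rrggbb` form; `hex_to_rgb` and `build_legend` are hypothetical helpers, and plain `(name, color)` pairs stand in for `CustomLabel` objects.

```python
from typing import Dict, Iterable, Tuple


def hex_to_rgb(color: str) -> Tuple[int, int, int]:
    """Convert a '#rrggbb' string to an (r, g, b) tuple of 0-255 ints."""
    value = color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected '#rrggbb', got {color!r}")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return (r, g, b)


def build_legend(labels: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int, int]]:
    """Map each label name to its RGB color."""
    return {name: hex_to_rgb(color) for name, color in labels}


# Same names and colors as the CustomLabel example above
legend = build_legend([
    ("code_block", "#2ecc71"),
    ("sidebar", "#3498db"),
    ("warning_box", "#e74c3c"),
])
print(legend)
```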
Reusable Label Sets
Create reusable label collections for your domain:

```python
from omnidocs.tasks.layout_analysis import CustomLabel


class TechnicalDocLabels:
    """Labels for technical documentation."""

    CODE_BLOCK = CustomLabel(
        name="code_block",
        description="Source code listings",
        color="#2ecc71",
    )
    API_REFERENCE = CustomLabel(
        name="api_reference",
        description="API documentation tables",
        color="#3498db",
    )
    DIAGRAM = CustomLabel(
        name="diagram",
        description="Architecture diagrams",
        color="#9b59b6",
    )

    @classmethod
    def all(cls):
        return [cls.CODE_BLOCK, cls.API_REFERENCE, cls.DIAGRAM]


# Use across projects (`detector` and `image` as in the examples above)
result = detector.extract(image, custom_labels=TechnicalDocLabels.all())
```
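Label sets from different domains can also be combined into a single `custom_labels` list. One way to do that while dropping repeated names is to keep the first label seen for each name. In this sketch `Label` is a hypothetical stand-in with only `name` and `color`, `merge_label_sets` is not part of omnidocs, and keying on `name` assumes detections are reported under that name, as the examples above suggest.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Label:
    """Hypothetical stand-in for CustomLabel."""
    name: str
    color: str = "#000000"


def merge_label_sets(*label_sets: Iterable[Label]) -> List[Label]:
    """Concatenate label sets, keeping the first label seen for each name."""
    seen = set()
    merged: List[Label] = []
    for label_set in label_sets:
        for label in label_set:
            if label.name not in seen:
                seen.add(label.name)
                merged.append(label)
    return merged


technical = [Label("code_block", "#2ecc71"), Label("diagram", "#9b59b6")]
magazine = [Label("pull_quote", "#f1c40f"), Label("diagram", "#000000")]

merged = merge_label_sets(technical, magazine)
print([label.name for label in merged])
```

Because earlier sets win, list the set whose colors and descriptions you prefer first.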
Fixed vs Custom Labels
| Feature | Fixed (YOLO, RT-DETR) | Custom (Qwen) |
|---|---|---|
| Speed | 0.1-0.5s/page | 2-3s/page |
| Labels | 11 predefined | Unlimited custom |
| Accuracy | High on standard docs | Good on any doc |
| Use case | Standard documents | Domain-specific |
Choose fixed labels when:
- You are processing standard documents
- Speed is critical
- The standard elements are sufficient

Choose custom labels when:
- You need domain-specific elements (code blocks, sidebars, etc.)
- You are processing non-standard documents
- Flexibility matters more than speed
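In practice the choice often comes down to one question: are all the element types you need among the fixed labels? A minimal sketch of that check follows. `FIXED_LABELS` copies the Fixed Labels table on this page, and `needs_custom_labels` is a hypothetical helper, not an omnidocs API.

```python
from typing import Iterable, Set

# Copied from the Fixed Labels table on this page
FIXED_LABELS: Set[str] = {
    "title", "text", "list", "table", "figure", "caption",
    "formula", "footnote", "page_header", "page_footer",
}


def needs_custom_labels(required: Iterable[str]) -> Set[str]:
    """Return the required labels that no fixed-label model provides."""
    return set(required) - FIXED_LABELS


missing = needs_custom_labels(["table", "figure", "code_block"])
if missing:
    print(f"Custom labels needed for: {sorted(missing)}")
else:
    print("A fixed-label model (DocLayoutYOLO, RT-DETR) is enough")
```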
Filtering Results

```python
# `result` comes from any detector's extract() call

# By label
tables = [e for e in result.elements if e.label == "table"]
figures = [e for e in result.elements if e.label == "figure"]

# By confidence
confident = [e for e in result.elements if e.confidence >= 0.8]

# Exclude headers/footers
content = [
    e for e in result.elements
    if e.label not in ("page_header", "page_footer")
]
```
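The three filters above can be folded into one reusable function. This sketch works on any objects that expose `label` and `confidence` attributes, as `result.elements` does in the examples; `filter_elements` and its parameters are hypothetical, and `SimpleNamespace` objects stand in for real detections.

```python
from types import SimpleNamespace
from typing import Iterable, List, Optional


def filter_elements(
    elements: Iterable,
    include: Optional[Iterable[str]] = None,
    exclude: Iterable[str] = ("page_header", "page_footer"),
    min_confidence: float = 0.0,
) -> List:
    """Keep elements whose label passes include/exclude and whose confidence is high enough."""
    include_set = set(include) if include is not None else None
    exclude_set = set(exclude)
    kept = []
    for elem in elements:
        if include_set is not None and elem.label not in include_set:
            continue  # not one of the requested labels
        if elem.label in exclude_set:
            continue  # explicitly unwanted, e.g. running headers
        if elem.confidence < min_confidence:
            continue  # too uncertain to keep
        kept.append(elem)
    return kept


detections = [
    SimpleNamespace(label="page_header", confidence=0.90),
    SimpleNamespace(label="table", confidence=0.92),
    SimpleNamespace(label="figure", confidence=0.60),
    SimpleNamespace(label="text", confidence=0.95),
]

confident_content = filter_elements(detections, min_confidence=0.8)
print([e.label for e in confident_content])
```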
When to Use
- ✅ Document structure analysis
- ✅ Finding tables and figures
- ✅ Building multi-stage pipelines
- ✅ Filtering unwanted elements
- ✅ Domain-specific element detection (custom labels)

- ❌ Need readable text → Use Text Extraction
- ❌ Need word positions → Use OCR
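In a multi-stage pipeline, detected regions are usually cropped and handed to a second step, such as a table or OCR model. The sketch below prepares a crop box by padding each detection with a margin and clamping it to the page, so the crop never extends past the image. It assumes `[x1, y1, x2, y2]` pixel boxes; `to_crop_box` is a hypothetical helper, and it returns the `(left, upper, right, lower)` tuple that Pillow's `Image.crop` accepts.

```python
import math
from typing import Sequence, Tuple


def to_crop_box(
    bbox: Sequence[float],
    image_width: int,
    image_height: int,
    padding: int = 10,
) -> Tuple[int, int, int, int]:
    """Pad an [x1, y1, x2, y2] box and clamp it to the image bounds."""
    x1, y1, x2, y2 = bbox
    # Round outward so fractional coordinates never shrink the region
    left = max(0, math.floor(x1) - padding)
    upper = max(0, math.floor(y1) - padding)
    right = min(image_width, math.ceil(x2) + padding)
    lower = min(image_height, math.ceil(y2) + padding)
    return (left, upper, right, lower)


# The table box from the sample output, on a hypothetical 1000x1000 page
print(to_crop_box([50, 320, 900, 600], image_width=1000, image_height=1000))
# A box touching the page edge is clamped instead of padded outward
print(to_crop_box([0, 950, 400, 1000], image_width=1000, image_height=1000))
```

With a Pillow image, `image.crop(to_crop_box(elem.bbox, *image.size))` would then yield the region, assuming `elem.bbox` is a four-number sequence in that order.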
Upcoming Models
| Model | Description | Status |
|---|---|---|
| SuryaLayout | Modern layout detection | 🔜 Soon |