# Reading Order
Determine the logical reading sequence of document elements. Essential for correct text flow in multi-column layouts, documents with figures, and complex page structures.
## Overview

Reading order prediction takes layout detection and OCR results and produces:

- **Ordered elements** - elements sorted in reading sequence
- **Caption associations** - links between figures/tables and their captions
- **Footnote mapping** - links between content and footnotes
- **Merge suggestions** - elements that should be combined (split paragraphs)
## Quick Start

```python
from omnidocs.tasks.reading_order import RuleBasedReadingOrderPredictor
from omnidocs.tasks.layout_extraction import DocLayoutYOLO, DocLayoutYOLOConfig
from omnidocs.tasks.ocr_extraction import EasyOCR, EasyOCRConfig

# Initialize components
layout_extractor = DocLayoutYOLO(config=DocLayoutYOLOConfig())
ocr = EasyOCR(config=EasyOCRConfig())
predictor = RuleBasedReadingOrderPredictor()

# Process document
layout = layout_extractor.extract(image)
ocr_result = ocr.extract(image)
reading_order = predictor.predict(layout, ocr_result)

# Get text in reading order
text = reading_order.get_full_text()
print(text)
```
## Available Models
| Model | Speed | Use Case |
|---|---|---|
| Rule-based (R-tree) | <0.1s | Multi-column, general documents |
## Output Format

### ReadingOrderOutput

```python
result = predictor.predict(layout, ocr_result)

# Ordered elements
for elem in result.ordered_elements:
    print(f"{elem.index}: {elem.element_type.value} - {elem.text[:50]}")

# Caption associations (figure_id -> [caption_ids])
for fig_id, caption_ids in result.caption_map.items():
    print(f"Figure {fig_id} has captions: {caption_ids}")

# Footnote associations (element_id -> [footnote_ids])
for elem_id, footnote_ids in result.footnote_map.items():
    print(f"Element {elem_id} has footnotes: {footnote_ids}")

# Merge suggestions (for split paragraphs)
for elem_id, merge_ids in result.merge_map.items():
    print(f"Element {elem_id} should merge with: {merge_ids}")
```
### OrderedElement

```python
elem.index         # Position in reading order
elem.element_type  # ElementType (TITLE, TEXT, FIGURE, TABLE, etc.)
elem.bbox          # BoundingBox(x1, y1, x2, y2)
elem.text          # Text content (from OCR)
elem.confidence    # Detection confidence
elem.page_no       # Page number
elem.original_id   # ID from the original layout detection
```
## Element Types

```python
from omnidocs.tasks.reading_order import ElementType

ElementType.TITLE
ElementType.TEXT
ElementType.LIST
ElementType.FIGURE
ElementType.TABLE
ElementType.CAPTION
ElementType.FORMULA
ElementType.FOOTNOTE
ElementType.PAGE_HEADER
ElementType.PAGE_FOOTER
ElementType.CODE
ElementType.OTHER
```
## Helper Methods

### Get Full Text
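The snippet for this subsection appears to be missing; per the Quick Start, the call is simply `result.get_full_text()`, which joins element text in reading sequence. A stand-in sketch of that behavior (simplified classes for illustration, not the real omnidocs types):

```python
# Illustrative stand-ins only: the real classes come from omnidocs.
class Element:
    def __init__(self, text):
        self.text = text

class ReadingOrderResult:
    def __init__(self, ordered_elements):
        self.ordered_elements = ordered_elements  # already in reading order

    def get_full_text(self):
        # Join the text of every element in reading sequence
        return "\n".join(e.text for e in self.ordered_elements)

result = ReadingOrderResult([Element("Title"), Element("First paragraph.")])
print(result.get_full_text())
```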
### Get Elements by Type

```python
# Get all tables
tables = result.get_elements_by_type(ElementType.TABLE)

# Get all figures
figures = result.get_elements_by_type(ElementType.FIGURE)
```
### Get Captions

```python
# Get captions for a specific figure
for elem in result.ordered_elements:
    if elem.element_type == ElementType.FIGURE:
        captions = result.get_captions_for(elem.original_id)
        print(f"Figure captions: {[c.text for c in captions]}")
```
## Pipeline: Complete Document Processing

```python
from omnidocs.tasks.reading_order import RuleBasedReadingOrderPredictor, ElementType
from omnidocs.tasks.layout_extraction import DocLayoutYOLO, DocLayoutYOLOConfig
from omnidocs.tasks.ocr_extraction import EasyOCR, EasyOCRConfig
from PIL import Image

# Load document
image = Image.open("document.png")

# 1. Layout detection
layout_extractor = DocLayoutYOLO(config=DocLayoutYOLOConfig(device="cuda"))
layout = layout_extractor.extract(image)
print(f"Found {len(layout.bboxes)} elements")

# 2. OCR extraction
ocr = EasyOCR(config=EasyOCRConfig(gpu=True))
ocr_result = ocr.extract(image)
print(f"Found {len(ocr_result.text_blocks)} text blocks")

# 3. Reading order prediction
predictor = RuleBasedReadingOrderPredictor()
reading_order = predictor.predict(layout, ocr_result)

# 4. Process in reading order
for elem in reading_order.ordered_elements:
    if elem.element_type == ElementType.TITLE:
        print(f"# {elem.text}")
    elif elem.element_type == ElementType.TEXT:
        print(f"{elem.text}\n")
    elif elem.element_type == ElementType.TABLE:
        print(f"[Table at position {elem.index}]")
    elif elem.element_type == ElementType.FIGURE:
        captions = reading_order.get_captions_for(elem.original_id)
        print(f"[Figure: {captions[0].text if captions else 'No caption'}]")
```
## How It Works

The rule-based predictor uses:

- **R-tree spatial indexing** - efficient spatial queries over element bounding boxes
- **Column detection** - identifies multi-column layouts
- **Vertical flow** - elements flow top-to-bottom within each column
- **Header/footer separation** - page headers and footers are ordered separately from the main flow
- **Caption proximity** - associates captions with nearby figures/tables
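The column-then-vertical idea can be sketched in a few lines. This is a toy illustration, not the library's actual implementation: it clusters boxes by x-overlap instead of using an R-tree, and it ignores headers, footers, and captions:

```python
# Toy column-aware ordering: cluster boxes into columns by x-overlap,
# order columns left-to-right, then sort top-to-bottom within each column.
def order_boxes(boxes):
    # boxes: list of (x1, y1, x2, y2) tuples
    columns = []  # each column is a list of boxes
    for box in sorted(boxes, key=lambda b: b[0]):
        for col in columns:
            # Assign to an existing column if the x-ranges overlap
            cx1 = min(b[0] for b in col)
            cx2 = max(b[2] for b in col)
            if box[0] < cx2 and box[2] > cx1:
                col.append(box)
                break
        else:
            columns.append([box])
    ordered = []
    for col in sorted(columns, key=lambda c: min(b[0] for b in c)):
        ordered.extend(sorted(col, key=lambda b: b[1]))
    return ordered

# Two-column page: left column reads first, top to bottom, then right column
boxes = [(300, 50, 550, 100), (10, 50, 250, 100),
         (10, 120, 250, 170), (300, 120, 550, 170)]
print(order_boxes(boxes))
```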
## Tips

- **Use quality layout detection** - reading order depends on accurate layout input
- **Include OCR** - text content enables better merge detection
- **Check caption associations** - verify that figures are linked to the correct captions
- **Handle page headers/footers** - these are processed separately from the main flow
## Limitations

- Works best with standard document layouts
- Very complex layouts (e.g., nested columns) may need tuning
- Output quality depends on the accuracy of the layout detection input
- Processes a single page at a time (run each page independently for multi-page documents)
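For multi-page documents, the per-page loop is straightforward. A minimal sketch, assuming the pipeline objects from earlier and that pages are already available as a list of page images; the `process_pages` helper is hypothetical, not part of omnidocs:

```python
def process_pages(pages, layout_extractor, ocr, predictor):
    """Run the single-page pipeline on each page and stitch the text together."""
    page_texts = []
    for page_image in pages:
        layout = layout_extractor.extract(page_image)
        ocr_result = ocr.extract(page_image)
        reading_order = predictor.predict(layout, ocr_result)
        page_texts.append(reading_order.get_full_text())
    return "\n\n".join(page_texts)
```

Note that cross-page concerns (e.g., a paragraph continuing across a page break) are not handled here, since the predictor sees one page at a time.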