RT-DETR¶

High-accuracy document layout detection.

Overview¶


Tasks	Layout Analysis
Backends	PyTorch
Speed	0.3-0.5s/page
Quality	Excellent
VRAM	4-6GB

Why RT-DETR¶

Higher accuracy than YOLO-based detectors
Better on small elements - catches details YOLO misses
Transformer architecture - modern, effective
Same labels as DocLayoutYOLO - drop-in replacement

Basic Usage¶

from omnidocs.tasks.layout_analysis import RTDETRLayoutDetector, RTDETRConfig
from PIL import Image

image = Image.open("document.png")

detector = RTDETRLayoutDetector(
    config=RTDETRConfig(device="cuda")
)

result = detector.extract(image)

for elem in result.elements:
    print(f"{elem.label}: {elem.bbox} ({elem.confidence:.2f})")

Configuration¶

config = RTDETRConfig(
    device="cuda",        # "cuda" or "cpu"
    confidence=0.3,       # Detection threshold
)

RT-DETR vs DocLayoutYOLO¶

	RT-DETR	DocLayoutYOLO
Speed	0.3-0.5s/page	0.1-0.2s/page
Accuracy	Higher	Good
Small elements	Better	May miss
Memory	4-6GB	2-4GB
Use case	Accuracy-critical	Speed-critical

When to Use¶

✅ Need highest accuracy ✅ Documents with small elements ✅ Quality over speed

❌ Speed-critical → Use DocLayoutYOLO ❌ Custom labels → Use Qwen Layout