📚 API Reference

Welcome to the OmniDocs API Reference! This reference documents the classes, functions, and modules that make up OmniDocs.

🧩 Core Components

Core Classes - Base classes and fundamental components
Utils - Utility functions and helpers

📋 Tasks & Extractors

Layout Analysis - Document structure detection
Text Extraction - Text parsing from documents
Table Extraction - Tabular data extraction
OCR - Optical Character Recognition
Math Expression - Mathematical formula extraction

🎯 Getting Started with the API

Basic Usage Pattern

All OmniDocs extractors follow a consistent interface. In the template below, `{task}`, `{extractor}`, and `{ExtractorClass}` stand for a concrete task package, extractor module, and class:

```python
# 1. Import the extractor (names in braces are placeholders)
from omnidocs.tasks.{task}.extractors.{extractor} import {ExtractorClass}

# 2. Initialize with configuration
extractor = ExtractorClass(
    # Common parameters
    device='cpu',          # or 'cuda' for GPU
    show_log=True,         # enable logging
    languages=['en'],      # languages to recognize
    # Extractor-specific parameters...
)

# 3. Extract from a document
result = extractor.extract("path/to/document.pdf")

# 4. Access results (available attributes depend on the result type)
print(result.full_text)    # text and OCR results
print(result.tables)       # table results
print(result.texts)        # OCR results (individual regions)
```
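
To make "consistent interface" concrete, here is a minimal sketch using `typing.Protocol`. `SupportsExtract`, `EchoExtractor`, and `run` are hypothetical names invented for this illustration; the only assumption about OmniDocs is that every extractor exposes an `extract(path)` method, as the template above shows.

```python
from typing import Any, Protocol


class SupportsExtract(Protocol):
    """Hypothetical structural type: anything with an extract(path) method."""

    def extract(self, path: str) -> Any: ...


class EchoExtractor:
    """Hypothetical stand-in that returns a minimal result dict."""

    def extract(self, path: str) -> dict:
        return {"source": path, "full_text": ""}


def run(extractor: SupportsExtract, path: str) -> Any:
    # Works with any object whose extract() matches the protocol
    return extractor.extract(path)


result = run(EchoExtractor(), "path/to/document.pdf")
print(result["source"])  # path/to/document.pdf
```

Because `Protocol` describes structure rather than inheritance, a helper typed this way accepts any extractor with a matching `extract` method.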

Common Parameters

Most extractors support these common parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `device` | `str` | `'cpu'` | Device to run on (`'cpu'` or `'cuda'`) |
| `show_log` | `bool` | `False` | Enable detailed logging |
| `languages` | `List[str]` | `['en']` | Languages to support |
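
If you pass the same settings to several extractors, a small container can hold the documented defaults. This is a sketch, not an OmniDocs class: `CommonOptions` is hypothetical, and its validation simply enforces the two device values listed in the table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CommonOptions:
    """Hypothetical container mirroring the common-parameter defaults above."""

    device: str = "cpu"
    show_log: bool = False
    # default_factory gives each instance its own list instead of a shared one
    languages: List[str] = field(default_factory=lambda: ["en"])

    def __post_init__(self) -> None:
        if self.device not in ("cpu", "cuda"):
            raise ValueError(f"device must be 'cpu' or 'cuda', got {self.device!r}")


opts = CommonOptions(languages=["en", "fr"])
print(opts)
```

For an extractor that accepts all three parameters, `EasyOCRExtractor(**dataclasses.asdict(opts))` would forward them; check each extractor's signature first, since the table says only most extractors support them.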

Result Objects

All extractors return structured result objects:

OCROutput

```python
class OCROutput:
    texts: List[OCRText]              # Individual text regions
    full_text: str                    # Combined text
    source_img_size: Tuple[int, int]  # Original image dimensions
    processing_time: Optional[float]  # Extraction time
    metadata: Dict[str, Any]          # Additional information
```

TableOutput

```python
class TableOutput:
    tables: List[Table]               # Extracted tables
    source_file: str                  # Source document path
    processing_time: Optional[float]  # Extraction time
    metadata: Dict[str, Any]          # Additional information
```

TextOutput

```python
class TextOutput:
    text_blocks: List[TextBlock]      # Text blocks with positions
    full_text: str                    # Combined text
    source_file: str                  # Source document path
    processing_time: Optional[float]  # Extraction time
    metadata: Dict[str, Any]          # Additional information
```
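
To see how the fields fit together, here is a simplified, self-contained stand-in for `OCROutput`. The `text`, `confidence`, and `bbox` attributes on `OCRText` match those used under Inspect Results below; the `(x1, y1, x2, y2)` bbox layout and the top-then-left ordering used to build `full_text` are assumptions for illustration, and the real class stores `full_text` as a field rather than computing it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class OCRText:
    """Hypothetical stand-in for one detected text region."""
    text: str
    confidence: float
    bbox: Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2)


@dataclass
class OCROutput:
    """Simplified stand-in mirroring the OCROutput fields listed above."""
    texts: List[OCRText]
    source_img_size: Tuple[int, int]
    processing_time: Optional[float] = None
    metadata: Dict[str, Any] = field(default_factory=dict)

    @property
    def full_text(self) -> str:
        # Assumption: approximate reading order by sorting on top, then left
        ordered = sorted(self.texts, key=lambda t: (t.bbox[1], t.bbox[0]))
        return " ".join(t.text for t in ordered)


out = OCROutput(
    texts=[
        OCRText("world", 0.90, (60, 10, 110, 30)),
        OCRText("Hello", 0.95, (0, 10, 50, 30)),
    ],
    source_img_size=(200, 100),
)
print(out.full_text)  # Hello world
```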

🔧 Advanced Usage

Batch Processing

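Process multiple documents with a single extractor instance, so any model loading happens once rather than per file:

```python
from pathlib import Path
from omnidocs.tasks.ocr_extraction.extractors.easy_ocr import EasyOCRExtractor

extractor = EasyOCRExtractor()  # create once, reuse for every document
documents = Path("documents/").glob("*.pdf")

results = []
for doc in documents:
    try:
        result = extractor.extract(str(doc))
    except Exception as e:
        print(f"Error processing {doc}: {e}")
        continue
    # Guard against division by zero when no text regions were detected
    texts = result.texts
    avg_confidence = sum(t.confidence for t in texts) / len(texts) if texts else None
    results.append({
        'file': doc.name,
        'text': result.full_text,
        'confidence': avg_confidence,
    })
```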
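
For larger batches it helps to keep failures instead of only printing them. The sketch below is hypothetical (`process_batch` is not an OmniDocs function); it accepts any callable with the `extract(path)` shape, so you can pass `extractor.extract`, and it is demonstrated with a stand-in so it runs without OmniDocs installed.

```python
from pathlib import Path
from types import SimpleNamespace
from typing import Any, Callable, Dict, Iterable, List, Tuple


def process_batch(
    paths: Iterable[Path],
    extract: Callable[[str], Any],
) -> Tuple[List[Dict[str, Any]], List[Tuple[str, str]]]:
    """Run `extract` on each path; collect successes and failures separately."""
    results: List[Dict[str, Any]] = []
    failures: List[Tuple[str, str]] = []
    for path in paths:
        try:
            result = extract(str(path))
        except Exception as exc:  # keep going, but record what failed and why
            failures.append((path.name, str(exc)))
            continue
        texts = result.texts
        mean_conf = sum(t.confidence for t in texts) / len(texts) if texts else None
        results.append({"file": path.name, "text": result.full_text, "confidence": mean_conf})
    return results, failures


def fake_extract(path: str) -> SimpleNamespace:
    # Stand-in extractor: fails on one file, returns two regions otherwise
    if path.endswith("bad.pdf"):
        raise RuntimeError("corrupt")
    regions = [SimpleNamespace(confidence=0.5), SimpleNamespace(confidence=1.0)]
    return SimpleNamespace(full_text="hi", texts=regions)


ok, failed = process_batch([Path("a.pdf"), Path("bad.pdf")], fake_extract)
print(ok, failed)
```

With a real extractor: `ok, failed = process_batch(Path("documents/").glob("*.pdf"), extractor.extract)`.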

Custom Configuration

Configure extractors for specific use cases:

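```python
from omnidocs.tasks.ocr_extraction.extractors.easy_ocr import EasyOCRExtractor
from omnidocs.tasks.table_extraction.extractors.camelot import CamelotExtractor

# Multilingual OCR on GPU
ocr_extractor = EasyOCRExtractor(
    languages=['en', 'fr', 'de'],
    device='cuda',
    show_log=True
)

# Stream-mode table extraction (tables separated by whitespace, not ruling lines)
table_extractor = CamelotExtractor(
    flavor='stream',       # 'lattice' targets tables drawn with ruling lines
    edge_tol=500,          # tolerance for extending detected text edges vertically
    row_tol=2              # tolerance for grouping text into rows
)
```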
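
When the same configurations recur, named presets keep them in one place. `PRESETS` and `build` are hypothetical helpers; the keyword arguments are copied from the examples above, and `Recorder` stands in for an extractor class so the sketch runs on its own.

```python
from typing import Any, Dict

# Hypothetical named presets; keys mirror the keyword arguments shown above
PRESETS: Dict[str, Dict[str, Any]] = {
    "multilingual_ocr": {"languages": ["en", "fr", "de"], "device": "cuda", "show_log": True},
    "stream_tables": {"flavor": "stream", "edge_tol": 500, "row_tol": 2},
}


def build(cls: Any, preset: str, **overrides: Any) -> Any:
    # Merge into a new dict so overrides never modify the shared preset
    kwargs = {**PRESETS[preset], **overrides}
    return cls(**kwargs)


class Recorder:
    """Stand-in for an extractor class; just records its keyword arguments."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


r = build(Recorder, "multilingual_ocr", device="cpu")
print(r.kwargs)
```

With OmniDocs installed, `build(EasyOCRExtractor, "multilingual_ocr", device="cpu")` would create a CPU variant of the multilingual preset. The merge is shallow: nested values such as the `languages` list are still shared with the preset.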

Error Handling

Robust error handling patterns:

```python
from omnidocs.tasks.table_extraction.extractors.camelot import CamelotExtractor

extractor = CamelotExtractor()

try:
    result = extractor.extract("document.pdf")
    if result.tables:
        print(f"Successfully extracted {len(result.tables)} tables")
    else:
        print("No tables found in document")
except FileNotFoundError:
    print("Document file not found")
except Exception as e:
    print(f"Extraction failed: {e}")
```
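
Whether a given extractor raises `FileNotFoundError` for a missing path, or wraps it in another exception, is not specified here, so checking the path first makes that case explicit. `safe_extract` is a hypothetical helper that takes any `extract(path)` callable; the demo uses a stand-in function and an empty temporary file.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable, Optional


def safe_extract(extract: Callable[[str], Any], path: str) -> Optional[Any]:
    """Return the extraction result, or None if the file is missing or extraction fails."""
    file = Path(path)
    if not file.is_file():
        print(f"Document file not found: {file}")
        return None
    try:
        return extract(str(file))
    except Exception as exc:
        print(f"Extraction failed for {file.name}: {exc}")
        return None


# Demo with a stand-in extract function and a temporary (empty) file
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    existing = tmp.name
print(safe_extract(lambda p: "ok", existing))            # ok
print(safe_extract(lambda p: "ok", "no_such_file.pdf"))  # prints a message, then None
```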

🎨 Visualization

Most extractors support result visualization:

```python
# These calls reuse ocr_extractor and table_extractor from Custom Configuration above

# Visualize OCR results
ocr_result = ocr_extractor.extract("image.png")
ocr_extractor.visualize(
    result=ocr_result,
    image_path="image.png",
    output_path="ocr_visualization.png",
    show_text=True,
    show_confidence=True
)

# Visualize table extraction
table_result = table_extractor.extract("document.pdf")
table_extractor.visualize(
    result=table_result,
    image_path="document.pdf",
    output_path="table_visualization.png"
)
```

📊 Performance Optimization

GPU Acceleration

Enable GPU support for faster processing:

```python
import torch
from omnidocs.tasks.ocr_extraction.extractors.easy_ocr import EasyOCRExtractor

# Check GPU availability
if torch.cuda.is_available():
    device = 'cuda'
    print(f"Using GPU: {torch.cuda.get_device_name()}")
else:
    device = 'cpu'
    print("Using CPU")

# Initialize on the selected device
extractor = EasyOCRExtractor(device=device)
```
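
Keeping the fallback decision in a small function, separate from the `torch` call, makes it easy to test on CPU-only machines. `resolve_device` is a hypothetical helper, not part of OmniDocs:

```python
def resolve_device(requested: str, cuda_available: bool) -> str:
    """Return 'cuda' only when it was requested and is actually available."""
    if requested == "cuda" and not cuda_available:
        print("CUDA requested but not available; falling back to CPU")
        return "cpu"
    return requested
```

Usage: `EasyOCRExtractor(device=resolve_device("cuda", torch.cuda.is_available()))`.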

Memory Management

For large-scale processing:

```python
import gc

import torch
from omnidocs.tasks.ocr_extraction.extractors.easy_ocr import EasyOCRExtractor

extractor = EasyOCRExtractor()

# large_document_list is your own list of document paths
for i, document in enumerate(large_document_list, start=1):
    result = extractor.extract(document)
    # Process result...

    # Release unreferenced objects and cached GPU memory every 100 documents
    if i % 100 == 0:
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
```
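
The cleanup cadence can be factored out so it is easy to test. In this hypothetical helper, counting from 1 means the first cleanup runs after the 100th item rather than before any work is done; `process` and `cleanup` are whatever callables you pass in.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def process_with_cleanup(
    items: Iterable[T],
    process: Callable[[T], None],
    cleanup: Callable[[], None],
    every: int = 100,
) -> int:
    """Process items, calling cleanup after every `every` items; return the count."""
    count = 0
    for count, item in enumerate(items, start=1):
        process(item)
        if count % every == 0:
            cleanup()
    return count


calls = []
n = process_with_cleanup(range(250), lambda item: None, lambda: calls.append("cleanup"))
print(n, len(calls))  # 250 2
```

In practice, `cleanup` could call `gc.collect()` followed by `torch.cuda.empty_cache()` when CUDA is available, as in the loop above.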

🔍 Debugging

Enable Detailed Logging

```python
import logging
from omnidocs.utils.logging import get_logger
from omnidocs.tasks.ocr_extraction.extractors.easy_ocr import EasyOCRExtractor

# Set up logging
logging.basicConfig(level=logging.DEBUG)
logger = get_logger(__name__)

# Initialize extractor with logging
extractor = EasyOCRExtractor(show_log=True)
```
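
`logging.basicConfig(level=logging.DEBUG)` raises the root logger, so every library in the process starts emitting debug output. A narrower option is to raise only one package's logger. This sketch assumes OmniDocs modules log under the `"omnidocs"` namespace (the usual result of `getLogger(__name__)` inside a package); that is an assumption, and `omnidocs.tasks.example` is a hypothetical child name.

```python
import logging

logging.basicConfig(level=logging.WARNING, format="%(name)s %(levelname)s %(message)s")

# Assumption: OmniDocs loggers live under the "omnidocs" namespace
logging.getLogger("omnidocs").setLevel(logging.DEBUG)

log = logging.getLogger("omnidocs.tasks.example")  # hypothetical child logger
log.debug("shown: child inherits DEBUG from 'omnidocs'")
logging.getLogger("third_party").debug("hidden: root stays at WARNING")
```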

Inspect Results

```python
result = extractor.extract("document.pdf")

# Inspect result structure
print(f"Result type: {type(result)}")
print(f"Available attributes: {dir(result)}")

# For OCR results
if hasattr(result, 'texts'):
    print(f"Number of text regions: {len(result.texts)}")
    for i, text in enumerate(result.texts[:3]):  # First 3
        print(f"Text {i}: {text.text[:50]}...")
        print(f"Confidence: {text.confidence:.3f}")
        print(f"Bbox: {text.bbox}")

# For table results
if hasattr(result, 'tables'):
    print(f"Number of tables: {len(result.tables)}")
    for i, table in enumerate(result.tables):
        print(f"Table {i} shape: {table.df.shape}")
```
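
`dir()` lists every method and dunder attribute, which is noisy. A hypothetical `summarize` helper can instead pull out only the counts you usually check, using `getattr` so it works on any result type; the demo uses `SimpleNamespace` in place of a real result.

```python
from types import SimpleNamespace
from typing import Any, Dict


def summarize(result: Any) -> Dict[str, Any]:
    """Collect a few counts from any result object without assuming its type."""
    summary: Dict[str, Any] = {"type": type(result).__name__}
    texts = getattr(result, "texts", None)
    if texts is not None:
        summary["text_regions"] = len(texts)
        if texts:
            summary["min_confidence"] = min(t.confidence for t in texts)
    tables = getattr(result, "tables", None)
    if tables is not None:
        summary["tables"] = len(tables)
    return summary


fake = SimpleNamespace(texts=[SimpleNamespace(confidence=0.4), SimpleNamespace(confidence=0.9)])
print(summarize(fake))
# {'type': 'SimpleNamespace', 'text_regions': 2, 'min_confidence': 0.4}
```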

📚 Examples by Use Case

Document Digitization

```python
from omnidocs.tasks.ocr_extraction.extractors.easy_ocr import EasyOCRExtractor

extractor = EasyOCRExtractor(languages=['en'])
result = extractor.extract("scanned_document.png")
# Write UTF-8 explicitly so non-ASCII text survives on any platform
with open("digitized.txt", "w", encoding="utf-8") as f:
    f.write(result.full_text)
```

Financial Report Processing

```python
from omnidocs.tasks.table_extraction.extractors.camelot import CamelotExtractor

extractor = CamelotExtractor()
result = extractor.extract("financial_report.pdf")
for i, table in enumerate(result.tables):
    table.df.to_csv(f"financial_table_{i}.csv", index=False)
```

Academic Paper Analysis

```python
from omnidocs.tasks.math_expression_extraction.extractors.nougat import NougatExtractor

extractor = NougatExtractor()
result = extractor.extract("research_paper.pdf")
print("Extracted LaTeX formulas:")
print(result.full_text)
```

🚨 Common Issues & Solutions

Import Errors

```python
# Check if dependencies are installed
try:
    from omnidocs.tasks.ocr_extraction.extractors.easy_ocr import EasyOCRExtractor
    print("✅ EasyOCR available")
except ImportError as e:
    print(f"❌ EasyOCR not available: {e}")
    print("Install with: pip install easyocr")
```
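
`importlib.util.find_spec` can tell whether an optional backend is installed without importing it, which avoids paying a heavy import cost just to check. The mapping from import name to pip package below is an assumption to adapt; `easyocr` matches the install hint above, and `camelot` is the import name of the `camelot-py` package.

```python
import importlib.util
from typing import Dict, List

# Import name -> pip package name (adjust for the extractors you use)
BACKENDS: Dict[str, str] = {
    "easyocr": "easyocr",
    "camelot": "camelot-py",
}


def missing_backends(backends: Dict[str, str]) -> List[str]:
    """Return pip package names whose top-level module cannot be found."""
    return [pkg for mod, pkg in backends.items() if importlib.util.find_spec(mod) is None]


missing = missing_backends(BACKENDS)
if missing:
    print("Install with: pip install " + " ".join(missing))
```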

Memory Issues

```python
# For large documents, process one page range at a time
from omnidocs.tasks.table_extraction.extractors.camelot import CamelotExtractor

extractor = CamelotExtractor()
# Process specific pages instead of all pages
result = extractor.extract("large_document.pdf", pages="1-5")
```
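
To cover a whole long document this way, split it into consecutive page ranges. This sketch assumes the `pages` argument accepts `"start-end"` strings and single page numbers, as in the example above (Camelot's own `pages` syntax works this way); `page_ranges` is a hypothetical helper, and the total page count has to come from elsewhere.

```python
from typing import List


def page_ranges(total_pages: int, chunk_size: int) -> List[str]:
    """Split pages 1..total_pages into 'start-end' strings of at most chunk_size pages."""
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    ranges = []
    for start in range(1, total_pages + 1, chunk_size):
        end = min(start + chunk_size - 1, total_pages)
        # A one-page chunk is written as a single page number
        ranges.append(f"{start}-{end}" if end > start else f"{start}")
    return ranges


print(page_ranges(12, 5))  # ['1-5', '6-10', '11-12']
```

Usage: `for pages in page_ranges(total_pages, 5): result = extractor.extract("large_document.pdf", pages=pages)`.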

Language Support

```python
from omnidocs.tasks.ocr_extraction.extractors.easy_ocr import EasyOCRExtractor

# Check supported languages
extractor = EasyOCRExtractor()
supported = extractor.get_supported_languages()
print(f"Supported languages: {supported}")
```
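
Checking requested codes against the supported list before initializing a multilingual extractor gives a clearer error than a failure deep inside the backend. `unsupported_languages` is a hypothetical helper; it assumes `get_supported_languages()` returns an iterable of language codes.

```python
from typing import Iterable, List


def unsupported_languages(requested: Iterable[str], supported: Iterable[str]) -> List[str]:
    """Return requested language codes missing from the supported set, in request order."""
    available = set(supported)
    return [lang for lang in requested if lang not in available]


bad = unsupported_languages(["en", "fr", "xx"], ["en", "fr", "de"])
print(bad)  # ['xx']
```

Usage: `unsupported_languages(['en', 'fr'], extractor.get_supported_languages())`.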

🔗 Related Resources

Getting Started Guide - Quick introduction
Task Tutorials - Detailed task-specific guides
GitHub Repository - Source code and issues
Contributing Guide - How to contribute

This API reference is automatically generated from the source code. For the most up-to-date information, please refer to the docstrings in the source code.