OmniDocs

OmniDocs is a powerful framework that simplifies document understanding and analysis. It provides a unified, production-ready interface for essential document processing tasks including:

  • Layout Analysis & Detection
  • OCR & Text Extraction
  • Table Detection & Extraction
  • Reading Order Detection
  • Document Understanding & Analysis

By abstracting away the complexities of integrating multiple libraries and models, OmniDocs enables developers to build robust document processing workflows with minimal effort. Whether you're working with academic papers, business documents, or complex technical materials, OmniDocs provides the tools you need to extract, analyze, and understand document content efficiently.

🔧 Installation

Prerequisites

  • Python 3.11 or higher
  • pip package manager
  • Optional (for GPU support): A compatible NVIDIA GPU with CUDA 12.1

Setting Up Your Environment

Create an isolated environment using one of the following methods:

  1. Using conda:

    conda create -n omnidocs python=3.11
    conda activate omnidocs
    

  2. Using venv:

    python3 -m venv omnidocs
    source omnidocs/bin/activate  # For Linux/macOS
    .\omnidocs\Scripts\activate   # For Windows
    

  3. Using poetry:

    poetry new omnidocs
    cd omnidocs
    poetry install
    

Installing PyTorch

To install PyTorch, choose one of the following options based on whether you want GPU support:

  • With GPU support (CUDA 12.1):

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
    

  • Without GPU support:

    pip install torch torchvision torchaudio
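
After installing PyTorch, it can help to confirm that the environment matches the prerequisites. The following is a minimal sketch, not part of OmniDocs: `check_environment` and `MIN_PYTHON` are hypothetical names introduced here, and the only PyTorch call used is `torch.cuda.is_available()`.

```python
import importlib.util
import sys

# Minimum interpreter version, taken from the prerequisites list above.
MIN_PYTHON = (3, 11)


def check_environment() -> dict:
    """Report whether the interpreter and PyTorch match the prerequisites."""
    report = {
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        # Imported lazily so the check also runs when PyTorch is missing.
        import torch

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

If `cuda_available` is `False` after a GPU install, the usual suspects are a CPU-only wheel or a driver that does not match CUDA 12.1.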
    

Installing OmniDocs

Once your environment is set up and PyTorch is installed, you can install OmniDocs:

  1. From PyPI:

    pip install omnidocs
    

  2. From source: from the root of a local clone of the repository, install in editable mode:

    pip install -e .
    

🛠️ Getting Started

A quick-start tutorial is coming soon.

📚 Supported Models and Libraries

OmniDocs integrates with a variety of popular tools. The tables below use the following status markers:

  • ✅ : working and tested
  • ⏳ : planned/in-progress support
  • ❌ : no support

Layout Analysis

| Detection Model | Source | License | CPU | GPU | Info |
|---|---|---|---|---|---|
| ✅ DocLayout YOLO | GitHub - DocLayout-YOLO | AGPL-3.0 | ⏳ | ✅ | A robust layout detection model based on YOLO-v10, designed for diverse document types. |
| ✅ PPStructure (Paddle OCR) | GitHub - PaddleOCR | Apache 2.0 | ✅ | ✅ | An OCR tool that supports multiple languages and provides layout detection capabilities. |
| ✅ RT DETR (Docling) | GitHub - RT-DETR | MIT | ⏳ | ✅ | Implementation of RT-DETR, a real-time detection transformer focusing on object detection tasks. |
| ✅ Florence-2-DocLayNet-Fixed | Hugging Face - Florence-2-DocLayNet-Fixed | MIT | ❌ | ✅ | Fine-tuned model for document layout analysis, improving bounding box accuracy in document images. |
| ✅ Surya Layout | GitHub - Surya | GPL-3.0-or-later | ✅ | ✅ | OCR and layout analysis tool supporting 90+ languages, including reading order and table recognition. |
| ⏳ Layout LM V3 | Hugging Face - LayoutLMv3 | CC BY-NC-SA 4.0 | ⏳ | ⏳ | A pre-trained multimodal Transformer for Document AI, effective for various document understanding tasks. |
| ⏳ Fast / Faster R CNN / MR CNN | GitHub - Faster R-CNN | MIT | ⏳ | ⏳ | A library implementing the Faster R-CNN architecture for object detection, widely used in layout tasks. |
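
The CPU and GPU columns above can be treated as data when deciding which layout backend fits a given machine. The sketch below transcribes those two columns and filters on them. `Status`, `LayoutModel`, `LAYOUT_MODELS`, and `ready_on` are hypothetical names used for illustration only and are not part of the OmniDocs API.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    WORKING = "working and tested"
    PLANNED = "planned/in-progress support"
    UNSUPPORTED = "no support"


@dataclass(frozen=True)
class LayoutModel:
    name: str
    cpu: Status
    gpu: Status


W, P, U = Status.WORKING, Status.PLANNED, Status.UNSUPPORTED

# Transcribed from the CPU and GPU columns of the Layout Analysis table.
LAYOUT_MODELS = [
    LayoutModel("DocLayout YOLO", cpu=P, gpu=W),
    LayoutModel("PPStructure (Paddle OCR)", cpu=W, gpu=W),
    LayoutModel("RT DETR (Docling)", cpu=P, gpu=W),
    LayoutModel("Florence-2-DocLayNet-Fixed", cpu=U, gpu=W),
    LayoutModel("Surya Layout", cpu=W, gpu=W),
    LayoutModel("Layout LM V3", cpu=P, gpu=P),
    LayoutModel("Fast / Faster R CNN / MR CNN", cpu=P, gpu=P),
]


def ready_on(device: str) -> list[str]:
    """Return the models marked as working and tested on 'cpu' or 'gpu'."""
    if device not in ("cpu", "gpu"):
        raise ValueError(f"unknown device: {device!r}")
    return [m.name for m in LAYOUT_MODELS if getattr(m, device) is Status.WORKING]


if __name__ == "__main__":
    print("CPU:", ready_on("cpu"))
    print("GPU:", ready_on("gpu"))
```

On a machine without a GPU, this narrows the choice to the two models currently marked as working on CPU.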

Text Extraction

| Extraction Libraries | Source | License | CPU | GPU | Info |
|---|---|---|---|---|---|
| PyPDF2 | GitHub - PyPDF2 | BSD-3-Clause | ✅ | ✅ | A library for extracting text from PDFs. |
| PyMuPDF | GitHub - PyMuPDF | AGPL-3.0 | ✅ | ✅ | A library for extracting text from PDFs. |
| pdfplumber | GitHub - pdfplumber | MIT | ✅ | ✅ | A library for extracting text from PDFs. |
| Docling Parse | GitHub - Docling | MIT | ⏳ | ⏳ | A library for extracting text from PDFs. |

OCR

| OCR Library | Source | License | CPU | GPU | Info |
|---|---|---|---|---|---|
| Paddle OCR | GitHub - PaddleOCR | Apache 2.0 | ✅ | ✅ | An OCR tool that supports multiple languages and provides layout detection capabilities. |
| Tesseract | GitHub - Tesseract | Apache 2.0 | ✅ | ✅ | An open-source OCR engine that supports multiple languages and is widely used for text extraction from images. |
| EasyOCR | GitHub - EasyOCR | Apache 2.0 | ✅ | ✅ | A simple and easy-to-use OCR library that supports multiple languages and is built on PyTorch. |

Table Extraction

| Extraction Libraries/Models | Source | License | CPU | GPU | Info |
|---|---|---|---|---|---|
| PPStructure (Paddle OCR) | GitHub - PaddleOCR | Apache 2.0 | ✅ | ✅ | An OCR tool that supports multiple languages and provides layout detection capabilities. |
| Camelot | GitHub - Camelot | MIT | ✅ | ✅ | A Python library for extracting tables from PDFs. |
| Tabula | GitHub - Tabula | MIT | ✅ | ✅ | A tool for extracting tables from PDFs. |
| Table Transformer | GitHub - Table Transformer | MIT | ⏳ | ⏳ | A transformer model for table extraction. |
| TableFormer (Docling) | GitHub - Docling | MIT | ⏳ | ⏳ | A transformer model for table extraction. |

🏗️ How It Works

OmniDocs organizes document processing tasks into modular components. Each component corresponds to a specific task and offers:

  1. A Unified Interface: Consistent input and output formats.
  2. Model Independence: Switch between libraries or models effortlessly.
  3. Pipeline Flexibility: Combine components to create custom workflows.
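
The three properties above can be illustrated with a small, self-contained sketch. It does not show the actual OmniDocs API: every name in it (`Region`, `LayoutDetector`, `TextExtractor`, the stand-in backends, and `run_pipeline`) is hypothetical. It only shows one way a shared interface lets backends be swapped and composed.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Region:
    """One detected layout region (hypothetical shared output type)."""
    label: str
    bbox: tuple[float, float, float, float]  # x0, y0, x1, y1


class LayoutDetector(Protocol):
    """Unified interface: every layout backend maps a page to regions."""
    def detect(self, page: str) -> list[Region]: ...


class TextExtractor(Protocol):
    """Unified interface: every extractor maps a region to text."""
    def extract(self, page: str, region: Region) -> str: ...


class FixedBoxDetector:
    """Stand-in backend that always reports a single text block."""
    def detect(self, page: str) -> list[Region]:
        return [Region("text", (0.0, 0.0, 100.0, 50.0))]


class TwoRegionDetector:
    """A second stand-in backend with the same interface."""
    def detect(self, page: str) -> list[Region]:
        return [
            Region("title", (0.0, 0.0, 100.0, 20.0)),
            Region("table", (0.0, 30.0, 100.0, 90.0)),
        ]


class LabelEchoExtractor:
    """Stand-in extractor that returns the page name and region label."""
    def extract(self, page: str, region: Region) -> str:
        return f"{page}:{region.label}"


def run_pipeline(page: str, detector: LayoutDetector, extractor: TextExtractor) -> list[str]:
    """Pipeline flexibility: detect regions, then extract text from each."""
    return [extractor.extract(page, region) for region in detector.detect(page)]


if __name__ == "__main__":
    # Model independence: swapping the detector requires no other changes.
    print(run_pipeline("page-1", FixedBoxDetector(), LabelEchoExtractor()))
    print(run_pipeline("page-1", TwoRegionDetector(), LabelEchoExtractor()))
```

Because both detectors satisfy the same protocol, the pipeline function never needs to know which backend it was given.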

📈 Roadmap

  • Add support for semantic understanding tasks (e.g., entity extraction).
  • Integrate pre-trained transformer models for context-aware document analysis.
  • Expand pipelines for multilingual document processing.
  • Add CLI support for batch processing.

🀝 Contributing

We welcome contributions to OmniDocs! Here's how you can help:

  1. Fork the repository.
  2. Create a new branch for your feature or bug fix.
  3. Commit your changes and open a pull request.

For more details, refer to our CONTRIBUTING.md.

🛡️ License

This project is licensed under multiple licenses, depending on the models and libraries you use in your pipeline. Please refer to the individual licenses of each component for specific terms and conditions.
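
Because the effective license depends on which components a pipeline uses, it can be useful to list them side by side. The sketch below groups a pipeline's components by license. The mapping copies a few rows from the tables in this document and should be checked against each project's own license file. `COMPONENT_LICENSES` and `licenses_for` are hypothetical names, not part of OmniDocs.

```python
# A few component licenses as listed in the tables in this document.
# Always confirm against the upstream license files before relying on them.
COMPONENT_LICENSES = {
    "DocLayout YOLO": "AGPL-3.0",
    "PPStructure (Paddle OCR)": "Apache 2.0",
    "Surya Layout": "GPL-3.0-or-later",
    "Layout LM V3": "CC BY-NC-SA 4.0",
    "pdfplumber": "MIT",
    "Camelot": "MIT",
}


def licenses_for(pipeline: list[str]) -> dict[str, list[str]]:
    """Group the components of a pipeline by their listed license."""
    grouped: dict[str, list[str]] = {}
    for component in pipeline:
        # Raises KeyError for components missing from the mapping.
        license_name = COMPONENT_LICENSES[component]
        grouped.setdefault(license_name, []).append(component)
    return grouped


if __name__ == "__main__":
    pipeline = ["Surya Layout", "pdfplumber", "Camelot"]
    for license_name, components in licenses_for(pipeline).items():
        print(f"{license_name}: {', '.join(components)}")
```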

🌟 Support the Project

If you find OmniDocs helpful, please give us a ⭐ on GitHub and share it with others in the community.

🗨️ Join the Community

For discussions, questions, or feedback:

  • Issues: Report bugs or suggest features on the project's GitHub issue tracker.
  • Email: Reach out at adithyaskolavi@gmail.com