OmniDocs

OmniDocs is a powerful framework that simplifies document understanding and analysis. It provides a unified, production-ready interface for essential document processing tasks including:

Layout Analysis & Detection
OCR & Text Extraction
Table Detection & Extraction
Reading Order Detection
Document Understanding & Analysis

By abstracting away the complexities of integrating multiple libraries and models, OmniDocs enables developers to build robust document processing workflows with minimal effort. Whether you're working with academic papers, business documents, or complex technical materials, OmniDocs provides the tools you need to extract, analyze, and understand document content efficiently.

🔧 Installation

Prerequisites

Python 3.11 or higher
pip package manager
Optional (for GPU support): A compatible NVIDIA GPU with CUDA 12.1

Setting Up Your Environment

To set up your environment, you can choose one of the following methods:

Using conda:

conda create -n omnidocs python=3.11
conda activate omnidocs

Using venv:

python3 -m venv omnidocs
source omnidocs/bin/activate  # For Linux/macOS
.\omnidocs\Scripts\activate   # For Windows

Using poetry:

poetry new omnidocs
cd omnidocs
poetry install

Installing PyTorch

To install PyTorch, choose one of the following options based on whether you want GPU support:

With GPU support (CUDA 12.1):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Without GPU support:

pip install torch torchvision torchaudio
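
After installing, it can help to confirm which build of PyTorch the environment actually picked up. The snippet below is a minimal sketch, not part of OmniDocs: the helper name `describe_torch` is made up for illustration, and it relies only on the standard `torch.cuda.is_available()` and `torch.version.cuda` attributes.

```python
import importlib.util


def describe_torch() -> str:
    """Return a one-line summary of the local PyTorch install (hypothetical helper)."""
    # find_spec returns None when the package cannot be imported.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"

    import torch  # imported lazily so the check above can fail gracefully

    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version the wheel was built against.
        return f"PyTorch {torch.__version__} with CUDA {torch.version.cuda} (GPU available)"
    return f"PyTorch {torch.__version__} (CPU only)"


if __name__ == "__main__":
    print(describe_torch())
```

If the GPU wheel was installed but the summary still reports CPU only, the usual suspects are a missing NVIDIA driver or a driver too old for the CUDA version of the wheel.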

Installing OmniDocs

Once your environment is set up and PyTorch is installed, you can install OmniDocs:

From PyPI:
```
pip install omnidocs
```
From source: clone the repository, then run the following command from the repository root:
```
pip install -e .
```
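
To check that the install landed in the active environment, the following sketch uses only the Python standard library. It assumes the distribution name is `omnidocs`, as in the PyPI command above; the helper name `check_install` and the shape of its result are illustrative and not part of OmniDocs.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

MIN_PYTHON = (3, 11)  # minimum version listed under Prerequisites


def check_install(dist_name: str = "omnidocs") -> dict:
    """Report the Python version check and the installed version of dist_name."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        installed = None  # not installed in the active environment
    return {
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        "package": dist_name,
        "version": installed,
    }


if __name__ == "__main__":
    print(check_install())
```

A `version` of `None` usually means the package was installed into a different environment than the one currently activated.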

🛠️ Getting Started

A quick-start tutorial is coming soon.

📚 Supported Models and Libraries

OmniDocs integrates with the tools listed in the tables below. Status legend:

✅ : working and tested
⏳ : planned/in-progress support
❌ : no support

Layout Analysis

Detection Model	Source	License	CPU	GPU	Info
✅DocLayout YOLO	GitHub - DocLayout-YOLO	AGPL-3.0	⏳	✅	A robust layout detection model based on YOLO-v10, designed for diverse document types.
✅PPStructure (Paddle OCR)	GitHub - PaddleOCR	Apache 2.0	✅	✅	An OCR tool that supports multiple languages and provides layout detection capabilities.
✅RT DETR (Docling)	GitHub - RT-DETR	MIT	⏳	✅	Implementation of RT-DETR, a real-time detection transformer focusing on object detection tasks.
✅Florence-2-DocLayNet-Fixed	Hugging Face - Florence-2-DocLayNet-Fixed	MIT	❌	✅	Fine-tuned model for document layout analysis, improving bounding box accuracy in document images.
✅Surya Layout	GitHub - Surya	GPL-3.0-or-later	✅	✅	OCR and layout analysis tool supporting 90+ languages, including reading order and table recognition.
⏳Layout LM V3	Hugging Face - LayoutLMv3	CC BY-NC-SA 4.0	⏳	⏳	A pre-trained multimodal Transformer for Document AI, effective for various document understanding tasks.
⏳Fast R-CNN / Faster R-CNN / Mask R-CNN	GitHub - Faster R-CNN	MIT	⏳	⏳	A library implementing the Faster R-CNN architecture for object detection, widely used in layout tasks.

Text Extraction

Extraction Libraries	Source	License	CPU	GPU	Info
PyPDF2	GitHub - PyPDF2	BSD-3-Clause	✅	✅	A library for extracting text from PDFs.
PyMuPDF	GitHub - PyMuPDF	AGPL-3.0	✅	✅	A library for extracting text from PDFs.
pdfplumber	GitHub - pdfplumber	MIT	✅	✅	A library for extracting text from PDFs.
Docling Parse	GitHub - Docling	MIT	⏳	⏳	A library for extracting text from PDFs.

OCR

OCR Library	Source	License	CPU	GPU	Info
Paddle OCR	GitHub - PaddleOCR	Apache 2.0	✅	✅	An OCR tool that supports multiple languages and provides layout detection capabilities.
Tesseract	GitHub - Tesseract	Apache 2.0	✅	✅	An open-source OCR engine that supports multiple languages and is widely used for text extraction from images.
EasyOCR	GitHub - EasyOCR	Apache 2.0	✅	✅	An easy-to-use OCR library that supports multiple languages and is built on PyTorch.

Table Extraction

Extraction Libraries/Models	Source	License	CPU	GPU	Info
PPStructure (Paddle OCR)	GitHub - PaddleOCR	Apache 2.0	✅	✅	An OCR tool that supports multiple languages and provides layout detection capabilities.
Camelot	GitHub - Camelot	MIT	✅	✅	A Python library for extracting tables from PDFs.
Tabula	GitHub - Tabula	MIT	✅	✅	A tool for extracting tables from PDFs.
Table Transformer	GitHub - Table Transformer	MIT	⏳	⏳	A transformer model for table extraction.
TableFormer (Docling)	GitHub - Docling	MIT	⏳	⏳	A transformer model for table extraction.

🏗️ How It Works

OmniDocs organizes document processing tasks into modular components. Each component corresponds to a specific task and offers:

A Unified Interface: Consistent input and output formats.
Model Independence: Switch between libraries or models effortlessly.
Pipeline Flexibility: Combine components to create custom workflows.
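
The three properties above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `Region`, `LayoutDetector`, `TextExtractor`, `run_pipeline`, and the dummy backends are names invented for this example and are not the OmniDocs API. It only shows the general pattern of a shared interface, interchangeable backends, and composition into a pipeline.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Region:
    """One detected layout element (hypothetical common output format)."""
    label: str
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1)


class LayoutDetector(Protocol):
    """Unified interface: every backend takes a page and returns Regions."""
    def detect(self, page: str) -> list[Region]: ...


class TextExtractor(Protocol):
    """Unified interface for turning a region of a page into text."""
    def extract(self, page: str, region: Region) -> str: ...


class TitleAndBodyDetector:
    """Dummy backend standing in for one layout model."""
    def detect(self, page: str) -> list[Region]:
        return [Region("title", (0, 0, 100, 20)), Region("text", (0, 30, 100, 200))]


class BodyOnlyDetector:
    """Dummy backend standing in for a different layout model."""
    def detect(self, page: str) -> list[Region]:
        return [Region("text", (0, 0, 100, 200))]


class EchoExtractor:
    """Dummy extractor that just describes what it was asked to read."""
    def extract(self, page: str, region: Region) -> str:
        return f"{region.label} from {page}"


def run_pipeline(page: str, detector: LayoutDetector,
                 extractor: TextExtractor) -> list[tuple[str, str]]:
    """Pipeline flexibility: compose any detector with any extractor."""
    return [(r.label, extractor.extract(page, r)) for r in detector.detect(page)]


if __name__ == "__main__":
    # Model independence: swap the detector without touching the pipeline.
    print(run_pipeline("page-1", TitleAndBodyDetector(), EchoExtractor()))
    print(run_pipeline("page-1", BodyOnlyDetector(), EchoExtractor()))
```

Because both detectors satisfy the same protocol, the pipeline function does not need to know which one it was given.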

📈 Roadmap

Add support for semantic understanding tasks (e.g., entity extraction).
Integrate pre-trained transformer models for context-aware document analysis.
Expand pipelines for multilingual document processing.
Add CLI support for batch processing.

🤝 Contributing

We welcome contributions to OmniDocs! Here's how you can help:

Fork the repository.
Create a new branch for your feature or bug fix.
Commit your changes and open a pull request.

For more details, refer to our CONTRIBUTING.md.

🛡️ License

This project is licensed under multiple licenses, depending on the models and libraries you use in your pipeline. Please refer to the individual licenses of each component for specific terms and conditions.
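
As a rough illustration of what this means in practice, the sketch below groups the components of a chosen pipeline by license. The license strings are copied from a few rows of the tables in this README and should be checked against each upstream project; the mapping `COMPONENT_LICENSES` and the function `licenses_for` are hypothetical helpers, not something OmniDocs ships.

```python
# License strings copied from the tables in this README (an assumption to
# verify upstream before relying on them).
COMPONENT_LICENSES = {
    "DocLayout YOLO": "AGPL-3.0",
    "Surya Layout": "GPL-3.0-or-later",
    "Paddle OCR": "Apache 2.0",
    "pdfplumber": "MIT",
    "Camelot": "MIT",
}


def licenses_for(pipeline: list[str]) -> dict[str, list[str]]:
    """Group the components of a pipeline by their license."""
    grouped: dict[str, list[str]] = {}
    for name in pipeline:
        # Components missing from the mapping are flagged rather than guessed.
        license_name = COMPONENT_LICENSES.get(name, "unknown")
        grouped.setdefault(license_name, []).append(name)
    return grouped


if __name__ == "__main__":
    print(licenses_for(["DocLayout YOLO", "Paddle OCR", "pdfplumber"]))
```

A pipeline that includes a copyleft component such as an AGPL or GPL model generally carries stricter obligations than one built only from MIT or Apache 2.0 parts, so it is worth running this kind of check before distributing a workflow.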

🌟 Support the Project

If you find OmniDocs helpful, please give us a ⭐ on GitHub and share it with others in the community.

🗨️ Join the Community

For discussions, questions, or feedback:

Issues: Report bugs or suggest features on the GitHub issue tracker.
Email: Reach out at adithyaskolavi@gmail.com