
Getting Started with OmniDocs

Welcome to OmniDocs! This guide walks you through installing the library, setting up an environment, and running your first document extraction in a few steps.


🚀 What is OmniDocs?

OmniDocs is a unified Python library for extracting tables, text, math, and OCR data from PDFs and images using state-of-the-art models and classic tools, all with a simple, consistent API.


🛠️ Installation

Choose your preferred method:

  • PyPI (Recommended):
    pip install omnidocs
    
  • uv pip (Fastest):
    uv pip install omnidocs
    
  • From Source:
    git clone https://github.com/adithya-s-k/OmniDocs.git
    cd OmniDocs
    pip install .
    # or, if you use uv:
    uv sync
    
  • Conda (if available):
    conda install -c conda-forge omnidocs
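
After installing by any of the methods above, it can help to confirm that the package is visible to the interpreter you intend to use. The snippet below is a minimal sketch using only the standard library. It assumes the installed distribution is named `omnidocs`, as in the `pip install` command, and `installed_version` is a hypothetical helper, not part of OmniDocs.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "omnidocs"):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


found = installed_version()
if found is None:
    print("omnidocs is not installed in this environment")
else:
    print(f"omnidocs {found} is installed")
```

If this reports that the package is missing, the install most likely went into a different Python environment than the one running the script.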
    

🏗️ Setting Up Your Environment

It's best to use a virtual environment:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
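
To check that the environment is actually active before installing anything, you can ask Python itself. This is a small standard-library sketch, not an OmniDocs feature, and `in_virtualenv` is a hypothetical helper name. It relies on the fact that inside a `venv` environment `sys.prefix` differs from `sys.base_prefix`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

When the environment is active, the interpreter path printed should be inside the `.venv` directory created above.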


📄 Quick Example

Extract tables from a PDF in just a few lines:

from omnidocs.tasks.table_extraction.extractors.camelot import CamelotExtractor

# Create the Camelot-based table extractor
extractor = CamelotExtractor()
# Run extraction on a local PDF file
results = extractor.extract("sample.pdf")
print(results.tables[0].df)  # Print first table as DataFrame
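
The quick example prints only the first table. A common next step is to save every extracted table to disk. The sketch below assumes only what the quick example shows: that `results.tables` is a sequence whose items expose a pandas DataFrame as `.df`. `save_tables` and the `tables` output directory are hypothetical names chosen for illustration, not part of the OmniDocs API.

```python
from pathlib import Path


def save_tables(results, out_dir="tables"):
    """Write each extracted table to its own CSV file and return the paths.

    Assumes `results.tables` is a sequence whose items expose a pandas
    DataFrame as `.df`, as in the quick example.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, table in enumerate(results.tables):
        path = out / f"table_{i}.csv"
        table.df.to_csv(path, index=False)  # pandas DataFrame.to_csv
        paths.append(path)
    return paths


# Usage with the extractor from the quick example (requires omnidocs):
# results = CamelotExtractor().extract("sample.pdf")
# for path in save_tables(results):
#     print("wrote", path)
```

Looping over `results.tables` also avoids the `IndexError` that `results.tables[0]` would raise if the PDF contains no detectable tables.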


📚 Explore Tutorials


🧑‍💻 Need Help?


Happy Document AI-ing! 🎉