Getting Started with OmniDocs
Welcome to OmniDocs! This guide walks you through installing the library and running your first document extraction in a few steps.
What is OmniDocs?
OmniDocs is a unified Python library for extracting tables, text, math, and OCR data from PDFs and images using state-of-the-art models and classic tools, all with a simple, consistent API.
Installation
Choose your preferred method:
- PyPI (Recommended):
- uv pip (Fastest):
- From Source:
- Conda (if available):
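Whichever method you choose, you can confirm afterwards that the package is visible to the current interpreter. The snippet below is a minimal sketch using only the standard library (Python 3.8+); the helper name `installed_version` is illustrative, and the distribution name `omnidocs` is an assumption that may differ from the actual published name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version string for dist_name, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumption: the distribution is published under the name "omnidocs".
found = installed_version("omnidocs")
print(f"omnidocs {found}" if found else "omnidocs is not installed in this environment")
```

If this reports that the package is missing right after an install, the install most likely went into a different interpreter or environment than the one running the check.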
Setting Up Your Environment
It's best to use a virtual environment:
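An environment is typically created with the standard `python -m venv .venv` command and then activated for the current shell. To check that one is actually active before installing anything, the interpreter's prefixes can be compared. This is a small sketch using only the standard library; the helper name `in_virtualenv` is illustrative, and the check covers environments created by `venv` (or a recent `virtualenv`), not conda environments.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```

Running this with the interpreter you intend to install into avoids accidentally installing packages into the system Python.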
Quick Example
Extract tables from a PDF in just a few lines:
from omnidocs.tasks.table_extraction.extractors.camelot import CamelotExtractor

extractor = CamelotExtractor()  # Camelot-based table extractor
results = extractor.extract("sample.pdf")  # run extraction on a local PDF
print(results.tables[0].df)  # print the first table as a DataFrame
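The quick example indexes `results.tables[0]` directly, which raises an `IndexError` when a PDF contains no detectable tables. A slightly more defensive wrapper might look like the sketch below. It assumes only what the example above shows: an extractor with an `extract(path)` method whose result has a `tables` list of objects with a `df` attribute. The helper name `first_table_or_none` is hypothetical and not part of OmniDocs.

```python
from pathlib import Path


def first_table_or_none(extractor, pdf_path):
    """Return the first extracted table's DataFrame, or None if no table was found.

    `extractor` is any object with an `extract(path)` method, for example the
    CamelotExtractor instance from the quick example (assumed interface).
    """
    path = Path(pdf_path)
    if not path.is_file():
        # Fail early with a clear message instead of an error deep inside the extractor.
        raise FileNotFoundError(f"No such PDF: {path}")

    results = extractor.extract(str(path))

    # Treat a missing or empty `tables` attribute as "no tables found".
    tables = getattr(results, "tables", None) or []
    return tables[0].df if tables else None
```

With the extractor from the quick example, `first_table_or_none(extractor, "sample.pdf")` would return the same DataFrame that the example prints, or `None` for a PDF without tables.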
Explore Tutorials
Need Help?
- See the API Reference
- Open an issue on GitHub
Happy Document AI-ing!