Skip to content

Contributing to OmniDocs

Thank you for your interest in contributing to OmniDocs! 🎉

Development Setup

  1. Clone the repository:

    git clone https://github.com/adithya-s-k/OmniDocs.git
    cd OmniDocs/Omnidocs
    

  2. Install dependencies with uv:

    # Install uv if you don't have it
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
    # Sync dependencies
    uv sync
    

  3. Run tests:

    uv run pytest tests/ -v
    

Project Structure

Omnidocs/
├── omnidocs/          # Main package
│   ├── document.py    # Document loading (✅ complete)
│   ├── tasks/         # Task extractors (🚧 in progress)
│   ├── inference/     # Backend implementations (planned)
│   └── utils/         # Utilities
├── tests/             # Test suite
│   ├── fixtures/      # Test data (PDFs, images)
│   └── tasks/         # Future task tests
└── docs/              # Documentation

Design Documents

🔴 IMPORTANT: Before implementing any new features, read the design documents: - docs/architecture.md - Backend and config system - docs/developer-guide.md - API design and usage patterns

These documents define the architecture for v0.2+.

Development Workflow

1. Testing Phase (modal_scripts/)

  • Test models in isolation using Modal scripts
  • Validate inference and outputs
  • Benchmark performance

2. Integration Phase (omnidocs/)

  • Follow the config pattern (single-backend vs multi-backend)
  • Use Pydantic for all configs and outputs
  • Maintain consistent .extract() API
  • Add comprehensive tests

3. Documentation

  • Add docstrings (Google style)
  • Update relevant docs
  • Add usage examples

Code Standards

✅ Required

  • Type hints for all public APIs
  • Pydantic models for configs (extra="forbid")
  • Docstrings (Google style) for classes and methods
  • Tests with >80% coverage

❌ Avoid

  • String-based factories (use class imports)
  • Storing task results in Document
  • Breaking changes without deprecation
  • Adding features beyond requirements
  • Over-engineering

Version Management

OmniDocs follows Semantic Versioning: - MAJOR (X.0.0): Breaking API changes - MINOR (0.X.0): New features, backward compatible - PATCH (0.0.X): Bug fixes, backward compatible

Incrementing Version

Version is managed in one place: omnidocs/_version.py

# omnidocs/_version.py
__version__ = "0.2.0"

The pyproject.toml reads from this file dynamically, so you only need to update one file.

Version Bump Process

  1. Update version:

    # Edit omnidocs/_version.py
    # Change __version__ = "0.2.0" to __version__ = "0.2.1"
    

  2. Verify (version should be accessible everywhere):

    from omnidocs import __version__
    print(__version__)  # Should show new version
    

  3. Commit:

    git add omnidocs/_version.py
    git commit -m "chore: bump version to 0.2.1"
    

  4. Tag and release (maintainers only):

    git tag v0.2.1
    git push origin v0.2.1
    

This will trigger the GitHub Actions workflow to: - Build the package - Create a GitHub release - Publish to PyPI

When to Bump

  • Patch (0.2.X): Bug fixes, typos, documentation
  • Minor (0.X.0): New features, new task extractors, new models
  • Major (X.0.0): Breaking API changes (rare, requires discussion)

Documentation

OmniDocs uses MkDocs Material for documentation.

Building Docs Locally

# Install docs dependencies
uv sync --group docs

# Serve docs with live reload
uv run mkdocs serve

# Open http://127.0.0.1:8000 in your browser

Building Static Site

# Build static site to site/ directory
uv run mkdocs build

# Build with strict mode (warnings become errors)
uv run mkdocs build --strict

Documentation Structure

docs/
├── README.md              # Homepage (also serves as package README)
├── architecture.md        # Backend and config system design
└── developer-guide.md     # API design and usage patterns

Automatic Deployment

Documentation is automatically built and deployed to GitHub Pages when: - Code is pushed to the master branch - A maintainer manually triggers the workflow

The docs are published at: https://adithya-s-k.github.io/OmniDocs/

Adding New Documentation

  1. Create markdown files in docs/
  2. Update mkdocs.yml nav section to include new pages
  3. Use Google-style docstrings in code (automatically extracted)
  4. Preview changes locally with uv run mkdocs serve

Pull Request Process

  1. Create a branch:

    git checkout -b feature/your-feature-name
    

  2. Make changes:

  3. Follow the design patterns in docs/
  4. Add tests for new functionality
  5. Update relevant documentation

  6. Run tests:

    # Run all tests
    uv run pytest tests/ -v
    
    # Run fast tests only
    uv run pytest tests/ -v -m "not slow"
    

  7. Submit PR:

  8. Provide clear description
  9. Reference any related issues
  10. Ensure tests pass

Commit Guidelines

Follow conventional commits:

feat: add DocLayoutYOLO extractor
fix: resolve page range validation
docs: update architecture guide
test: add fixture-based PDF tests

Need Help?

License

By contributing, you agree that your contributions will be licensed under the Apache 2.0 License.