# Testing Guide

This guide covers how to run tests for OmniDocs across different platforms and backends.

## Test Architecture

OmniDocs uses a multi-tier testing approach:

| Tier | Platform | Tests | Command |
|---|---|---|---|
| Local CPU | Any machine | CPU-based extractors | `uv run python -m tests.runners.local_runner --cpu-only` |
| Local MLX | Apple Silicon | MLX extractors | `uv run python -m tests.runners.local_runner --mlx` |
| Modal GPU | Cloud (L4/A10G) | VLLM, PyTorch GPU | `modal run scripts/.../modal_runner.py` |
| pytest | Any | Unit/integration | `uv run pytest` |
## Directory Structure

```text
Omnidocs/tests/
├── fixtures/
│   └── images/              # Test images
│       └── test_simple.png
├── standalone/              # Standalone test scripts
│   ├── text_extraction/
│   │   ├── qwen_vllm.py
│   │   ├── qwen_pytorch.py
│   │   ├── qwen_mlx.py
│   │   ├── nanonets_vllm.py
│   │   └── ...
│   ├── layout_extraction/
│   │   ├── doclayout_yolo_cpu.py
│   │   ├── doclayout_yolo_gpu.py
│   │   ├── rtdetr_cpu.py
│   │   └── ...
│   ├── ocr_extraction/
│   ├── table_extraction/
│   └── reading_order/
├── runners/
│   ├── local_runner.py      # Local test runner
│   ├── registry.py          # Test registry
│   └── report.py            # Result reporting
├── integration/             # pytest integration tests
└── utils/                   # Test utilities
```
## Running Local Tests

### Prerequisites

```bash
cd Omnidocs

# Install with test dependencies
uv sync --group dev

# For MLX tests (Apple Silicon only)
uv sync --group mlx

# For OCR tests
uv sync --group ocr
```
### Using the Local Runner

The local runner executes standalone test scripts on your machine.

```bash
# Basic usage - run all CPU tests
uv run python -m tests.runners.local_runner \
    --image tests/fixtures/images/test_simple.png \
    --cpu-only

# Run all MLX tests (Apple Silicon)
uv run python -m tests.runners.local_runner \
    --image tests/fixtures/images/test_simple.png \
    --mlx

# Filter by task
uv run python -m tests.runners.local_runner \
    --image tests/fixtures/images/test_simple.png \
    --task layout_extraction \
    --cpu-only

# Run a specific test
uv run python -m tests.runners.local_runner \
    --image tests/fixtures/images/test_simple.png \
    --test doclayout_yolo_cpu
```
### Local Runner Options

| Option | Description | Example |
|---|---|---|
| `--image` | Path to test image (required) | `tests/fixtures/images/test_simple.png` |
| `--cpu-only` | Run only CPU tests | |
| `--mlx` | Run only MLX tests | |
| `--task` | Filter by task type | `text_extraction`, `layout_extraction` |
| `--test` | Run a specific test | `doclayout_yolo_cpu` |
| `--output` | Write results to a JSON file | `results.json` |
### Example Output

```text
Running 4 tests
Image: tests/fixtures/images/test_simple.png
---------------------------------------------------------------------------
Running qwen_layout_mlx...     [PASS] (3.78s)
Running qwen_layout_api...     [FAIL] (0.00s)
  Error: api_key required
Running doclayout_yolo_cpu...  [PASS] (0.46s)
Running rtdetr_cpu...          [PASS] (0.90s)
===========================================================================
SUMMARY: 3 passed, 1 failed (13.2s)
===========================================================================
```
## Running GPU Tests on Modal

GPU tests run on Modal cloud infrastructure with NVIDIA GPUs.

### Prerequisites

1. Install the Modal CLI.
2. Authenticate with Modal.
3. Create a HuggingFace secret (for model downloads).
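A typical way to complete these three steps with Modal's standard CLI (the secret name `huggingface-secret` is an assumption here — use whatever name the Modal runner expects):

```shell
# 1. Install the Modal CLI
pip install modal

# 2. Authenticate (opens a browser to link this machine to your Modal account)
modal setup

# 3. Store your HuggingFace token as a Modal secret for gated model downloads
modal secret create huggingface-secret HF_TOKEN=hf_your_token_here
```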
### Using the Modal Runner

```bash
cd /path/to/omnidocs_Master

# List all available tests
modal run scripts/text_extract_omnidocs/modal_runner.py --list-tests

# Run a specific test
modal run scripts/text_extract_omnidocs/modal_runner.py --test qwen_vllm

# Run all tests
modal run scripts/text_extract_omnidocs/modal_runner.py --run-all
```
### Available Modal Tests

#### Text Extraction

| Test | Backend | GPU | Model |
|---|---|---|---|
| `qwen_vllm` | VLLM | L4 | Qwen3-VL-4B |
| `qwen_pytorch` | PyTorch | L4 | Qwen3-VL-4B |
| `nanonets_vllm` | VLLM | L4 | Nanonets-OCR-s |
| `nanonets_pytorch` | PyTorch | L4 | Nanonets-OCR-s |
| `dotsocr_vllm` | VLLM | L4 | dots.ocr |
| `dotsocr_pytorch` | PyTorch | L4 | dots.ocr |
#### Layout Extraction

| Test | Backend | GPU | Model |
|---|---|---|---|
| `qwen_layout_vllm` | VLLM | L4 | Qwen3-VL-4B |
| `qwen_layout_pytorch` | PyTorch | L4 | Qwen3-VL-4B |
| `doclayout_yolo_gpu` | PyTorch | L4 | DocLayoutYOLO |
| `rtdetr_gpu` | PyTorch | L4 | RTDETR |
### Example Output

```text
Running test: doclayout_yolo_gpu
============================================================
Testing DocLayoutYOLO with GPU
============================================================
Model load time: 8.34s
Inference time: 2.27s

--- Detected Layout Elements ---
Number of boxes: 8
  1. LayoutLabel.TITLE: conf=0.32
  2. LayoutLabel.TEXT: conf=0.72
  ...
============================================================
TEST RESULTS
============================================================
  status: success
  test: doclayout_yolo_gpu
  backend: pytorch_gpu
  model: DocLayoutYOLO
  num_boxes: 8
  load_time: 8.34
  inference_time: 2.27
```
## Running pytest

For unit tests and integration tests:

```bash
cd Omnidocs

# Run all tests
uv run pytest

# Run with specific markers
uv run pytest -m "cpu"                # CPU-only tests
uv run pytest -m "not slow"           # Skip slow tests
uv run pytest -m "layout_extraction"  # Layout tests only

# Run a specific test file
uv run pytest tests/integration/test_layout_extractors.py -v

# Run with coverage
uv run pytest --cov=omnidocs
```
### pytest Markers

Defined in `pyproject.toml`:

| Marker | Description |
|---|---|
| `slow` | Long-running tests (network, large files) |
| `gpu` | Requires a GPU |
| `cpu` | CPU-only tests |
| `vllm` | VLLM backend tests |
| `pytorch` | PyTorch backend tests |
| `mlx` | MLX backend tests (Apple Silicon) |
| `api` | API backend tests |
| `text_extraction` | Text extraction task |
| `layout_extraction` | Layout extraction task |
| `ocr_extraction` | OCR extraction task |
| `table_extraction` | Table extraction task |
| `reading_order` | Reading order task |
| `integration` | Integration tests requiring model inference |
## Writing Tests

### Standalone Test Template

Create a new test in `tests/standalone/<task>/<model>_<backend>.py`:

```python
"""
Model Name - Backend

Usage:
    python -m tests.standalone.<task>.<model>_<backend> path/to/image.png
"""
import sys
import time

from PIL import Image


def run_extraction(img: Image.Image) -> dict:
    """Run extraction and return results."""
    from omnidocs.tasks.<task> import MyExtractor, MyConfig

    # Time model loading separately from inference
    start = time.time()
    extractor = MyExtractor(config=MyConfig(device="cpu"))
    load_time = time.time() - start

    start = time.time()
    result = extractor.extract(img)
    inference_time = time.time() - start

    return {
        "result": result,
        "load_time": load_time,
        "inference_time": inference_time,
    }


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python -m tests.standalone.<task>.<model>_<backend> <image_path>")
        sys.exit(1)

    img = Image.open(sys.argv[1])
    result = run_extraction(img)
    print(f"Load time: {result['load_time']:.2f}s")
    print(f"Inference time: {result['inference_time']:.2f}s")
```
### Register the Test

Add to `tests/runners/registry.py`:

```python
from .registry import TestSpec, Backend, Task

# Add to the TEST_REGISTRY list
TestSpec(
    name="mymodel_cpu",
    module="<task>.mymodel_cpu",
    backend=Backend.PYTORCH_CPU,
    task=Task.<TASK>,
    gpu_type=None,  # None for CPU tests
),
```
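Once registered, the new test is picked up by the local runner and can be invoked by name:

```shell
uv run python -m tests.runners.local_runner \
    --image tests/fixtures/images/test_simple.png \
    --test mymodel_cpu
```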
## Troubleshooting

### VLLM Multiprocessing Error

`Cannot re-initialize CUDA in forked subprocess` means a worker process was forked after the parent had already initialized CUDA; CUDA contexts do not survive `fork`.
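A common remedy (a sketch — not necessarily how OmniDocs configures this internally) is to force the `spawn` start method before anything touches CUDA:

```python
import multiprocessing
import os

# Tell VLLM to start its worker processes with "spawn" instead of "fork",
# so children do not inherit an already-initialized CUDA context.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

# If your own code also spawns processes, force "spawn" there as well,
# before any CUDA work happens.
multiprocessing.set_start_method("spawn", force=True)
```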
### Flash Attention Version Mismatch

If flash-attn fails to load because of a version mismatch, use `attn_implementation="sdpa"` instead of `"flash_attention_2"`.
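For example, with HuggingFace Transformers (the model name below is an illustrative placeholder — substitute the model under test):

```python
from transformers import AutoModelForCausalLM

# "sdpa" uses PyTorch's built-in scaled_dot_product_attention, which has
# no dependency on a particular flash-attn build.
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-model",       # illustrative placeholder
    attn_implementation="sdpa",  # instead of "flash_attention_2"
)
```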
### MLX Tests Fail on Non-Apple Hardware

MLX only works on Apple Silicon; on other hardware, skip the MLX tests.
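Using the `mlx` marker defined above:

```shell
uv run pytest -m "not mlx"
```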
### API Tests Need Credentials

API tests require environment variables:

```bash
export OPENROUTER_API_KEY=your_key
uv run python -m tests.runners.local_runner --test qwen_layout_api
```
## Test Results Reference

### Text Extraction Performance (Modal L4 GPU)
| Model | Backend | Load Time | Inference Time |
|---|---|---|---|
| Qwen3-VL-4B | VLLM | 84s | 7.0s |
| Qwen3-VL-4B | PyTorch | 54s | 6.2s |
| Nanonets-OCR-s | VLLM | 194s | 8.4s |
| Nanonets-OCR-s | PyTorch | 44s | 6.3s |
| DotsOCR | VLLM | 94s | 10.0s |
| DotsOCR | PyTorch | 42s | 11.4s |
### Layout Extraction Performance
| Model | Backend | Load Time | Inference Time |
|---|---|---|---|
| Qwen Layout | VLLM (L4) | 237s | 27.2s |
| Qwen Layout | PyTorch (L4) | 54s | 13.3s |
| Qwen Layout | MLX (local) | 8.8s | 13.7s |
| DocLayoutYOLO | GPU (L4) | 8.3s | 2.3s |
| DocLayoutYOLO | CPU (local) | 0.5s | 0.3s |
| RTDETR | GPU (L4) | 12.4s | 1.9s |
| RTDETR | CPU (local) | 0.9s | 0.5s |
### Table Extraction Performance
| Model | Backend | Load Time | Inference Time |
|---|---|---|---|
| TableFormer (fast) | CPU (local) | 0.5s | 0.3s |
| TableFormer (accurate) | CPU (local) | 0.5s | 0.9s |
| TableFormer (fast) | GPU (L4) | 8s | 0.2s |
| TableFormer (accurate) | GPU (L4) | 8s | 0.5s |
### Reading Order Performance
| Model | Backend | Load Time | Inference Time |
|---|---|---|---|
| Rule-based | CPU (local) | <0.1s | <0.1s |