Overview

TableFormer module for table structure extraction. It provides the TableFormer-based table structure extractor.

TableFormerConfig

Bases: BaseModel

Configuration for TableFormer table structure extractor.

TableFormer is a transformer-based model that predicts table structure using OTSL (Optimal Table Structure Language) tags and cell bounding boxes.

ATTRIBUTE	TYPE	DESCRIPTION
`mode`	`TableFormerMode`	Inference mode: "fast" or "accurate"
`device`	`Literal['cpu', 'cuda', 'mps', 'auto']`	Device for inference: "cpu", "cuda", "mps", or "auto"
`num_threads`	`int`	Number of CPU threads for inference
`do_cell_matching`	`bool`	Whether to match predicted cells with OCR text cells
`artifacts_path`	`Optional[str]`	Path to pre-downloaded model artifacts
`repo_id`	`str`	HuggingFace model repository
`revision`	`str`	Model revision/tag

Example

from omnidocs.tasks.table_extraction import TableFormerExtractor, TableFormerConfig

# Fast mode
extractor = TableFormerExtractor(config=TableFormerConfig(mode="fast"))

# Accurate mode with GPU
extractor = TableFormerExtractor(
    config=TableFormerConfig(
        mode="accurate",
        device="cuda",
        do_cell_matching=True,
    )
)

TableFormerMode

Bases: str, Enum

TableFormer inference mode.
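
Because the mode type derives from both `str` and `Enum`, plain strings such as "fast" can be converted to members and members compare equal to their string values. The sketch below illustrates that behaviour with a stand-in class; the name `ModeSketch` and the member names `FAST` and `ACCURATE` are assumptions, and only the two values come from the attribute description on this page.

```python
from enum import Enum


class ModeSketch(str, Enum):
    """Stand-in for a str-based mode enum (member names are hypothetical)."""

    FAST = "fast"
    ACCURATE = "accurate"


def parse_mode(value: str) -> ModeSketch:
    """Convert a user-supplied string into a mode member, with a clear error."""
    try:
        return ModeSketch(value)
    except ValueError:
        allowed = ", ".join(m.value for m in ModeSketch)
        raise ValueError(f"Unknown mode {value!r}; expected one of: {allowed}") from None


mode = parse_mode("accurate")
# A str-based enum member compares equal to its raw string value.
same_as_string = mode == "accurate"
```

This is why the examples on this page can pass `mode="fast"` as a plain string when a model-validated config field is typed as the enum.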

TableFormerExtractor

TableFormerExtractor(config: TableFormerConfig)

Bases: BaseTableExtractor

Table structure extractor using TableFormer model.

TableFormer is a transformer-based model that predicts table structure using OTSL (Optimal Table Structure Language) tags. It can detect:

- Cell boundaries (bounding boxes)
- Row and column spans
- Header cells (column and row headers)
- Section rows
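
To make the link between an OTSL tag sequence and row/column spans concrete, here is a minimal sketch of how spans could be derived from such a sequence. The tag names (`fcel`, `ecel`, `ched`, `rhed`, `srow`, `lcel`, `ucel`, `nl`) are assumed from the OTSL vocabulary and are not confirmed by this page; the functions are illustrative, handle only one-directional merges, and are not the library's decoder.

```python
from typing import List, Tuple

# Assumed tags that start a new cell; "lcel" extends the cell to its left,
# "ucel" extends the cell above, and "nl" ends a row.
CELL_START_TAGS = {"fcel", "ecel", "ched", "rhed", "srow"}


def split_rows(tags: List[str]) -> List[List[str]]:
    """Split a flat tag sequence into rows at each "nl" tag."""
    rows: List[List[str]] = []
    current: List[str] = []
    for tag in tags:
        if tag == "nl":
            rows.append(current)
            current = []
        else:
            current.append(tag)
    if current:
        rows.append(current)
    return rows


def cell_spans(grid: List[List[str]]) -> List[Tuple[int, int, int, int]]:
    """Return (row, col, row_span, col_span) for every cell-start tag."""
    spans = []
    for r, row in enumerate(grid):
        for c, tag in enumerate(row):
            if tag not in CELL_START_TAGS:
                continue
            col_span = 1
            while c + col_span < len(row) and row[c + col_span] == "lcel":
                col_span += 1
            row_span = 1
            while (
                r + row_span < len(grid)
                and c < len(grid[r + row_span])
                and grid[r + row_span][c] == "ucel"
            ):
                row_span += 1
            spans.append((r, c, row_span, col_span))
    return spans


# A header spanning two columns, followed by one row with two cells.
grid = split_rows(["ched", "lcel", "nl", "fcel", "fcel", "nl"])
spans = cell_spans(grid)  # [(0, 0, 1, 2), (1, 0, 1, 1), (1, 1, 1, 1)]
```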

Example

from omnidocs.tasks.table_extraction import TableFormerExtractor, TableFormerConfig

# Initialize extractor
extractor = TableFormerExtractor(
    config=TableFormerConfig(mode="fast", device="cuda")
)

# Extract table structure
result = extractor.extract(table_image)

# Get HTML output
html = result.to_html()

# Get DataFrame
df = result.to_dataframe()

Initialize the TableFormer extractor.

PARAMETER	TYPE	DESCRIPTION
`config`	`TableFormerConfig`	TableFormerConfig with model settings

Source code in omnidocs/tasks/table_extraction/tableformer/pytorch.py

def __init__(self, config: TableFormerConfig):
    """
    Initialize TableFormer extractor.

    Args:
        config: TableFormerConfig with model settings
    """
    self.config = config
    self._device = _resolve_device(config.device)
    self._predictor = None
    self._model_config: Optional[Dict] = None
    self._load_model()
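
The constructor passes `config.device` through `_resolve_device`, whose body is not shown on this page. The following is a minimal sketch of how an "auto" setting might be resolved, assuming a preference order of CUDA, then MPS, then CPU. The function name `resolve_device_sketch` and the ordering are assumptions; the availability flags are passed in explicitly so the example runs without PyTorch.

```python
VALID_DEVICES = ("cpu", "cuda", "mps", "auto")


def resolve_device_sketch(
    requested: str, cuda_available: bool, mps_available: bool
) -> str:
    """Map a requested device to a concrete one (hypothetical helper)."""
    if requested not in VALID_DEVICES:
        raise ValueError(f"Unsupported device: {requested!r}")
    if requested != "auto":
        # An explicit choice is returned unchanged.
        return requested
    # Assumed preference order for "auto": CUDA, then MPS, then CPU.
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# With PyTorch installed, the flags could come from
# torch.cuda.is_available() and torch.backends.mps.is_available().
device = resolve_device_sketch("auto", cuda_available=False, mps_available=True)
```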

extract

extract(
    image: Union[Image, ndarray, str, Path],
    ocr_output: Optional[OCROutput] = None,
) -> TableOutput

Extract table structure from an image.

PARAMETER	TYPE	DEFAULT	DESCRIPTION
`image`	`Union[Image, ndarray, str, Path]`	required	Table image (should be cropped to table region)
`ocr_output`	`Optional[OCROutput]`	`None`	Optional OCR results for cell text matching

RETURNS	DESCRIPTION
`TableOutput`	TableOutput with cells, structure, and export methods

Example

result = extractor.extract(table_image)
print(f"Table: {result.num_rows}x{result.num_cols}")
html = result.to_html()
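
Since `extract` expects an image already cropped to the table region, a caller that starts from a full page usually needs a crop box first. The sketch below computes one from a detected table bounding box, with optional padding, clamped to the page bounds. The helper name `padded_crop_box` and the `pad` parameter are hypothetical; the returned tuple uses the `(left, upper, right, lower)` order that Pillow's `Image.crop` accepts.

```python
import math
from typing import Tuple


def padded_crop_box(
    bbox: Tuple[float, float, float, float],
    image_size: Tuple[int, int],
    pad: int = 0,
) -> Tuple[int, int, int, int]:
    """Expand a table bbox by `pad` pixels and clamp it to the image."""
    x1, y1, x2, y2 = bbox
    width, height = image_size
    left = max(0, math.floor(x1) - pad)
    upper = max(0, math.floor(y1) - pad)
    right = min(width, math.ceil(x2) + pad)
    lower = min(height, math.ceil(y2) + pad)
    if right <= left or lower <= upper:
        raise ValueError(f"Empty crop for bbox {bbox} in image {image_size}")
    return left, upper, right, lower


# A bbox that runs past the right edge of a 100x300 page.
box = padded_crop_box((10.4, 20.6, 110.2, 220.1), (100, 300), pad=5)
# box == (5, 15, 100, 226)
```

With Pillow, the result could then be used as `table_image = page_image.crop(box)` before calling `extractor.extract(table_image)`.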

Source code in omnidocs/tasks/table_extraction/tableformer/pytorch.py

def extract(
    self,
    image: Union[Image.Image, np.ndarray, str, Path],
    ocr_output: Optional["OCROutput"] = None,
) -> TableOutput:
    """
    Extract table structure from an image.

    Args:
        image: Table image (should be cropped to table region)
        ocr_output: Optional OCR results for cell text matching

    Returns:
        TableOutput with cells, structure, and export methods

    Example:
        ```python
        result = extractor.extract(table_image)
        print(f"Table: {result.num_rows}x{result.num_cols}")
        html = result.to_html()
        ```
    """
    # Prepare image
    pil_image = self._prepare_image(image)
    width, height = pil_image.size

    # Convert to OpenCV format (required by TFPredictor)
    try:
        import cv2
    except ImportError:
        raise ImportError(
            "opencv-python is required for TableFormerExtractor. Install with: pip install opencv-python-headless"
        )

    cv_image = cv2.cvtColor(np.array(pil_image), cv2.COLOR_RGB2BGR)

    # Build iOCR page data
    tokens = self._build_tokens_from_ocr(ocr_output) if ocr_output else []
    iocr_page = {
        "width": width,
        "height": height,
        "image": cv_image,
        "tokens": tokens,
    }

    # Table bbox is the entire image
    table_bbox = [0, 0, width, height]

    # Run prediction
    results = self._predictor.multi_table_predict(
        iocr_page=iocr_page,
        table_bboxes=[table_bbox],
        do_matching=self.config.do_cell_matching,
        correct_overlapping_cells=self.config.correct_overlapping_cells,
        sort_row_col_indexes=self.config.sort_row_col_indexes,
    )

    # Convert results to TableOutput
    return self._convert_results(results, width, height)
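
The `tokens` list in the page dictionary comes from `_build_tokens_from_ocr`, which is not shown here. As a rough illustration of what such a conversion could look like, the sketch below turns word-level OCR results into token dictionaries and assembles a page dictionary with the same keys used above. The input format (text plus an `(x1, y1, x2, y2)` box), the token keys `id`, `text`, and `bbox` with `l`/`t`/`r`/`b` fields, and both function names are assumptions, not the library's confirmed schema.

```python
from typing import Any, Dict, List, Tuple

Word = Tuple[str, Tuple[float, float, float, float]]


def build_tokens_sketch(words: List[Word]) -> List[Dict[str, Any]]:
    """Convert (text, box) pairs into token dicts, skipping blank text."""
    tokens: List[Dict[str, Any]] = []
    for text, (x1, y1, x2, y2) in words:
        if not text.strip():
            continue
        tokens.append(
            {
                "id": len(tokens),  # contiguous ids after filtering
                "text": text,
                "bbox": {"l": x1, "t": y1, "r": x2, "b": y2},
            }
        )
    return tokens


def build_page_sketch(width: int, height: int, words: List[Word]) -> Dict[str, Any]:
    """Assemble page data; the "image" entry is omitted in this sketch."""
    return {"width": width, "height": height, "tokens": build_tokens_sketch(words)}


words = [("Name", (0, 0, 40, 10)), ("  ", (41, 0, 45, 10)), ("Age", (50, 0, 70, 10))]
page = build_page_sketch(200, 50, words)
```

When `ocr_output` is `None`, the source above passes an empty token list, so cell matching has no text to attach to the predicted cells.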

config

Configuration for TableFormer table structure extractor.

TableFormer uses a dual-decoder transformer architecture with OTSL+ support for recognizing table structure from images.

Example

from omnidocs.tasks.table_extraction import TableFormerExtractor, TableFormerConfig

# Fast mode (default)
extractor = TableFormerExtractor(config=TableFormerConfig())

# Accurate mode with GPU
extractor = TableFormerExtractor(
    config=TableFormerConfig(
        mode="accurate",
        device="cuda",
        do_cell_matching=True,
    )
)

TableFormerMode

Bases: str, Enum

TableFormer inference mode.

TableFormerConfig

Bases: BaseModel

Configuration for TableFormer table structure extractor.

TableFormer is a transformer-based model that predicts table structure using OTSL (Optimal Table Structure Language) tags and cell bounding boxes.

ATTRIBUTE	TYPE	DESCRIPTION
`mode`	`TableFormerMode`	Inference mode: "fast" or "accurate"
`device`	`Literal['cpu', 'cuda', 'mps', 'auto']`	Device for inference: "cpu", "cuda", "mps", or "auto"
`num_threads`	`int`	Number of CPU threads for inference
`do_cell_matching`	`bool`	Whether to match predicted cells with OCR text cells
`artifacts_path`	`Optional[str]`	Path to pre-downloaded model artifacts
`repo_id`	`str`	HuggingFace model repository
`revision`	`str`	Model revision/tag

Example

from omnidocs.tasks.table_extraction import TableFormerExtractor, TableFormerConfig

# Fast mode
extractor = TableFormerExtractor(config=TableFormerConfig(mode="fast"))

# Accurate mode with GPU
extractor = TableFormerExtractor(
    config=TableFormerConfig(
        mode="accurate",
        device="cuda",
        do_cell_matching=True,
    )
)

pytorch

TableFormer extractor implementation using PyTorch backend.

Uses the TFPredictor from docling-ibm-models for table structure recognition.

TableFormerExtractor

TableFormerExtractor(config: TableFormerConfig)

Bases: BaseTableExtractor

Table structure extractor using TableFormer model.

TableFormer is a transformer-based model that predicts table structure using OTSL (Optimal Table Structure Language) tags. It can detect:

- Cell boundaries (bounding boxes)
- Row and column spans
- Header cells (column and row headers)
- Section rows

Example

from omnidocs.tasks.table_extraction import TableFormerExtractor, TableFormerConfig

# Initialize extractor
extractor = TableFormerExtractor(
    config=TableFormerConfig(mode="fast", device="cuda")
)

# Extract table structure
result = extractor.extract(table_image)

# Get HTML output
html = result.to_html()

# Get DataFrame
df = result.to_dataframe()

Initialize the TableFormer extractor.

PARAMETER	TYPE	DESCRIPTION
`config`	`TableFormerConfig`	TableFormerConfig with model settings

Source code in omnidocs/tasks/table_extraction/tableformer/pytorch.py

def __init__(self, config: TableFormerConfig):
    """
    Initialize TableFormer extractor.

    Args:
        config: TableFormerConfig with model settings
    """
    self.config = config
    self._device = _resolve_device(config.device)
    self._predictor = None
    self._model_config: Optional[Dict] = None
    self._load_model()

extract

extract(
    image: Union[Image, ndarray, str, Path],
    ocr_output: Optional[OCROutput] = None,
) -> TableOutput

Extract table structure from an image.

PARAMETER	TYPE	DEFAULT	DESCRIPTION
`image`	`Union[Image, ndarray, str, Path]`	required	Table image (should be cropped to table region)
`ocr_output`	`Optional[OCROutput]`	`None`	Optional OCR results for cell text matching

RETURNS	DESCRIPTION
`TableOutput`	TableOutput with cells, structure, and export methods

Example

result = extractor.extract(table_image)
print(f"Table: {result.num_rows}x{result.num_cols}")
html = result.to_html()

Source code in omnidocs/tasks/table_extraction/tableformer/pytorch.py

def extract(
    self,
    image: Union[Image.Image, np.ndarray, str, Path],
    ocr_output: Optional["OCROutput"] = None,
) -> TableOutput:
    """
    Extract table structure from an image.

    Args:
        image: Table image (should be cropped to table region)
        ocr_output: Optional OCR results for cell text matching

    Returns:
        TableOutput with cells, structure, and export methods

    Example:
        ```python
        result = extractor.extract(table_image)
        print(f"Table: {result.num_rows}x{result.num_cols}")
        html = result.to_html()
        ```
    """
    # Prepare image
    pil_image = self._prepare_image(image)
    width, height = pil_image.size

    # Convert to OpenCV format (required by TFPredictor)
    try:
        import cv2
    except ImportError:
        raise ImportError(
            "opencv-python is required for TableFormerExtractor. Install with: pip install opencv-python-headless"
        )

    cv_image = cv2.cvtColor(np.array(pil_image), cv2.COLOR_RGB2BGR)

    # Build iOCR page data
    tokens = self._build_tokens_from_ocr(ocr_output) if ocr_output else []
    iocr_page = {
        "width": width,
        "height": height,
        "image": cv_image,
        "tokens": tokens,
    }

    # Table bbox is the entire image
    table_bbox = [0, 0, width, height]

    # Run prediction
    results = self._predictor.multi_table_predict(
        iocr_page=iocr_page,
        table_bboxes=[table_bbox],
        do_matching=self.config.do_cell_matching,
        correct_overlapping_cells=self.config.correct_overlapping_cells,
        sort_row_col_indexes=self.config.sort_row_col_indexes,
    )

    # Convert results to TableOutput
    return self._convert_results(results, width, height)