Base
Base class for table extractors.
Defines the abstract interface that all table extractors must implement.
BaseTableExtractor
Bases: ABC
Abstract base class for table structure extractors.
Table extractors analyze table images to detect cell structure, identify headers, and extract text content.
Example
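The abstract interface described above can be illustrated with a minimal sketch. The `BaseTableExtractor` and `TableOutput` classes below are simplified stand-ins written for this example, not the omnidocs classes, and `EchoTableExtractor` is a hypothetical subclass that treats its input as rows of already-recognized text.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class TableOutput:
    """Hypothetical stand-in; the real TableOutput carries more structure."""
    cells: List[str] = field(default_factory=list)


class BaseTableExtractor(ABC):
    """Stand-in mirroring the documented interface, not the omnidocs class."""

    @abstractmethod
    def extract(self, image: Any, ocr_output: Optional[Any] = None) -> TableOutput:
        ...


class EchoTableExtractor(BaseTableExtractor):
    """Toy subclass: the 'image' is a grid of text cells (an assumption)."""

    def extract(self, image: Any, ocr_output: Optional[Any] = None) -> TableOutput:
        # Flatten the grid row by row into a list of cell texts.
        return TableOutput(cells=[cell for row in image for cell in row])


# An ABC with an unimplemented abstract method cannot be instantiated.
try:
    BaseTableExtractor()
except TypeError:
    abstract_base_rejected = True
else:
    abstract_base_rejected = False

result = EchoTableExtractor().extract([["Name", "Qty"], ["Bolt", "4"]])
```

Because `extract` is abstract, a concrete extractor only becomes instantiable once it overrides that method.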
extract
abstractmethod
extract(
image: Union[Image, ndarray, str, Path],
ocr_output: Optional[OCROutput] = None,
) -> TableOutput
Extract table structure from an image.
| PARAMETER | DESCRIPTION |
|---|---|
| `image` | Table image (should be cropped to table region). TYPE: `Union[Image, ndarray, str, Path]` |
| `ocr_output` | Optional OCR results for cell text matching. If not provided, model will attempt to extract text. TYPE: `Optional[OCROutput]`, DEFAULT: `None` |

| RETURNS | DESCRIPTION |
|---|---|
| `TableOutput` | TableOutput with cells, structure, and export methods |
Example
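The role of the optional `ocr_output` argument can be sketched as follows. This is a toy stand-in under stated assumptions, not the omnidocs implementation: the "image" is a grid of placeholder strings, `ocr_output` is a flat list of strings, and the returned dictionary (with its `text_source` key) is a hypothetical substitute for `TableOutput`.

```python
from typing import Dict, List, Optional


def extract(image: List[List[str]],
            ocr_output: Optional[List[str]] = None) -> Dict[str, object]:
    """Toy stand-in for extract(); not the omnidocs implementation."""
    n_rows = len(image)
    n_cols = len(image[0]) if image else 0
    if ocr_output is not None:
        # Caller supplied OCR results: use them as the cell text.
        texts = list(ocr_output)
        text_source = "ocr_output"
    else:
        # No OCR results: fall back to the extractor's own text reading.
        texts = [cell for row in image for cell in row]
        text_source = "model"
    return {"rows": n_rows, "cols": n_cols,
            "texts": texts, "text_source": text_source}


grid = [["Name", "Qty"], ["Bolt", "4"]]
without_ocr = extract(grid)
with_ocr = extract(grid, ocr_output=["Name", "Quantity", "Bolt", "4"])
```

The point of the sketch is only the branching: the same call works with or without OCR results, and the caller does not need to know which path produced the text.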
Source code in omnidocs/tasks/table_extraction/base.py
batch_extract
batch_extract(
images: List[Union[Image, ndarray, str, Path]],
ocr_outputs: Optional[List[OCROutput]] = None,
progress_callback: Optional[
Callable[[int, int], None]
] = None,
) -> List[TableOutput]
Extract tables from multiple images.
The default implementation calls extract() once per image. Subclasses can override it for optimized batching.
| PARAMETER | DESCRIPTION |
|---|---|
| `images` | List of table images. TYPE: `List[Union[Image, ndarray, str, Path]]` |
| `ocr_outputs` | Optional list of OCR results (same length as images). TYPE: `Optional[List[OCROutput]]`, DEFAULT: `None` |
| `progress_callback` | Optional function(current, total) for progress. TYPE: `Optional[Callable[[int, int], None]]`, DEFAULT: `None` |

| RETURNS | DESCRIPTION |
|---|---|
| `List[TableOutput]` | List of TableOutput in same order as input |
Examples:
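A loop-based default of the kind described above might look like the following sketch. It is written as a free function that takes the `extract` callable as its first argument so that it runs on its own; `fake_extract` is a hypothetical stand-in, and the exact moment the callback fires (after each image, with a 1-based count) is an assumption rather than documented behaviour.

```python
from typing import Any, Callable, List, Optional


def batch_extract(
    extract: Callable[[Any, Optional[Any]], Any],
    images: List[Any],
    ocr_outputs: Optional[List[Any]] = None,
    progress_callback: Optional[Callable[[int, int], None]] = None,
) -> List[Any]:
    """Sketch of a default batch loop; not the omnidocs source."""
    total = len(images)
    results = []
    for i, image in enumerate(images):
        # Pair each image with the OCR result at the same index, if given.
        ocr = ocr_outputs[i] if ocr_outputs is not None else None
        results.append(extract(image, ocr))
        if progress_callback is not None:
            progress_callback(i + 1, total)  # assumed: 1-based, after each item
    return results


def fake_extract(image: Any, ocr_output: Optional[Any] = None) -> str:
    """Hypothetical extractor that just tags its input."""
    return f"table:{image}"


progress = []
outputs = batch_extract(
    fake_extract,
    ["a.png", "b.png"],
    progress_callback=lambda current, total: progress.append((current, total)),
)
```

Appending results in iteration order is what keeps the output aligned with the input list.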
extract_document
extract_document(
document: Document,
table_bboxes: Optional[List[List[float]]] = None,
progress_callback: Optional[
Callable[[int, int], None]
] = None,
) -> List[TableOutput]
Extract tables from all pages of a document.
| PARAMETER | DESCRIPTION |
|---|---|
| `document` | Document instance. TYPE: `Document` |
| `table_bboxes` | Optional list of table bounding boxes per page. Each element should be a list of [x1, y1, x2, y2] coords. TYPE: `Optional[List[List[float]]]`, DEFAULT: `None` |
| `progress_callback` | Optional function(current, total) for progress. TYPE: `Optional[Callable[[int, int], None]]`, DEFAULT: `None` |

| RETURNS | DESCRIPTION |
|---|---|
| `List[TableOutput]` | List of TableOutput, one per detected table |
Examples:
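The use of `[x1, y1, x2, y2]` boxes can be sketched with plain Python data. Everything here is an assumption made for illustration: pages are modelled as nested lists of pixel values instead of a real `Document`, `crop` and `shape_of` are hypothetical helpers, one box is assumed per page (matched by index), and the whole page is used when no boxes are given.

```python
from typing import Any, Callable, List, Optional, Sequence

Page = List[List[int]]  # hypothetical page: rows of pixel values


def crop(page: Page, bbox: Sequence[float]) -> Page:
    """Cut the [x1, y1, x2, y2] region out of a page (x = column, y = row)."""
    x1, y1, x2, y2 = (int(v) for v in bbox)
    return [row[x1:x2] for row in page[y1:y2]]


def extract_document(
    extract: Callable[[Page], Any],
    pages: List[Page],
    table_bboxes: Optional[List[List[float]]] = None,
    progress_callback: Optional[Callable[[int, int], None]] = None,
) -> List[Any]:
    """Sketch only: crop each page to its box, then run extract on the crop."""
    total = len(pages)
    results = []
    for i, page in enumerate(pages):
        region = crop(page, table_bboxes[i]) if table_bboxes is not None else page
        results.append(extract(region))
        if progress_callback is not None:
            progress_callback(i + 1, total)
    return results


def shape_of(region: Page) -> tuple:
    """Hypothetical extractor that reports (rows, columns) of its input."""
    return (len(region), len(region[0]) if region else 0)


pages = [[[0] * 10 for _ in range(8)] for _ in range(2)]  # two 8x10 pages
shapes = extract_document(shape_of, pages,
                          table_bboxes=[[1, 2, 5, 6], [0, 0, 10, 8]])
```

Cropping before extraction matches the guidance on `extract` that the image should already be limited to the table region.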