🖹 OCR (Optical Character Recognition)

This section documents the OCR API: a set of extractors that recognize and extract text from images and scanned documents.

Overview

OCR in OmniDocs enables the conversion of images (e.g., scanned documents, photos) into machine-readable text. It supports multiple engines, allowing you to choose the best balance of speed, accuracy, and language support for your needs.

Available Extractors

EasyOCRExtractor

An easy-to-use OCR library built on PyTorch, with support for multiple languages.

omnidocs.tasks.ocr_extraction.extractors.easy_ocr.EasyOCRExtractor

EasyOCRExtractor(device: Optional[str] = None, show_log: bool = False, languages: Optional[List[str]] = None, gpu: bool = True, **kwargs)

Bases: BaseOCRExtractor

EasyOCR based text extraction implementation.

Initialize EasyOCR Extractor.

extract

extract(input_path: Union[str, Path, Image], detail: int = 1, paragraph: bool = False, width_ths: float = 0.7, height_ths: float = 0.7, **kwargs) -> OCROutput

Extract text using EasyOCR.

Usage Example

from omnidocs.tasks.ocr_extraction.extractors.easy_ocr import EasyOCRExtractor

extractor = EasyOCRExtractor(languages=['en'])
result = extractor.extract("scanned_document.png")
print(f"Extracted text: {result.full_text[:200]}...")

PaddleOCRExtractor

An OCR tool that supports multiple languages and provides layout detection capabilities.

omnidocs.tasks.ocr_extraction.extractors.paddle.PaddleOCRExtractor

PaddleOCRExtractor(device: Optional[str] = None, show_log: bool = False, languages: Optional[List[str]] = None, use_angle_cls: bool = True, use_gpu: bool = True, drop_score: float = 0.5, model_path: Optional[str] = None, **kwargs)

Bases: BaseOCRExtractor

PaddleOCR based text extraction implementation.

Initialize PaddleOCR Extractor.

extract

extract(input_path: Union[str, Path, Image], **kwargs) -> OCROutput

Extract text using PaddleOCR.

Usage Example

from omnidocs.tasks.ocr_extraction.extractors.paddle import PaddleOCRExtractor

extractor = PaddleOCRExtractor(languages=['en'])
result = extractor.extract("scanned_document.png")
print(f"Extracted text: {result.full_text[:200]}...")

SuryaOCRExtractor

A modern, high-accuracy OCR engine, part of the Surya library, with strong support for Indian languages.

omnidocs.tasks.ocr_extraction.extractors.surya_ocr.SuryaOCRExtractor

SuryaOCRExtractor(device: Optional[str] = None, show_log: bool = False, languages: Optional[List[str]] = None, **kwargs)

Bases: BaseOCRExtractor

Surya OCR based text extraction implementation.

Initialize Surya OCR Extractor.

extract

extract(input_path: Union[str, Path, Image], **kwargs) -> OCROutput

Extract text using Surya OCR.

Usage Example

from omnidocs.tasks.ocr_extraction.extractors.surya_ocr import SuryaOCRExtractor

extractor = SuryaOCRExtractor(languages=['en'])
result = extractor.extract("scanned_document.png")
print(f"Extracted text: {result.full_text[:200]}...")

TesseractOCRExtractor

An open-source OCR engine that supports multiple languages and is widely used for text extraction from images.

omnidocs.tasks.ocr_extraction.extractors.tesseract_ocr.TesseractOCRExtractor

TesseractOCRExtractor(device: Optional[str] = None, show_log: bool = False, languages: Optional[List[str]] = None, psm: int = 6, oem: int = 3, config: str = '', **kwargs)

Bases: BaseOCRExtractor

Tesseract OCR based text extraction implementation.

Initialize Tesseract OCR Extractor.

extract

extract(input_path: Union[str, Path, Image], **kwargs) -> OCROutput

Extract text using Tesseract OCR.

Usage Example

from omnidocs.tasks.ocr_extraction.extractors.tesseract_ocr import TesseractOCRExtractor

extractor = TesseractOCRExtractor(languages=['eng'])  # Tesseract uses 'eng' for English
result = extractor.extract("scanned_document.png")
print(f"Extracted text: {result.full_text[:200]}...")

OCROutput

The standardized output format for OCR results.

omnidocs.tasks.ocr_extraction.base.OCROutput

Bases: BaseModel

Container for OCR extraction results.

Attributes:

Name	Type	Description
`texts`	`List[OCRText]`	List of detected text objects
`full_text`	`str`	Combined text from all detections
`source_img_size`	`Optional[Tuple[int, int]]`	Original image dimensions (width, height)
`processing_time`	`Optional[float]`	Time taken for OCR processing
`metadata`	`Optional[Dict[str, Any]]`	Additional metadata from the OCR engine

get_sorted_by_reading_order

get_sorted_by_reading_order() -> List[OCRText]

Get texts sorted by reading order. If no reading_order is set, texts are sorted top-to-bottom, then left-to-right.
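The fallback ordering described above can be sketched in plain Python. This is a minimal illustration, not the OmniDocs implementation: `TextBox` is a hypothetical stand-in for `OCRText`, and the rule that explicit indices are used only when every box carries one is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextBox:
    """Hypothetical stand-in for OCRText (not the OmniDocs class)."""
    text: str
    bbox: List[float]  # [x1, y1, x2, y2]
    reading_order: Optional[int] = None


def sort_by_reading_order(boxes: List[TextBox]) -> List[TextBox]:
    # Assumption: explicit indices are used only when every box has one.
    if boxes and all(b.reading_order is not None for b in boxes):
        return sorted(boxes, key=lambda b: b.reading_order)
    # Fallback: top-to-bottom (y1), then left-to-right (x1).
    return sorted(boxes, key=lambda b: (b.bbox[1], b.bbox[0]))


boxes = [
    TextBox("world", [120.0, 10.0, 200.0, 30.0]),
    TextBox("second line", [10.0, 50.0, 200.0, 70.0]),
    TextBox("Hello", [10.0, 10.0, 100.0, 30.0]),
]
ordered_texts = [b.text for b in sort_by_reading_order(boxes)]
```

Sorting on the exact top coordinate is the simplest choice; real scans with slightly skewed lines may need the y values grouped into rows first.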

get_text_by_confidence

get_text_by_confidence(min_confidence: float = 0.5) -> List[OCRText]

Filter texts by minimum confidence threshold.
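As a rough sketch of what such a filter does, the snippet below applies a threshold to a list of detections. `TextBox` is a hypothetical stand-in for `OCRText`, and two details are assumptions rather than documented behavior: the threshold is treated as inclusive, and detections without a score are dropped.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextBox:
    """Hypothetical stand-in for OCRText (not the OmniDocs class)."""
    text: str
    confidence: Optional[float] = None


def filter_by_confidence(boxes: List[TextBox], min_confidence: float = 0.5) -> List[TextBox]:
    # Assumption: inclusive threshold; detections with no score are dropped.
    return [
        b for b in boxes
        if b.confidence is not None and b.confidence >= min_confidence
    ]


boxes = [
    TextBox("Invoice", 0.98),
    TextBox("t0tal", 0.31),
    TextBox("stamp"),  # no confidence reported by the engine
    TextBox("2024", 0.5),
]
kept = [b.text for b in filter_by_confidence(boxes)]
```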

save_json

save_json(output_path: Union[str, Path]) -> None

Save output to JSON file.

to_dict

to_dict() -> Dict

Convert to dictionary representation.
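To give a feel for the dictionary and JSON representations, here is a self-contained sketch built on dataclasses. The field names follow the attribute table above, but `TextBox` and `Output` are hypothetical stand-ins, and the exact JSON layout that OmniDocs writes is an assumption.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple, Union


@dataclass
class TextBox:
    """Hypothetical stand-in for OCRText."""
    text: str
    confidence: Optional[float] = None
    bbox: Optional[List[float]] = None


@dataclass
class Output:
    """Hypothetical stand-in for OCROutput."""
    texts: List[TextBox]
    full_text: str
    source_img_size: Optional[Tuple[int, int]] = None
    processing_time: Optional[float] = None
    metadata: Optional[Dict[str, Any]] = None

    def to_dict(self) -> Dict[str, Any]:
        # asdict recurses into the nested TextBox entries.
        return asdict(self)

    def save_json(self, output_path: Union[str, Path]) -> None:
        payload = json.dumps(self.to_dict(), indent=2, ensure_ascii=False)
        Path(output_path).write_text(payload, encoding="utf-8")


out = Output(
    texts=[TextBox("Hello", 0.97, [10.0, 10.0, 100.0, 30.0])],
    full_text="Hello",
    source_img_size=(800, 600),
)

# Write to a temporary directory and read the file back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "ocr_result.json"
    out.save_json(path)
    loaded = json.loads(path.read_text(encoding="utf-8"))
```

Note that JSON has no tuple type, so a `(width, height)` tuple comes back as a two-element list after a round trip.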

Key Properties

texts (List[OCRText]): List of individual text regions detected.
full_text (str): The combined text from all detected regions.
source_img_size (Optional[Tuple[int, int]]): Dimensions of the source image (width, height).

Key Methods

save_json(output_path): Save results to a JSON file.
visualize(image_path, output_path): Visualize OCR results with bounding boxes on the source image.
get_text_by_confidence(min_confidence): Filter text regions by confidence score.
get_sorted_by_reading_order(): Sort text regions by reading order.

OCRText

Represents a single text region detected by OCR.

omnidocs.tasks.ocr_extraction.base.OCRText

Bases: BaseModel

Container for individual OCR text detection.

Attributes:

Name	Type	Description
`text`	`str`	Extracted text content
`confidence`	`Optional[float]`	Confidence score for the text detection
`bbox`	`Optional[List[float]]`	Bounding box coordinates [x1, y1, x2, y2]
`polygon`	`Optional[List[List[float]]]`	Optional polygon coordinates for irregular text regions
`language`	`Optional[str]`	Detected language code (e.g., 'en', 'zh', 'fr')
`reading_order`	`Optional[int]`	Optional reading order index for text sequencing

to_dict

to_dict() -> Dict

Convert to dictionary representation.

Attributes

text (str): The recognized text content.
confidence (Optional[float]): Confidence score of the recognition (0.0-1.0).
bbox (Optional[List[float]]): Bounding box coordinates [x1, y1, x2, y2].
polygon (Optional[List[List[float]]]): Precise polygon coordinates of the text region.
language (Optional[str]): Detected language code.
reading_order (Optional[int]): Reading order index of the text region.
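The `bbox` and `polygon` fields describe the same region at different precision. When an engine reports only a polygon, an axis-aligned box in the [x1, y1, x2, y2] layout can be derived from it. The helper below is a hypothetical illustration, not part of OmniDocs.

```python
from typing import List


def polygon_to_bbox(polygon: List[List[float]]) -> List[float]:
    """Return the axis-aligned box [x1, y1, x2, y2] enclosing a polygon."""
    xs = [point[0] for point in polygon]
    ys = [point[1] for point in polygon]
    return [min(xs), min(ys), max(xs), max(ys)]


# A slightly rotated quadrilateral, as a detector might return for skewed text.
polygon = [[12.0, 8.0], [110.0, 14.0], [108.0, 42.0], [10.0, 36.0]]
bbox = polygon_to_bbox(polygon)
```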

BaseOCRExtractor

The abstract base class for all OCR extractors.

omnidocs.tasks.ocr_extraction.base.BaseOCRExtractor

BaseOCRExtractor(device: Optional[str] = None, show_log: bool = False, languages: Optional[List[str]] = None, engine_name: Optional[str] = None)

Bases: ABC

Base class for OCR text extraction models.

Initialize the OCR extractor.

Parameters:

Name	Type	Description	Default
`device`	`Optional[str]`	Device to run model on ('cuda' or 'cpu')	`None`
`show_log`	`bool`	Whether to show detailed logs	`False`
`languages`	`Optional[List[str]]`	List of language codes to support (e.g., ['en', 'zh'])	`None`
`engine_name`	`Optional[str]`	Name of the OCR engine for language mapping	`None`

extract `abstractmethod`

extract(input_path: Union[str, Path, Image], **kwargs) -> OCROutput

Extract text from input image.

Parameters:

Name	Type	Description	Default
`input_path`	`Union[str, Path, Image]`	Path to input image or image data	required
`**kwargs`		Additional model-specific parameters	`{}`

Returns:

Type	Description
`OCROutput`	OCROutput containing extracted text

extract_all

extract_all(input_paths: List[Union[str, Path, Image]], **kwargs) -> List[OCROutput]

Extract text from multiple images.

Parameters:

Name	Type	Description	Default
`input_paths`	`List[Union[str, Path, Image]]`	List of image paths or image data	required
`**kwargs`		Additional model-specific parameters	`{}`

Returns:

Type	Description
`List[OCROutput]`	List of OCROutput objects
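The relationship between the abstract `extract` and the batch helper can be sketched with a small stand-in hierarchy. This assumes `extract_all` simply calls `extract` once per input and forwards the keyword arguments, which the source does not confirm. `SketchExtractor` and `EchoExtractor` are hypothetical names, and strings stand in for `OCROutput`.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchExtractor(ABC):
    """Hypothetical stand-in for BaseOCRExtractor."""

    @abstractmethod
    def extract(self, input_path: str, **kwargs) -> str:
        """Subclasses must implement single-image extraction."""

    def extract_all(self, input_paths: List[str], **kwargs) -> List[str]:
        # Assumption: one extract() call per input, results kept in input order.
        return [self.extract(path, **kwargs) for path in input_paths]


class EchoExtractor(SketchExtractor):
    """Toy engine that returns a string instead of running OCR."""

    def extract(self, input_path: str, **kwargs) -> str:
        prefix = kwargs.get("prefix", "")
        return f"{prefix}{input_path}"


results = EchoExtractor().extract_all(
    ["page_1.png", "page_2.png"], prefix="text from "
)
```

Because `extract` is abstract, instantiating `SketchExtractor` directly raises `TypeError`; only subclasses that implement it can be created.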

extract_with_layout

extract_with_layout(input_path: Union[str, Path, Image], layout_regions: Optional[List[Dict]] = None, **kwargs) -> OCROutput

Extract text with optional layout information.

Parameters:

Name	Type	Description	Default
`input_path`	`Union[str, Path, Image]`	Path to input image or image data	required
`layout_regions`	`Optional[List[Dict]]`	Optional list of layout regions to focus OCR on	`None`
`**kwargs`		Additional model-specific parameters	`{}`

Returns:

Type	Description
`OCROutput`	OCROutput containing extracted text
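One plausible way to focus OCR on layout regions is to keep only detections whose center falls inside a region. The sketch below illustrates that idea under stated assumptions: the region dictionaries carry a `bbox` key in [x1, y1, x2, y2] form, detections are plain dictionaries, and passing no regions keeps everything. None of this is confirmed by the source, and OmniDocs may crop the image per region instead.

```python
from typing import Dict, List, Optional


def center_in_region(bbox: List[float], region_bbox: List[float]) -> bool:
    # Compare the center of the detection with the region bounds.
    cx = (bbox[0] + bbox[2]) / 2
    cy = (bbox[1] + bbox[3]) / 2
    x1, y1, x2, y2 = region_bbox
    return x1 <= cx <= x2 and y1 <= cy <= y2


def keep_in_regions(
    detections: List[Dict], layout_regions: Optional[List[Dict]] = None
) -> List[Dict]:
    # Assumption: no regions means no filtering.
    if not layout_regions:
        return list(detections)
    return [
        d for d in detections
        if any(center_in_region(d["bbox"], r["bbox"]) for r in layout_regions)
    ]


detections = [
    {"text": "Title", "bbox": [50.0, 20.0, 300.0, 60.0]},
    {"text": "page 3", "bbox": [350.0, 960.0, 420.0, 980.0]},
]
regions = [{"label": "header", "bbox": [0.0, 0.0, 800.0, 100.0]}]  # hypothetical shape

in_header = [d["text"] for d in keep_in_regions(detections, regions)]
unfiltered = [d["text"] for d in keep_in_regions(detections)]
```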

preprocess_input

preprocess_input(input_path: Union[str, Path, Image, ndarray]) -> List[Image.Image]

Convert input to list of PIL Images.

Parameters:

Name	Type	Description	Default
`input_path`	`Union[str, Path, Image, ndarray]`	Input image path or image data	required

Returns:

Type	Description
`List[Image]`	List of PIL Images

postprocess_output

postprocess_output(raw_output: Any, img_size: Tuple[int, int]) -> OCROutput

Convert raw OCR output to standardized OCROutput format.

Parameters:

Name	Type	Description	Default
`raw_output`	`Any`	Raw output from OCR engine	required
`img_size`	`Tuple[int, int]`	Original image size (width, height)	required

Returns:

Type	Description
`OCROutput`	Standardized OCROutput object

visualize

visualize(ocr_result: OCROutput, image_path: Union[str, Path, Image], output_path: str = 'visualized.png', box_color: str = 'red', box_width: int = 2, show_text: bool = False, text_color: str = 'blue', font_size: int = 12) -> None

Visualize OCR results by drawing bounding boxes on the original image.

Drawing the detected text regions as bounding boxes makes it easy to compare the output of different extractors visually.

get_supported_languages

get_supported_languages() -> List[str]

Get list of supported language codes.

set_languages

set_languages(languages: List[str]) -> None

Update supported languages for OCR extraction.

BaseOCRMapper

Handles language code mapping and normalization for OCR engines.

omnidocs.tasks.ocr_extraction.base.BaseOCRMapper

BaseOCRMapper(engine_name: str)

Base class for mapping OCR engine-specific outputs to standardized format.

Initialize mapper for specific OCR engine.

Parameters:

Name	Type	Description	Default
`engine_name`	`str`	Name of the OCR engine (e.g., 'tesseract', 'paddle', 'easyocr')	required

detect_text_language

detect_text_language(text: str) -> Optional[str]

Detect language of extracted text.

from_standard_language

from_standard_language(standard_language: str) -> str

Convert standard ISO 639-1 language code to engine-specific format.

get_supported_languages

get_supported_languages() -> List[str]

Get list of supported languages for this engine.

normalize_bbox

normalize_bbox(bbox: List[float], img_width: int, img_height: int) -> List[float]

Normalize bounding box coordinates to absolute pixel values.
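As an illustration of converting a box to absolute pixel values, the sketch below scales relative coordinates by the image size and clamps the result to the image bounds. The function name `to_pixel_bbox` is hypothetical, and both the "all values in [0, 1] means relative" heuristic and the clamping step are assumptions, not documented behavior of `normalize_bbox`.

```python
from typing import List


def to_pixel_bbox(bbox: List[float], img_width: int, img_height: int) -> List[float]:
    x1, y1, x2, y2 = (float(v) for v in bbox)
    # Assumption: values that all lie in [0, 1] are fractions of the image size.
    if all(0.0 <= v <= 1.0 for v in (x1, y1, x2, y2)):
        x1, x2 = x1 * img_width, x2 * img_width
        y1, y2 = y1 * img_height, y2 * img_height
    # Clamp to the image so boxes never extend past its edges.
    x1 = min(max(x1, 0.0), float(img_width))
    x2 = min(max(x2, 0.0), float(img_width))
    y1 = min(max(y1, 0.0), float(img_height))
    y2 = min(max(y2, 0.0), float(img_height))
    return [x1, y1, x2, y2]


relative = to_pixel_bbox([0.25, 0.5, 0.75, 1.0], img_width=800, img_height=600)
overflow = to_pixel_bbox([-5, 10, 900, 50], img_width=800, img_height=600)
```

The heuristic is ambiguous for tiny pixel boxes near the origin, so a production version would more likely take the coordinate system as an explicit argument.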

to_standard_language

to_standard_language(engine_language: str) -> str

Convert engine-specific language code to standard ISO 639-1.
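The two conversion methods are inverses of each other, which a pair of lookup tables captures well. The sketch below uses a few Tesseract codes as sample data ('eng' for English matches the usage example earlier in this page). `SketchMapper` is a hypothetical stand-in, and passing unknown codes through unchanged is an assumption.

```python
from typing import Dict

# ISO 639-1 code -> Tesseract language code (small illustrative subset).
TESSERACT_CODES: Dict[str, str] = {
    "en": "eng",
    "fr": "fra",
    "de": "deu",
    "es": "spa",
}


class SketchMapper:
    """Hypothetical stand-in for BaseOCRMapper's language mapping."""

    def __init__(self, to_engine: Dict[str, str]) -> None:
        self._to_engine = dict(to_engine)
        # Build the reverse table once so both directions are O(1) lookups.
        self._to_standard = {engine: std for std, engine in to_engine.items()}

    def from_standard_language(self, standard_language: str) -> str:
        # Assumption: unknown codes are returned unchanged.
        return self._to_engine.get(standard_language.lower(), standard_language)

    def to_standard_language(self, engine_language: str) -> str:
        return self._to_standard.get(engine_language.lower(), engine_language)


mapper = SketchMapper(TESSERACT_CODES)
engine_code = mapper.from_standard_language("en")
standard_code = mapper.to_standard_language("deu")
```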
