Models
Pydantic models for OCR extraction outputs.
Defines standardized output types for OCR detection including text blocks with bounding boxes, confidence scores, and granularity levels.
Key difference from Text Extraction:
- OCR returns text WITH bounding boxes (word/line/character level)
- Text Extraction returns formatted text (MD/HTML) WITHOUT bboxes
Coordinate Systems
- Absolute (default): Coordinates in pixels relative to original image size
- Normalized (0-1024): Coordinates scaled to 0-1024 range (virtual 1024x1024 canvas)
Use bbox.to_normalized(width, height) or output.get_normalized_blocks()
to convert to normalized coordinates.
Example
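The normalization described above can be pictured as a per-axis rescale onto the 1024-unit canvas. The sketch below is a minimal illustration that assumes plain linear scaling with no rounding or clamping; normalize_point is a hypothetical helper, not part of the omnidocs API.

```python
from __future__ import annotations

# Hypothetical helper (not part of omnidocs): maps an absolute pixel
# coordinate onto a virtual 1024x1024 canvas, assuming plain linear scaling.
CANVAS = 1024


def normalize_point(x: float, y: float, width: int, height: int) -> tuple[float, float]:
    """Scale (x, y) from a width x height image to the 0-1024 range."""
    return x * CANVAS / width, y * CANVAS / height


# The same relative position on two differently sized images maps to the
# same normalized coordinates, which is what makes the 0-1024 system useful.
small = normalize_point(500, 250, 1000, 500)
large = normalize_point(1000, 500, 2000, 1000)
print(small, large)  # (512.0, 512.0) (512.0, 512.0)
```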
OCRGranularity
Bases: str, Enum
OCR detection granularity levels.
Different OCR engines return results at different granularity levels. This enum standardizes the options across all extractors.
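Because the enum subclasses both str and Enum, its members compare equal to plain strings, which makes them easy to serialize and to accept from configuration. A minimal sketch follows; the member names and values are an assumption based on the granularity levels mentioned elsewhere on this page (character, word, line, block), not a copy of the library's definition.

```python
from enum import Enum


# Sketch only: member names/values are assumed, not taken from omnidocs.
class OCRGranularity(str, Enum):
    CHARACTER = "character"
    WORD = "word"
    LINE = "line"
    BLOCK = "block"


# A str-based enum compares equal to its raw string value...
print(OCRGranularity.WORD == "word")  # True
# ...and can be looked up from a plain string, e.g. one read from a config file.
print(OCRGranularity("line") is OCRGranularity.LINE)  # True
```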
BoundingBox
Bases: BaseModel
Bounding box coordinates in pixel space.
Coordinates follow the convention: (x1, y1) is top-left, (x2, y2) is bottom-right. For rotated text, use the polygon field in TextBlock instead.
Example
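A dependency-free sketch of the box and its conversion helpers, written as a dataclass rather than a Pydantic model so it runs on its own. The field names x1, y1, x2, y2 follow the convention stated above and the method names come from this page; the return shapes are assumptions (a list for to_list, tuples for to_xyxy and to_xywh), with to_xywh assumed to mean top-left corner plus width and height.

```python
from __future__ import annotations

from dataclasses import dataclass


# Sketch: the real class is a Pydantic BaseModel; a dataclass keeps this self-contained.
@dataclass
class BoundingBox:
    x1: float  # left
    y1: float  # top
    x2: float  # right
    y2: float  # bottom

    def to_list(self) -> list[float]:
        return [self.x1, self.y1, self.x2, self.y2]

    def to_xyxy(self) -> tuple[float, float, float, float]:
        return (self.x1, self.y1, self.x2, self.y2)

    def to_xywh(self) -> tuple[float, float, float, float]:
        # Top-left corner plus width and height.
        return (self.x1, self.y1, self.x2 - self.x1, self.y2 - self.y1)

    @classmethod
    def from_list(cls, coords: list[float]) -> BoundingBox:
        x1, y1, x2, y2 = coords
        return cls(x1, y1, x2, y2)


box = BoundingBox.from_list([10, 20, 110, 70])
print(box.to_xywh())  # (10, 20, 100, 50)
```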
to_list
to_xyxy
to_xywh
from_list
classmethod
Create from [x1, y1, x2, y2] list.
Source code in omnidocs/tasks/ocr_extraction/models.py
from_polygon
classmethod
Create axis-aligned bounding box from polygon points.
| PARAMETER | DESCRIPTION |
|---|---|
| polygon | List of [x, y] points (usually 4 for a quadrilateral) |

| RETURNS | DESCRIPTION |
|---|---|
| BoundingBox | BoundingBox that encloses all polygon points |
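Enclosing a polygon in an axis-aligned box amounts to taking the minimum and maximum along each axis. A minimal sketch, again as a stand-alone dataclass, assuming points arrive as [x, y] pairs as described above. The resulting box discards any rotation, which is why rotated text should keep its polygon on the TextBlock.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BoundingBox:
    x1: float
    y1: float
    x2: float
    y2: float

    @classmethod
    def from_polygon(cls, polygon: list[list[float]]) -> BoundingBox:
        # The enclosing axis-aligned box spans the min/max of each axis.
        xs = [p[0] for p in polygon]
        ys = [p[1] for p in polygon]
        return cls(min(xs), min(ys), max(xs), max(ys))


# A slightly rotated quadrilateral, as a detector might return for skewed text.
quad = [[10, 20], [110, 15], [115, 60], [12, 65]]
print(BoundingBox.from_polygon(quad))
# BoundingBox(x1=10, y1=15, x2=115, y2=65)
```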
to_normalized
Convert to normalized coordinates (0-1024 range).
Scales coordinates from absolute pixel values to a virtual 1024x1024 canvas. This provides consistent coordinates regardless of original image size.
| PARAMETER | DESCRIPTION |
|---|---|
| image_width | Original image width in pixels |
| image_height | Original image height in pixels |

| RETURNS | DESCRIPTION |
|---|---|
| BoundingBox | New BoundingBox with coordinates in 0-1024 range |
to_absolute
Convert from normalized (0-1024) to absolute pixel coordinates.
| PARAMETER | DESCRIPTION |
|---|---|
| image_width | Target image width in pixels |
| image_height | Target image height in pixels |

| RETURNS | DESCRIPTION |
|---|---|
| BoundingBox | New BoundingBox with absolute pixel coordinates |
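The two conversions are inverses: to_normalized divides by the image size and multiplies by 1024, and to_absolute does the reverse. The sketch below assumes plain linear scaling with no rounding or clamping (the library may differ on either), so a round trip returns the original box up to floating-point error.

```python
from __future__ import annotations

from dataclasses import dataclass

CANVAS = 1024  # side length of the virtual normalized canvas


@dataclass
class BoundingBox:
    x1: float
    y1: float
    x2: float
    y2: float

    def to_normalized(self, image_width: int, image_height: int) -> BoundingBox:
        # Pixel -> 0-1024: divide by the image size, multiply by the canvas size.
        return BoundingBox(
            self.x1 * CANVAS / image_width,
            self.y1 * CANVAS / image_height,
            self.x2 * CANVAS / image_width,
            self.y2 * CANVAS / image_height,
        )

    def to_absolute(self, image_width: int, image_height: int) -> BoundingBox:
        # 0-1024 -> pixel: the inverse of to_normalized.
        return BoundingBox(
            self.x1 * image_width / CANVAS,
            self.y1 * image_height / CANVAS,
            self.x2 * image_width / CANVAS,
            self.y2 * image_height / CANVAS,
        )


pixel_box = BoundingBox(100, 50, 300, 150)
norm = pixel_box.to_normalized(2048, 1024)
print(norm)  # BoundingBox(x1=50.0, y1=50.0, x2=150.0, y2=150.0)
print(norm.to_absolute(2048, 1024) == pixel_box)  # True
```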
TextBlock
Bases: BaseModel
Single detected text element with text, bounding box, and confidence.
This is the fundamental unit of OCR output: depending on the OCR model and configuration, it can represent a character, word, line, or block.
Example
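A stand-alone sketch of a text block and its to_dict method. The field names (text, bbox, confidence, granularity, polygon) are assumptions drawn from the descriptions on this page; the polygon field is mentioned under BoundingBox, but its type here is a guess, and granularity is a plain string rather than the OCRGranularity enum to keep the example short.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class BoundingBox:
    x1: float
    y1: float
    x2: float
    y2: float


# Sketch only: field names and defaults are assumptions, not the omnidocs schema.
@dataclass
class TextBlock:
    text: str
    bbox: BoundingBox
    confidence: float  # assumed to be in [0, 1]
    granularity: str = "word"  # the real model uses OCRGranularity
    polygon: Optional[list[list[float]]] = None  # for rotated text

    def to_dict(self) -> dict[str, Any]:
        return {
            "text": self.text,
            "bbox": [self.bbox.x1, self.bbox.y1, self.bbox.x2, self.bbox.y2],
            "confidence": self.confidence,
            "granularity": self.granularity,
            "polygon": self.polygon,
        }


block = TextBlock("Invoice", BoundingBox(40, 30, 180, 62), confidence=0.97)
print(block.to_dict()["bbox"])  # [40, 30, 180, 62]
```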
to_dict
Convert to dictionary representation.
get_normalized_bbox
Get bounding box in normalized (0-1024) coordinates.
| PARAMETER | DESCRIPTION |
|---|---|
| image_width | Original image width |
| image_height | Original image height |

| RETURNS | DESCRIPTION |
|---|---|
| BoundingBox | BoundingBox with normalized coordinates |
OCROutput
Bases: BaseModel
Complete OCR extraction results for a single image.
Contains all detected text blocks with their bounding boxes, plus metadata about the extraction.
Example
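The two filters are simple predicates over the list of blocks. In this sketch the attribute name text_blocks, the parameter names, the inclusive >= threshold, and returning a plain list (rather than a new OCROutput) are all assumptions; check the source for the library's actual choices.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TextBlock:
    text: str
    confidence: float
    granularity: str


# Sketch only: attribute and parameter names are assumed.
@dataclass
class OCROutput:
    text_blocks: list[TextBlock] = field(default_factory=list)

    def filter_by_confidence(self, min_confidence: float) -> list[TextBlock]:
        # Keep blocks at or above the threshold.
        return [b for b in self.text_blocks if b.confidence >= min_confidence]

    def filter_by_granularity(self, granularity: str) -> list[TextBlock]:
        return [b for b in self.text_blocks if b.granularity == granularity]


output = OCROutput([
    TextBlock("Total", 0.98, "word"),
    TextBlock("T0tal", 0.41, "word"),
    TextBlock("Total: $42.00", 0.93, "line"),
])
print([b.text for b in output.filter_by_confidence(0.9)])  # ['Total', 'Total: $42.00']
print([b.text for b in output.filter_by_granularity("line")])  # ['Total: $42.00']
```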
filter_by_confidence
Filter text blocks by minimum confidence.
filter_by_granularity
Filter text blocks by granularity level.
to_dict
Convert to dictionary representation.
sort_by_position
Return a new OCROutput with blocks sorted by position.
| PARAMETER | DESCRIPTION |
|---|---|
| top_to_bottom | If True, sort by y-coordinate (reading order) |

| RETURNS | DESCRIPTION |
|---|---|
| OCROutput | New OCROutput with sorted text blocks |
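A minimal sketch of position sorting, using a hypothetical Block type and free function rather than the library's classes. The description above only says that True sorts by y-coordinate; using x as a tie-breaker, and sorting by x first when top_to_bottom is False, are assumptions. Note that boxes on the same visual line but with slightly different y1 values are ordered by y1 alone in this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Block:  # hypothetical stand-in for TextBlock
    text: str
    x1: float
    y1: float


def sort_by_position(blocks: list[Block], top_to_bottom: bool = True) -> list[Block]:
    # Returns a new list; the input order is left untouched.
    if top_to_bottom:
        # Reading order: rows first (y), then left to right within a row (x).
        return sorted(blocks, key=lambda b: (b.y1, b.x1))
    return sorted(blocks, key=lambda b: (b.x1, b.y1))


blocks = [Block("world", 60, 10), Block("next", 0, 40), Block("Hello", 0, 10)]
print([b.text for b in sort_by_position(blocks)])  # ['Hello', 'world', 'next']
```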
get_normalized_blocks
Get all text blocks with normalized (0-1024) coordinates.
| RETURNS | DESCRIPTION |
|---|---|
| List[Dict] | List of dicts with normalized bbox coordinates and metadata |
visualize
```python
visualize(
    image: Image,
    output_path: Optional[Union[str, Path]] = None,
    show_text: bool = True,
    show_confidence: bool = False,
    line_width: int = 2,
    box_color: str = "#2ECC71",
    text_color: str = "#000000",
) -> Image.Image
```
Visualize OCR results on the image.
Draws bounding boxes around detected text with optional labels.
| PARAMETER | DESCRIPTION |
|---|---|
| image | PIL Image to draw on (will be copied, not modified) |
| output_path | Optional path to save the visualization |
| show_text | Whether to show detected text |
| show_confidence | Whether to show confidence scores |
| line_width | Width of bounding box lines |
| box_color | Color for bounding boxes (hex) |
| text_color | Color for text labels (hex) |

| RETURNS | DESCRIPTION |
|---|---|
| Image | PIL Image with visualizations drawn |
load_json
classmethod
Load an OCROutput instance from a JSON file.
| PARAMETER | DESCRIPTION |
|---|---|
| file_path | Path to JSON file |

| RETURNS | DESCRIPTION |
|---|---|
| OCROutput | OCROutput instance |
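Saving and loading are meant to round-trip. The stand-alone sketch below uses only the standard library's json module on a dataclass; the fields text_blocks, image_width and image_height are assumptions. In Pydantic v2 the equivalent calls would be model_dump_json() and model_validate_json(); whether omnidocs uses them is not stated here.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class OCROutput:  # sketch only; fields are assumed
    text_blocks: list[dict] = field(default_factory=list)
    image_width: int = 0
    image_height: int = 0

    def save_json(self, file_path: str | Path) -> None:
        Path(file_path).write_text(json.dumps(asdict(self), indent=2), encoding="utf-8")

    @classmethod
    def load_json(cls, file_path: str | Path) -> OCROutput:
        data = json.loads(Path(file_path).read_text(encoding="utf-8"))
        return cls(**data)


original = OCROutput([{"text": "Hello", "bbox": [0, 0, 50, 20]}], 640, 480)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "ocr.json"
    original.save_json(path)
    restored = OCROutput.load_json(path)
print(restored == original)  # True
```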
save_json
Save OCROutput instance to a JSON file.
| PARAMETER | DESCRIPTION |
|---|---|
| file_path | Path where JSON file should be saved |