Models

Pydantic models for text extraction outputs.

Defines output types and format enums for text extraction.

OutputFormat

Bases: str, Enum

Supported text extraction output formats.

Each format has different characteristics:
  • HTML: Structured with div elements, preserves layout semantics
  • MARKDOWN: Portable, human-readable, good for documentation
  • JSON: Structured data with layout information (Dots OCR)
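
Because the enum derives from both `str` and `Enum`, its members compare equal to plain strings, which is why a call can pass `output_format="markdown"`. The following is a minimal sketch of that pattern, not the library's definition: the member values are assumed (only "markdown" appears in the examples on this page), and `parse_format` is a hypothetical helper.

```python
from enum import Enum


class OutputFormat(str, Enum):
    """Sketch of a string-backed enum; member values are assumptions."""

    HTML = "html"
    MARKDOWN = "markdown"
    JSON = "json"


def parse_format(value):
    """Hypothetical helper: accept an enum member or its string value."""
    # Enum lookup by value raises ValueError for unknown formats.
    return OutputFormat(value)


fmt = parse_format("markdown")
print(fmt is OutputFormat.MARKDOWN)  # lookup by value returns the member
print(fmt == "markdown")             # the str mixin allows string comparison
```

Normalizing the argument once at the API boundary lets the rest of the code work with enum members only.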

TextOutput

Bases: BaseModel

Text extraction output from a document image.

Contains the extracted text content in the requested format, along with optional raw output and plain text versions.

Example
result = extractor.extract(image, output_format="markdown")
print(result.content)  # Clean markdown
print(result.plain_text)  # Plain text without formatting

content_length property

content_length: int

Length of the extracted content in characters.

word_count property

word_count: int

Approximate word count of the plain text.
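
To make the two properties concrete, here is a minimal sketch that uses a standard-library dataclass as a stand-in for the Pydantic model. The class name `TextOutputSketch`, the whitespace-based word split, and the fallback to `content` when `plain_text` is missing are all assumptions for illustration; the real model may compute these values differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextOutputSketch:
    """Stand-in for TextOutput (assumed fields, not the real model)."""

    content: str
    plain_text: Optional[str] = None

    @property
    def content_length(self) -> int:
        # Length of the formatted content in characters.
        return len(self.content)

    @property
    def word_count(self) -> int:
        # Assumption: words are whitespace-separated tokens of the plain text,
        # falling back to the formatted content if no plain text is present.
        text = self.plain_text if self.plain_text is not None else self.content
        return len(text.split())


result = TextOutputSketch(
    content="# Title\n\nHello world",
    plain_text="Title\n\nHello world",
)
print(result.content_length)  # 20
print(result.word_count)      # 3
```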

LayoutElement

Bases: BaseModel

Single layout element from document layout detection.

Represents a detected region in the document with its bounding box, category label, and extracted text content.

ATTRIBUTE DESCRIPTION
bbox

Bounding box coordinates [x1, y1, x2, y2] (normalized to 0-1024)

TYPE: List[int]

category

Layout category (e.g., "Text", "Title", "Table", "Formula")

TYPE: str

text

Extracted text content (None for pictures)

TYPE: Optional[str]

confidence

Detection confidence score (optional)

TYPE: Optional[float]
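
Since `bbox` is expressed on a 0-1024 scale rather than in pixels, drawing or cropping on the original image needs a rescaling step. The sketch below assumes that each axis is normalized independently to 0-1024; `bbox_to_pixels` is a hypothetical helper and not part of the library.

```python
from typing import List, Tuple


def bbox_to_pixels(
    bbox: List[int],
    image_width: int,
    image_height: int,
    norm_size: int = 1024,
) -> Tuple[int, int, int, int]:
    """Map a [x1, y1, x2, y2] box from the 0-1024 scale to pixel coordinates.

    Assumes x and y are each normalized independently to `norm_size`.
    """
    x1, y1, x2, y2 = bbox
    scale_x = image_width / norm_size
    scale_y = image_height / norm_size
    return (
        round(x1 * scale_x),
        round(y1 * scale_y),
        round(x2 * scale_x),
        round(y2 * scale_y),
    )


# A box covering the bottom half of a 2048x1000 pixel page.
print(bbox_to_pixels([0, 512, 1024, 1024], image_width=2048, image_height=1000))
# (0, 500, 2048, 1000)
```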

DotsOCRTextOutput

Bases: BaseModel

Text extraction output from Dots OCR with layout information.

Dots OCR provides structured output with:
  • Layout detection (11 categories)
  • Bounding boxes (normalized to 0-1024)
  • Multi-format text (Markdown/LaTeX/HTML)
  • Reading order preservation

Layout Categories

Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, Title

Text Formatting
  • Text/Title/Section-header: Markdown
  • Formula: LaTeX
  • Table: HTML
  • Picture: (text omitted)

Example
from omnidocs.tasks.text_extraction import DotsOCRTextExtractor

# `extractor` is an already-constructed DotsOCRTextExtractor and
# `image` is the document image to process.
result = extractor.extract(image, include_layout=True)
print(result.content)  # Full text with formatting
for elem in result.layout:
    print(f"{elem.category}: {elem.bbox}")

num_layout_elements property

num_layout_elements: int

Number of detected layout elements.

content_length property

content_length: int

Length of extracted content in characters.
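
A common way to consume the `layout` list is to group element text by category, for example to pull out every table or formula. The sketch below uses a dataclass stand-in for `LayoutElement` so that it runs without the library; `LayoutElementSketch` and `text_by_category` are hypothetical names, and only the `bbox`, `category`, and `text` attributes described above are assumed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class LayoutElementSketch:
    """Stand-in for LayoutElement (illustrative, not the real model)."""

    bbox: List[int]
    category: str
    text: Optional[str] = None


def text_by_category(layout: Iterable[LayoutElementSketch]) -> Dict[str, List[str]]:
    """Group element text by layout category, preserving reading order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for elem in layout:
        # Pictures carry no text, so elements with text=None are skipped.
        if elem.text is not None:
            grouped[elem.category].append(elem.text)
    return dict(grouped)


layout = [
    LayoutElementSketch([50, 40, 900, 90], "Title", "# Report"),
    LayoutElementSketch([50, 120, 900, 400], "Text", "First paragraph."),
    LayoutElementSketch([50, 420, 900, 700], "Picture", None),
    LayoutElementSketch([50, 720, 900, 980], "Text", "Second paragraph."),
]
grouped = text_by_category(layout)
print(grouped["Text"])  # ['First paragraph.', 'Second paragraph.']
print(len(layout))      # element count, analogous to num_layout_elements
```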