Models
Pydantic models for text extraction outputs.
Defines output types and format enums for text extraction.
OutputFormat
Bases: str, Enum
Supported text extraction output formats.
Each format has different characteristics:
- HTML: Structured with div elements, preserves layout semantics
- MARKDOWN: Portable, human-readable, good for documentation
- JSON: Structured data with layout information (Dots OCR)
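As a rough illustration of the `str, Enum` pattern named in the bases above, the sketch below defines a stand-in enum with the three documented members. The lowercase member values (`"html"`, `"markdown"`, `"json"`) are assumptions; the source does not state them.

```python
from enum import Enum


class OutputFormat(str, Enum):
    """Illustrative stand-in; member values are assumed, not taken from the library."""

    HTML = "html"
    MARKDOWN = "markdown"
    JSON = "json"


# Because the enum also subclasses str, a member can be looked up by its value
# and compares equal to the plain string it wraps.
fmt = OutputFormat("markdown")
is_markdown = fmt is OutputFormat.MARKDOWN
equals_plain_string = OutputFormat.HTML == "html"
```

Mixing in `str` is a common choice when the enum has to round-trip through JSON or configuration files, since members serialize as ordinary strings.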
TextOutput
Bases: BaseModel
Text extraction output from a document image.
Contains the extracted text content in the requested format, along with optional raw output and plain text versions.
Example
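A minimal sketch of how such an output object might be consumed. It uses a standard-library dataclass as a stand-in for the Pydantic model, and every field name (`content`, `format`, `raw_output`, `plain_text`) is hypothetical, inferred only from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextOutputSketch:
    """Hypothetical stand-in for TextOutput; all field names are assumptions."""

    content: str                      # text in the requested format
    format: str                       # e.g. "markdown" (assumed value)
    raw_output: Optional[str] = None  # unprocessed model output, if kept
    plain_text: Optional[str] = None  # format-free version, if available


result = TextOutputSketch(
    content="# Invoice\n\nTotal: **42.00**",
    format="markdown",
    plain_text="Invoice\n\nTotal: 42.00",
)

# Fall back to the formatted content when no plain-text version is present.
display_text = result.plain_text if result.plain_text is not None else result.content
```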
LayoutElement
Bases: BaseModel
Single layout element from document layout detection.
Represents a detected region in the document with its bounding box, category label, and extracted text content.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| bbox | Bounding box coordinates [x1, y1, x2, y2] (normalized to 0-1024) |
| category | Layout category (e.g., "Text", "Title", "Table", "Formula") |
| text | Extracted text content (None for pictures) |
| confidence | Detection confidence score (optional) |
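Since `bbox` is normalized to 0-1024, drawing a box on the original image needs a rescaling step. The helper below is a sketch under the assumption that each axis is scaled independently, so 1024 maps to the full image width (for x) or height (for y); the function name `bbox_to_pixels` is illustrative and not part of the library.

```python
from typing import List, Tuple


def bbox_to_pixels(
    bbox: List[float], width: int, height: int, scale: int = 1024
) -> Tuple[int, int, int, int]:
    """Map [x1, y1, x2, y2] from the 0-`scale` range to pixel coordinates.

    Assumes x values are relative to image width and y values to image height.
    """
    x1, y1, x2, y2 = bbox
    return (
        round(x1 / scale * width),
        round(y1 / scale * height),
        round(x2 / scale * width),
        round(y2 / scale * height),
    )


# A box covering the left half of the page, from one quarter down to the bottom.
pixel_box = bbox_to_pixels([0, 256, 512, 1024], width=2048, height=1000)
```

If the model instead normalizes both axes by the longer image side, the two divisors would need to be the same value; check the model's documentation before relying on this mapping.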
DotsOCRTextOutput
Bases: BaseModel
Text extraction output from Dots OCR with layout information.
Dots OCR provides structured output with:
- Layout detection (11 categories)
- Bounding boxes (normalized to 0-1024)
- Multi-format text (Markdown/LaTeX/HTML)
- Reading order preservation
Layout Categories
Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, Title
Text Formatting
- Text/Title/Section-header: Markdown
- Formula: LaTeX
- Table: HTML
- Picture: (text omitted)
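Because elements arrive in reading order and pictures carry no text, a full-page document can be assembled by concatenating the text of the remaining elements. The sketch below uses plain dictionaries with the `category` and `text` keys described for `LayoutElement`; the helper name `join_in_reading_order` and the sample data are assumptions for illustration.

```python
from typing import List, Optional, TypedDict


class Element(TypedDict):
    category: str
    text: Optional[str]


def join_in_reading_order(elements: List[Element]) -> str:
    """Concatenate element text, assuming the list is already in reading order."""
    parts = []
    for element in elements:
        # Pictures have no text; also skip any element whose text is empty.
        if element["category"] == "Picture" or not element["text"]:
            continue
        parts.append(element["text"])
    return "\n\n".join(parts)


elements: List[Element] = [
    {"category": "Title", "text": "# Report"},            # Markdown
    {"category": "Picture", "text": None},                # text omitted
    {"category": "Formula", "text": "$$E = mc^2$$"},      # LaTeX
    {"category": "Table", "text": "<table><tr><td>1</td></tr></table>"},  # HTML
]
document = join_in_reading_order(elements)
```

The result mixes Markdown, LaTeX, and HTML fragments, so a downstream renderer needs to accept inline HTML and math to display it faithfully.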