# Tasks
Tasks define what you want to extract. Models define how.

## Available Tasks

| Task | Input | Output | Status |
|---|---|---|---|
| Text Extraction | Image / PDF | Markdown, HTML | ✅ Ready |
| Layout Analysis | Image | Bounding boxes + labels | ✅ Ready |
| OCR | Image | Text + coordinates | ✅ Ready |
| Table Extraction | Table image | Structured table data | ✅ Ready |
| Reading Order | Layout + OCR | Ordered elements | ✅ Ready |
| Structured Extraction | Image + Schema | Typed Pydantic objects | ✅ Ready |
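
The Reading Order task takes the outputs of Layout Analysis and OCR as its input. The code below is a minimal sketch of how those two outputs can be combined. It assumes a single-column page and uses hypothetical `Box`, `LayoutRegion`, `OCRWord`, and `OrderedElement` types, which are illustrative stand-ins and not this library's actual classes. It sorts layout regions top-to-bottom and then left-to-right, assigns each OCR word to the region that contains the word's center, and joins the words line by line.

```python
from dataclasses import dataclass

# Hypothetical types for illustration only; not the library's real classes.
@dataclass
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    def contains_point(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

@dataclass
class LayoutRegion:   # stand-in for one Layout Analysis result (box + label)
    label: str
    box: Box

@dataclass
class OCRWord:        # stand-in for one OCR result (text + coordinates)
    text: str
    box: Box

@dataclass
class OrderedElement:
    label: str
    text: str

def _words_to_text(words: list[OCRWord], line_tol: float) -> str:
    """Group words into lines by top edge, then read each line left to right."""
    lines: list[list[OCRWord]] = []
    for w in sorted(words, key=lambda w: w.box.y0):
        # Same line if the top edge is within line_tol of the line's first word.
        if lines and w.box.y0 - lines[-1][0].box.y0 <= line_tol:
            lines[-1].append(w)
        else:
            lines.append([w])
    return " ".join(
        w.text for line in lines for w in sorted(line, key=lambda w: w.box.x0)
    )

def reading_order(regions: list[LayoutRegion], words: list[OCRWord],
                  line_tol: float = 5.0) -> list[OrderedElement]:
    """Naive single-column heuristic. Words outside every region are dropped."""
    ordered = sorted(regions, key=lambda r: (r.box.y0, r.box.x0))
    buckets: list[list[OCRWord]] = [[] for _ in ordered]
    for w in words:
        cx = (w.box.x0 + w.box.x1) / 2
        cy = (w.box.y0 + w.box.y1) / 2
        for i, r in enumerate(ordered):
            if r.box.contains_point(cx, cy):
                buckets[i].append(w)
                break
    return [OrderedElement(r.label, _words_to_text(b, line_tol))
            for r, b in zip(ordered, buckets)]
```

A dedicated reading-order model would also handle multi-column pages, sidebars, and captions. This heuristic ignores those cases, so treat it only as an illustration of how the task's inputs and outputs fit together.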

## Choosing a Task

- "I want readable text from a PDF" → Text Extraction
- "I need to know where tables and figures are" → Layout Analysis
- "I need word positions for downstream processing" → OCR
- "I want structured data from a table" → Table Extraction
- "I need elements in reading order" → Reading Order
- "I want typed data from invoices/forms" → Structured Extraction

## Upcoming Tasks

| Task | Description | Status |
|---|---|---|
| Math Recognition | LaTeX from equations | 🔜 Soon |
| Chart Understanding | Data extraction from charts | 🔜 Planned |
| Image Captioning | Caption figures and images | 🔜 Planned |

See the Roadmap for full status tracking.