Aggregation
Result aggregation utilities for batch processing.
Provides containers and utilities for storing, aggregating, and exporting results from batch document processing.
DocumentResult
Container for results from processing a single document.
Stores results by page for easy access and serialization.
Examples:
doc_result = DocumentResult(source_path="paper.pdf", page_count=10)
doc_result.add_page_result(0, text_output)
doc_result.add_page_result(1, text_output)
# Access results
all_results = doc_result.all_results
page_0_result = doc_result.get_page_result(0)
# Save to file
doc_result.save_json("paper_result.json")
Initialize DocumentResult.
| PARAMETER | DESCRIPTION |
|---|---|
| source_path | Path to source document |
| page_count | Total number of pages |
Source code in omnidocs/utils/aggregation.py
all_results (property)
Get all results in page order.
| RETURNS | DESCRIPTION |
|---|---|
| List[Any] | List of results sorted by page number |
add_page_result
Add result for a specific page.
| PARAMETER | DESCRIPTION |
|---|---|
| page_num | Page number (0-indexed) |
| result | Extraction result (TextOutput, LayoutOutput, etc.) |
get_page_result
Get result for a specific page.
| PARAMETER | DESCRIPTION |
|---|---|
| page_num | Page number (0-indexed) |

| RETURNS | DESCRIPTION |
|---|---|
| Optional[Any] | Result for the page, or None if not found |
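The by-page behaviour documented above can be illustrated with a small stand-in. The sketch below does not import omnidocs: `PageStore` is a hypothetical class, and it assumes results are held in a dict keyed by 0-indexed page number, which is one plausible way to obtain a sorted `all_results` and a `None` return for pages that were never added.

```python
from typing import Any, Dict, List, Optional


class PageStore:
    """Hypothetical stand-in for DocumentResult; not the omnidocs implementation."""

    def __init__(self, source_path: str, page_count: int) -> None:
        self.source_path = source_path
        self.page_count = page_count
        # Assumption: results are stored in a dict keyed by 0-indexed page number.
        self._pages: Dict[int, Any] = {}

    def add_page_result(self, page_num: int, result: Any) -> None:
        self._pages[page_num] = result

    def get_page_result(self, page_num: int) -> Optional[Any]:
        # dict.get returns None when the page has no stored result.
        return self._pages.get(page_num)

    @property
    def all_results(self) -> List[Any]:
        # Sort by page number so insertion order does not matter.
        return [self._pages[n] for n in sorted(self._pages)]


store = PageStore(source_path="paper.pdf", page_count=3)
store.add_page_result(2, "third page")  # added out of order on purpose
store.add_page_result(0, "first page")

ordered = store.all_results
missing = store.get_page_result(1)
```

Since a missing page yields `None` rather than an exception, callers would typically check the return value before using it.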
to_dict
Convert to dictionary for serialization.
| RETURNS | DESCRIPTION |
|---|---|
| dict | Dictionary representation |
save_json
Save results to JSON file.
| PARAMETER | DESCRIPTION |
|---|---|
| path | Output file path |
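As a rough illustration of the dictionary-then-JSON flow, here is a minimal sketch using only the standard library. The dictionary layout (`source_path`, `page_count`, `pages`) and the helper names `document_to_dict` and `write_json` are assumptions made for this example; the actual output of `to_dict()` is not specified on this page and may differ.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def document_to_dict(source_path: str, page_count: int, pages: Dict[int, Any]) -> dict:
    """Hypothetical layout; the real to_dict() output may differ."""
    return {
        "source_path": source_path,
        "page_count": page_count,
        # JSON object keys must be strings, so page numbers are converted.
        "pages": {str(n): pages[n] for n in sorted(pages)},
    }


def write_json(data: dict, path: str) -> None:
    # Write UTF-8 so non-ASCII text is stored unescaped.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, ensure_ascii=False, indent=2)


payload = document_to_dict("paper.pdf", 2, {1: "page two", 0: "page one"})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "paper_result.json"
    write_json(payload, str(out_path))
    # Read the file back to confirm the round trip preserves the data.
    loaded = json.loads(out_path.read_text(encoding="utf-8"))
```

The page results here are plain strings. Real result objects would need to be converted to JSON-compatible values first, and how omnidocs does that is not covered here.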
BatchResult
Container for results from processing multiple documents.
Examples:
batch_result = BatchResult()
batch_result.add_document_result("doc1", doc_result1)
batch_result.add_document_result("doc2", doc_result2)
# Access results
doc1_result = batch_result.get_document_result("doc1")
all_ids = batch_result.document_ids
# Save all results
batch_result.save_json("all_results.json")
Initialize empty BatchResult.
add_document_result
Add result for a document.
| PARAMETER | DESCRIPTION |
|---|---|
| doc_id | Document identifier (usually filename without extension) |
| result | DocumentResult instance |
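Since `doc_id` is described as usually being the filename without its extension, one way to derive it is `pathlib.Path.stem`. The file paths below are made up for illustration.

```python
from pathlib import Path

# Hypothetical input paths.
paths = ["inbox/paper.pdf", "inbox/report.v2.pdf", "notes.txt"]

# Path.stem drops the directory and only the final suffix,
# so "report.v2.pdf" becomes "report.v2".
doc_ids = [Path(p).stem for p in paths]
```

Files in different directories can share a stem (for example `a/paper.pdf` and `b/paper.pdf`). This page does not say what happens when a `doc_id` is reused, so it may be safer to make IDs unique before adding results.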
get_document_result
Get result for a specific document.
| PARAMETER | DESCRIPTION |
|---|---|
| doc_id | Document identifier |

| RETURNS | DESCRIPTION |
|---|---|
| Optional[DocumentResult] | DocumentResult or None if not found |
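Because a lookup can return `None`, batch consumers usually need to separate hits from misses. The sketch below shows that pattern with `BatchStore`, a hypothetical stand-in that assumes documents are kept in a dict keyed by `doc_id`. It is not the omnidocs implementation, and plain lists stand in for `DocumentResult` instances.

```python
from typing import Any, Dict, List, Optional


class BatchStore:
    """Hypothetical stand-in for BatchResult; not the omnidocs implementation."""

    def __init__(self) -> None:
        # Assumption: documents are kept in a dict keyed by doc_id.
        self._docs: Dict[str, Any] = {}

    def add_document_result(self, doc_id: str, result: Any) -> None:
        self._docs[doc_id] = result

    def get_document_result(self, doc_id: str) -> Optional[Any]:
        # Returns None for unknown IDs instead of raising.
        return self._docs.get(doc_id)

    @property
    def document_ids(self) -> List[str]:
        return list(self._docs)


batch = BatchStore()
batch.add_document_result("doc1", ["page 0 text", "page 1 text"])
batch.add_document_result("doc2", ["only page"])

# Look up each requested ID once and separate hits from misses.
requested = ["doc1", "doc3"]
found: Dict[str, Any] = {}
not_found: List[str] = []
for doc_id in requested:
    result = batch.get_document_result(doc_id)
    if result is None:
        not_found.append(doc_id)
    else:
        found[doc_id] = result
```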
to_dict
Convert to dictionary.
| RETURNS | DESCRIPTION |
|---|---|
| dict | Dictionary representation |
save_json
Save all results to JSON file.
| PARAMETER | DESCRIPTION |
|---|---|
| path | Output file path |
merge_text_results
Merge multiple TextOutput results into a single string.
| PARAMETER | DESCRIPTION |
|---|---|
| results | List of TextOutput (or objects with .content attribute) |
| separator | String to join pages (default: double newline) |

| RETURNS | DESCRIPTION |
|---|---|
| str | Combined content string |
Examples:
all_results = doc_result.all_results
full_text = merge_text_results(all_results)
full_text_with_dividers = merge_text_results(all_results, separator="\n\n---\n\n")
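To make the merging behaviour concrete, here is a minimal sketch of what such a function could look like, based only on the parameter descriptions above. `merge_text_sketch` and `FakeTextOutput` are hypothetical names used for illustration. The real `merge_text_results` may handle edge cases (such as `None` entries or empty content) differently.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass
class FakeTextOutput:
    """Hypothetical stand-in for TextOutput; only .content is modelled."""

    content: str


def merge_text_sketch(results: Iterable[Any], separator: str = "\n\n") -> str:
    # Join each result's .content with the separator (default: double newline).
    return separator.join(r.content for r in results)


pages = [FakeTextOutput("Intro"), FakeTextOutput("Methods")]
full_text = merge_text_sketch(pages)
with_dividers = merge_text_sketch(pages, separator="\n\n---\n\n")
```

Relying only on the `.content` attribute means any object exposing it can be merged, which matches the "objects with .content attribute" wording in the parameter table.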