Extractor
MinerU VL text extractor with layout-aware two-step extraction.
MinerU VL performs document extraction in two steps:
1. Layout Detection: detect regions and their types (text, table, equation, etc.)
2. Content Recognition: extract the text, table, or equation content from each region
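The two steps described above can be sketched without any model, to show how detection output feeds recognition. This is a minimal illustration, not the library's implementation: `Region`, `two_step_extract`, `fake_detect`, and `fake_recognizers` are hypothetical names introduced here, and the toy functions stand in for the real layout and recognition models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Region:
    """Hypothetical layout region: a type label plus a bounding box."""
    kind: str                          # e.g. "text", "table", "equation"
    bbox: Tuple[int, int, int, int]    # (x0, y0, x1, y1) in pixels


def two_step_extract(
    page: object,
    detect_layout: Callable[[object], List[Region]],
    recognizers: Dict[str, Callable[[object, Region], str]],
) -> List[Tuple[Region, str]]:
    """Step 1: detect regions. Step 2: recognize content region by region."""
    results = []
    for region in detect_layout(page):
        recognize = recognizers.get(region.kind)
        if recognize is None:
            continue  # no recognizer registered for this region type
        results.append((region, recognize(page, region)))
    return results


# Toy stand-ins so the sketch runs without a model (assumptions, not real APIs).
def fake_detect(page: object) -> List[Region]:
    return [
        Region("text", (0, 0, 100, 20)),
        Region("table", (0, 30, 100, 80)),
        Region("figure", (0, 90, 100, 150)),  # skipped: no recognizer below
    ]


fake_recognizers = {
    "text": lambda page, region: "Hello",
    "table": lambda page, region: "| a | b |",
}

pairs = two_step_extract("page-1", fake_detect, fake_recognizers)
# Join per-region content in detection order, similar in spirit to a combined output.
combined = "\n\n".join(content for _, content in pairs)
print(combined)
```

Keeping detection and recognition as separate callables makes it possible to swap either step independently, which is the main appeal of a two-step design.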
MinerUVLTextExtractor
Bases: BaseTextExtractor
MinerU VL text extractor with layout-aware extraction.
Performs two-step extraction:
1. Layout detection (detect regions)
2. Content recognition (extract text/table/equation from each region)
Supports multiple backends:
- PyTorch (HuggingFace Transformers)
- VLLM (high-throughput GPU)
- MLX (Apple Silicon)
- API (VLLM OpenAI-compatible server)
Example
from omnidocs.tasks.text_extraction import MinerUVLTextExtractor
from omnidocs.tasks.text_extraction.mineruvl import MinerUVLTextPyTorchConfig
extractor = MinerUVLTextExtractor(
backend=MinerUVLTextPyTorchConfig(device="cuda")
)
image = "page.png"  # hypothetical path; a PIL Image or numpy array also works
result = extractor.extract(image)
print(result.content) # Combined text + tables + equations
print(result.blocks) # List of ContentBlock objects
Initialize MinerU VL text extractor.
| PARAMETER | DESCRIPTION |
|---|---|
| backend | Backend configuration (PyTorch, VLLM, MLX, or API) |
Source code in omnidocs/tasks/text_extraction/mineruvl/extractor.py
extract
extract(
image: Union[Image, ndarray, str, Path],
output_format: Literal["html", "markdown"] = "markdown",
) -> TextOutput
Extract text with layout-aware two-step extraction.
| PARAMETER | DESCRIPTION |
|---|---|
| image | Input image (PIL Image, numpy array, or file path). TYPE: Union[Image, ndarray, str, Path] |
| output_format | Output format ('html' or 'markdown'). TYPE: Literal["html", "markdown"]. DEFAULT: "markdown" |

| RETURNS | DESCRIPTION |
|---|---|
| TextOutput | TextOutput with extracted content and metadata |
Source code in omnidocs/tasks/text_extraction/mineruvl/extractor.py
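As a usage illustration, the sketch below wraps `extract` in a small helper that writes `result.content` to disk with an extension matching `output_format`. It relies only on the `extract(image, output_format=...)` signature and the `content` attribute shown on this page; `extract_to_file` and `FakeExtractor` are hypothetical names, and the fake exists only so the example runs without loading a model.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def extract_to_file(extractor, image, out_dir, output_format="markdown"):
    """Call extractor.extract and save result.content as page.md or page.html."""
    if output_format not in ("html", "markdown"):
        raise ValueError(f"unsupported output_format: {output_format!r}")
    result = extractor.extract(image, output_format=output_format)
    suffix = ".html" if output_format == "html" else ".md"
    out_path = Path(out_dir) / f"page{suffix}"
    out_path.write_text(result.content, encoding="utf-8")
    return out_path


class FakeExtractor:
    """Stand-in with the same extract() shape (an assumption for this demo)."""

    def extract(self, image, output_format="markdown"):
        text = "<p>hi</p>" if output_format == "html" else "hi"
        return SimpleNamespace(content=text)


with tempfile.TemporaryDirectory() as tmp:
    saved = extract_to_file(FakeExtractor(), "page.png", tmp, output_format="html")
    saved_name = saved.name
    saved_text = saved.read_text(encoding="utf-8")

print(saved_name, saved_text)
```

With a real `MinerUVLTextExtractor` instance, the same helper should work unchanged, since it touches nothing beyond `extract` and `content`.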
extract_with_blocks
extract_with_blocks(
image: Union[Image, ndarray, str, Path],
output_format: Literal["html", "markdown"] = "markdown",
) -> tuple[TextOutput, List[ContentBlock]]
Extract text and return both TextOutput and ContentBlocks.
This method provides access to detailed block information, including bounding boxes and block types.

| PARAMETER | DESCRIPTION |
|---|---|
| image | Input image. TYPE: Union[Image, ndarray, str, Path] |
| output_format | Output format. TYPE: Literal["html", "markdown"]. DEFAULT: "markdown" |

| RETURNS | DESCRIPTION |
|---|---|
| tuple[TextOutput, List[ContentBlock]] | Tuple of (TextOutput, List[ContentBlock]) |
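One way to use the returned block list is to count blocks per type, for example to check how many tables a page contains. The sketch below is illustrative only: this page does not name the attributes of `ContentBlock`, so the `type` and `bbox` attribute names are assumptions, `summarize_blocks` is a hypothetical helper, and `SimpleNamespace` objects stand in for real blocks.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_blocks(blocks, type_attr="type"):
    """Count blocks per block type.

    `type_attr` is an assumed attribute name; pass the real one if it differs.
    """
    return Counter(str(getattr(block, type_attr)) for block in blocks)


# Stand-ins for ContentBlock objects (attribute names are assumptions).
fake_blocks = [
    SimpleNamespace(type="text", bbox=(0, 0, 100, 20)),
    SimpleNamespace(type="table", bbox=(0, 30, 100, 80)),
    SimpleNamespace(type="text", bbox=(0, 90, 100, 110)),
]

summary = summarize_blocks(fake_blocks)
print(dict(summary))
```

In real use, the list would come from the second element of the tuple returned by `extract_with_blocks`.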