Donut Math Extraction
In [1]:
from omnidocs.tasks.math_expression_extraction.extractors.donut import DonutExtractor
print("DonutExtractor imported successfully!")
c:\Users\laxma\OneDrive\Desktop\CogLab\11-07-2025\Omnidocs\new\Lib\site-packages\transformers\utils\hub.py:111: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
DonutExtractor imported successfully!
In [2]:
image_path = "../../../../tests/math_expression_extraction/assets/math_equation.png"
extractor = DonutExtractor(device='cpu', show_log=False)
result = extractor.extract(image_path)
INFO [timestamp]2025-07-31 12:50:25[/] | [logger.name]omnidocs.tasks.math_expression_extraction.extractors.donut[/] | [function]donut.py:153[/] | [info]Loading Donut model from local path: C:\Users\laxma\OneDrive\Desktop\CogLab\11-07-2025\Omnidocs\omnidocs\models\donut_models\naver-clova-ix_donut-base-finetuned-cord-v2[/]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
The following generation flags are not valid and may be ignored: ['early_stopping']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.58.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
INFO [timestamp]2025-07-31 12:50:40[/] | [logger.name]omnidocs.tasks.math_expression_extraction.extractors.donut[/] | [function]logging.py:150[/] | [info]extract completed in 10.32s[/]
In [3]:
expr = result.expressions[0]
print(f"LaTeX: {expr[:80]}...")
LaTeX: <s_menu><s_nm> lim</s_nm><s_unitprice> 9x-6</s_unitprice><s_cnt> 9x+24</s_cnt><s...
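Note that the output above is wrapped in field tags such as `<s_menu>`, `<s_nm>` and `<s_unitprice>` rather than plain LaTeX. These appear to come from the CORD-v2 (receipt parsing) checkpoint named in the loading log. As a rough sketch for inspecting the raw text, the tags can be stripped with a regular expression. The helper name `strip_donut_tags` and the tag pattern are assumptions made for illustration; they are not part of OmniDocs, and the result is plain text, not validated LaTeX.

```python
import re

# Assumed pattern: Donut-style field tags look like <s_name> or </s_name>.
_DONUT_TAG = re.compile(r"</?s_[^>]*>")


def strip_donut_tags(raw: str) -> str:
    """Replace Donut field tags with spaces, then collapse whitespace.

    Hypothetical helper for illustration only.
    """
    without_tags = _DONUT_TAG.sub(" ", raw)
    return " ".join(without_tags.split())


# Sample taken from the (truncated) output shown above.
sample = (
    "<s_menu><s_nm> lim</s_nm>"
    "<s_unitprice> 9x-6</s_unitprice>"
    "<s_cnt> 9x+24</s_cnt>"
)
print(strip_donut_tags(sample))
```

In the notebook, the same helper could be applied to `result.expressions[0]` before printing, which makes it easier to see what text the model actually recognised.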