Donut Math Extraction

In [1]:

from omnidocs.tasks.math_expression_extraction.extractors.donut import DonutExtractor
print("DonutExtractor imported successfully!")

c:\Users\laxma\OneDrive\Desktop\CogLab\11-07-2025\Omnidocs\new\Lib\site-packages\transformers\utils\hub.py:111: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(

DonutExtractor imported successfully!
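
The `FutureWarning` above is raised by `transformers` when it is imported and finds the deprecated `TRANSFORMERS_CACHE` environment variable set. Below is a minimal sketch for silencing it. It assumes the variable comes from your environment rather than being set by `omnidocs` itself, and that you run the cell before anything imports `omnidocs` or `transformers`. `migrate_hf_cache_env` is a hypothetical helper, not part of either library.

```python
import os
from typing import Optional


def migrate_hf_cache_env(hf_home: Optional[str] = None) -> Optional[str]:
    """Drop the deprecated TRANSFORMERS_CACHE variable, optionally setting HF_HOME.

    Hypothetical helper: call it before importing omnidocs or transformers,
    because transformers checks the variable when it is first imported.
    Returns the old TRANSFORMERS_CACHE value, or None if it was unset.
    """
    old_cache = os.environ.pop("TRANSFORMERS_CACHE", None)
    if hf_home is not None:
        # Hugging Face caches hub downloads under <HF_HOME>/hub by default.
        os.environ["HF_HOME"] = hf_home
    return old_cache


# Example (placeholder path, adjust for your machine):
# migrate_hf_cache_env(os.path.expanduser("~/.cache/huggingface"))
```

This notebook loads Donut from a local folder, so the cache location mainly matters for anything else downloaded from the Hugging Face Hub.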

In [2]:

image_path = "../../../../tests/math_expression_extraction/assets/math_equation.png"
extractor = DonutExtractor(device='cpu', show_log=False)
result = extractor.extract(image_path)

INFO     2025-07-31 12:50:25 | omnidocs.tasks.math_expression_extraction.extractors.donut | donut.py:153 | Loading Donut model from local path: C:\Users\laxma\OneDrive\Desktop\CogLab\11-07-2025\Omnidocs\omnidocs\models\donut_models\naver-clova-ix_donut-base-finetuned-cord-v2

Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
The following generation flags are not valid and may be ignored: ['early_stopping']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.58.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.

INFO     2025-07-31 12:50:40 | omnidocs.tasks.math_expression_extraction.extractors.donut | logging.py:150 | extract completed in 10.32s
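
In the log above, a single CPU call to `extract` took 10.32s. The model-loading message is logged earlier, which suggests the weights are loaded when `DonutExtractor` is constructed. If so, the cheap pattern for several images is to build the extractor once and reuse it.

Below is a minimal sketch that assumes only what these cells show: an `extract(path)` call that returns an object with an `expressions` list. `extract_many` is a hypothetical helper, not an omnidocs API.

```python
import time
from typing import Any, Callable, Iterable, List, Tuple


def extract_many(
    extract: Callable[[str], Any], image_paths: Iterable[str]
) -> List[Tuple[str, List[str], float]]:
    """Run one already-constructed extractor over several images.

    Returns (path, expressions, seconds) for each image, in input order.
    """
    rows = []
    for path in image_paths:
        start = time.perf_counter()
        result = extract(path)  # e.g. the bound method DonutExtractor(...).extract
        elapsed = time.perf_counter() - start
        rows.append((path, list(result.expressions), elapsed))
    return rows
```

Usage: `rows = extract_many(extractor.extract, [image_path])` reuses the `extractor` built in the cell above. Because you pass in the bound `extract` method, the helper works with any object that has the same call shape.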

In [3]:

expr = result.expressions[0]
print(f"LaTeX: {expr[:80]}...")

LaTeX: <s_menu><s_nm> lim</s_nm><s_unitprice> 9x-6</s_unitprice><s_cnt> 9x+24</s_cnt><s...
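
Despite the `LaTeX:` label, the printed sequence is not LaTeX. The checkpoint in the loading log, `naver-clova-ix` `donut-base-finetuned-cord-v2`, is the Donut model fine-tuned on the CORD-v2 receipt dataset. It therefore emits receipt-schema tags (`s_menu`, `s_nm`, `s_unitprice`, `s_cnt`) rather than math markup.

To inspect what it read, you can pull the leaf fields out with a regular expression. The sketch below uses a hypothetical helper, `cord_leaf_fields`, as a simplified stand-in that ignores nesting. The library's own route to nested JSON is `DonutProcessor.token2json` in `transformers`.

```python
import re
from typing import List, Tuple

# Matches a leaf field such as "<s_nm> lim</s_nm>". The backreference \1 forces
# the closing tag to use the same key; container tags like "<s_menu>" never match.
_LEAF = re.compile(r"<s_(\w+)>([^<]*)</s_\1>")


def cord_leaf_fields(sequence: str) -> List[Tuple[str, str]]:
    """Return (field, value) pairs for each complete leaf field in a Donut CORD sequence."""
    return [(key, value.strip()) for key, value in _LEAF.findall(sequence)]


# The 80-character prefix printed above; the trailing "<s..." is incomplete and is skipped.
sample = "<s_menu><s_nm> lim</s_nm><s_unitprice> 9x-6</s_unitprice><s_cnt> 9x+24</s_cnt><s..."
print(cord_leaf_fields(sample))  # [('nm', 'lim'), ('unitprice', '9x-6'), ('cnt', '9x+24')]
```

The output shows the equation's text scattered across receipt fields. Producing real LaTeX would need a checkpoint trained for math recognition rather than CORD.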
