Nougat for formula
We fine-tuned the small-sized Nougat model on data from IM2LATEX-100K to make it especially good at recognizing formulas in images.
Model Details
Model Description
Nougat for formula is good at recognizing formulas in images. It takes an image of a formula written in black on a white background as input and returns the corresponding LaTeX code.
The Nougat model (Neural Optical Understanding for Academic Documents) was proposed by Meta AI in August 2023 as a Visual Transformer model for processing scientific documents. It converts PDF documents into a markup language and is particularly good at recognizing mathematical expressions and tables. The goal of the model is to improve the accessibility of scientific knowledge by bridging human-readable documents and machine-readable text.
- Model type: Vision Encoder Decoder
- Finetuned from model: Nougat model, small-sized version
Uses
Nougat for formula can be used as a tool for converting complicated formulas to LaTeX code. It has the potential to be a good substitute for other tools.
For example, when you are taking notes and are tired of typing long LaTeX/Markdown formula code, just take a screenshot of the formula and feed it to Nougat for formula. You will get the code for the formula, as long as it does not exceed the maximum output length of the model you use.
You can also continue fine-tuning the model to make it better at recognizing formulas from specific subjects.
Nougat for formula may be useful when developing tools or apps that generate LaTeX code.
How to Get Started with the Model
The demo below shows how to feed an image to the model and generate LaTeX/Markdown formula code.
from transformers import NougatProcessor, VisionEncoderDecoderModel
from PIL import Image
max_length = 100  # define the max length of the output
processor = NougatProcessor.from_pretrained(r".", max_length = max_length) # Replace with your path
model = VisionEncoderDecoderModel.from_pretrained(r".") # Replace with your path
image = Image.open(r"image_path") # Replace with your path
image = processor(image, return_tensors="pt").pixel_values # The processor will resize the image according to our model
result_tensor = model.generate(
    image,
    max_length=max_length,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
)  # generate id tensor
result = processor.batch_decode(result_tensor, skip_special_tokens=True) # Using the processor to decode the result
result = processor.post_process_generation(result, fix_markdown=False)
print(*result)
Training Details
Training Data
Preprocessing
The preprocessing of X (the image) is shown in the short demo above.
The preprocessing of Y (the formula) is done by:
- Removing the spaces in the formula string.
- Using processor to tokenize the string.
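The two label-preprocessing steps above can be sketched as follows. This is a minimal illustration, not the author's training code: the helper names `strip_spaces` and `encode_formula`, and the padding and truncation settings, are assumptions. `tokenizer` is expected to behave like a Hugging Face tokenizer, such as `processor.tokenizer` from the demo.

```python
def strip_spaces(formula: str) -> str:
    """Remove every whitespace character from a LaTeX formula string."""
    return "".join(formula.split())


def encode_formula(formula: str, tokenizer, max_length: int = 100):
    """Turn one formula string into label token ids.

    `tokenizer` is assumed to behave like a Hugging Face tokenizer,
    for example `processor.tokenizer`.
    """
    encoded = tokenizer(
        strip_spaces(formula),
        max_length=max_length,
        padding="max_length",  # assumption: pad every label to the same length
        truncation=True,       # assumption: cut formulas longer than max_length
        return_tensors="pt",
    )
    return encoded.input_ids


print(strip_spaces(r"\frac { a } { b } + c ^ { 2 }"))  # \frac{a}{b}+c^{2}
```

Note that removing every space can merge a command with a following letter (for example, `\alpha x` becomes `\alphax`), so this step is safest when such pairs are separated by braces or operators.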
Training Hyperparameters
- Training regime: torch.optim.AdamW(model.parameters(), lr=1e-4)
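For context, one training epoch with this optimizer might look like the sketch below. Only the optimizer and learning rate come from this card. The function names, the batch keys (`pixel_values`, `labels`), and the loop structure are assumptions, and the sketch relies on `VisionEncoderDecoderModel` returning a loss when `labels` is passed.

```python
def build_optimizer(model, lr: float = 1e-4):
    """Create the optimizer named in this card (AdamW, lr=1e-4)."""
    import torch  # imported lazily so the loop below has no hard dependency

    return torch.optim.AdamW(model.parameters(), lr=lr)


def train_one_epoch(model, dataloader, optimizer) -> float:
    """Run one pass over `dataloader` and return the mean training loss.

    Assumes each batch is a dict with "pixel_values" and "labels"
    (hypothetical key names chosen for this sketch).
    """
    model.train()
    total_loss, num_batches = 0.0, 0
    for batch in dataloader:
        outputs = model(pixel_values=batch["pixel_values"], labels=batch["labels"])
        loss = outputs.loss    # loss computed by the model from the labels
        loss.backward()        # backpropagate
        optimizer.step()       # AdamW update
        optimizer.zero_grad()  # reset gradients for the next batch
        total_loss += loss.item()
        num_batches += 1
    return total_loss / max(num_batches, 1)
```

With Hugging Face models, padding positions in `labels` are usually set to -100 so that they are ignored by the loss. Whether this was done here is not stated in the card.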
Evaluation
Testing Data, Factors & Metrics
Testing Data
The test data is also taken from IM2LATEX-100K. Note that the dataset ships with predefined train, validation, and test splits.
Metrics
BLEU and CER.
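The card does not say how CER was computed. A common definition is the character-level edit distance between prediction and reference, divided by the number of reference characters. The sketch below follows that definition. The function names and the corpus-level aggregation are assumptions, so its numbers may differ slightly from the reported ones.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def cer(predictions, references) -> float:
    """Corpus-level CER: total edit distance / total reference characters."""
    total_edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars if total_chars else 0.0


# One wrong character out of eleven reference characters.
print(cer([r"\frac{a}{b}"], [r"\frac{a}{c}"]))
```

Lower is better: a CER of 0 means every predicted string matches its reference exactly.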
Results
The BLEU score is 0.8157 and the CER is 0.1601 on the test data.
Model tree for CuiSiwei/nougat-for-formula
Base model
facebook/nougat-small