---
license: apache-2.0
datasets:
- yuntian-deng/im2latex-100k
metrics:
- bleu
- cer
pipeline_tag: image-to-text
tags:
- vision
- nougat
---
# Nougat for formula
<!-- Provide a quick summary of what the model is/does. -->
We fine-tuned the [small-sized Nougat model](https://huggingface.co/facebook/nougat-small) on data
from [IM2LATEX-100K](https://www.kaggle.com/datasets/shahrukhkhan/im2latex100k) to make it especially good at
recognizing formulas in images.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
Nougat for formula specializes in recognizing formulas in images. It takes an image of a formula written in black on a
white background as input and returns the corresponding LaTeX code.
The Nougat model (Neural Optical Understanding for Academic Documents) was proposed by Meta AI in August 2023 as
a visual Transformer model for processing scientific documents. It converts PDF documents into markup language and is
particularly good at recognizing mathematical expressions and tables. The goal of the model is to improve the accessibility
of scientific knowledge by bridging human-readable documents and machine-readable text.
- **Model type:** Vision Encoder Decoder
- **Finetuned from model:** [Nougat model, small-sized version](https://huggingface.co/facebook/nougat-small)
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
Nougat for formula can be used as a tool for converting complicated formulas into LaTeX code, and it has the potential
to substitute for similar tools.
For example, when you are taking notes and tired of typing long LaTeX/Markdown formula code, just take a screenshot
of the formula and feed it to Nougat for formula. You will get the exact code for the formula, as long as the output
does not exceed the maximum length you set for the model.
You can also continue fine-tuning the model to make it better at recognizing formulas from specific subjects.
Nougat for formula may also be useful when developing tools or apps that generate LaTeX code.
## How to Get Started with the Model
The demo below shows how to feed an image to the model and generate LaTeX/Markdown formula code.
```python
from transformers import NougatProcessor, VisionEncoderDecoderModel
from PIL import Image

max_length = 100  # maximum length of the generated sequence

processor = NougatProcessor.from_pretrained(r".", max_length=max_length)  # replace "." with your model path
model = VisionEncoderDecoderModel.from_pretrained(r".")  # replace "." with your model path

image = Image.open(r"image_path")  # replace "image_path" with the path to your formula image
pixel_values = processor(image, return_tensors="pt").pixel_values  # the processor resizes the image to the model's input size

result_tensor = model.generate(
    pixel_values,
    max_length=max_length,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],  # suppress the unknown token during generation
)  # tensor of generated token ids

result = processor.batch_decode(result_tensor, skip_special_tokens=True)  # decode the token ids into strings
result = processor.post_process_generation(result, fix_markdown=False)  # clean up the generated markup
print(*result)
```
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
[IM2LATEX-100K](https://www.kaggle.com/datasets/shahrukhkhan/im2latex100k)
#### Preprocessing
The preprocessing of X (the image) is shown in the short demo above.
The preprocessing of Y (the formula) is done as follows (see the sketch after this list):
1. Remove the spaces in the formula string.
2. Tokenize the string with the `processor`.
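A minimal sketch of these two steps is shown below, assuming the same `processor` and `max_length` as in the demo above; the example formula string and the padding/truncation settings are illustrative assumptions, not the exact preprocessing script.
```python
# Illustrative target-side preprocessing (assumed settings, not the exact script).
latex_formula = r"\frac { a } { b }"            # raw LaTeX string from IM2LATEX-100K
latex_formula = latex_formula.replace(" ", "")  # 1. remove the spaces in the formula string

# 2. tokenize the string with the processor's tokenizer
labels = processor.tokenizer(
    latex_formula,
    max_length=max_length,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
).input_ids
```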
#### Training Hyperparameters
- **Training regime:** `torch.optim.AdamW(model.parameters(), lr=1e-4)` <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
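For reference, a single fine-tuning step with this optimizer might look like the sketch below. The batch variables `pixel_values` and `labels` (prepared as described above) and the bare one-step loop are illustrative assumptions; this is not the exact training script.
```python
import torch

# Illustrative fine-tuning step (assumed setup, not the exact training script).
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
outputs = model(pixel_values=pixel_values, labels=labels)  # the model computes the loss internally
loss = outputs.loss

loss.backward()
optimizer.step()
optimizer.zero_grad()
```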
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data, Factors & Metrics
#### Testing Data
<!-- This should link to a Dataset Card if possible. -->
The testing data is also taken from [IM2LATEX-100K](https://www.kaggle.com/datasets/shahrukhkhan/im2latex100k).
Note that the train, validation, and test splits were already defined before downloading.
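If you want to inspect the splits yourself, the dataset listed in this card's metadata can be loaded with the `datasets` library; the exact split names come from the dataset card, so check the printed output.
```python
from datasets import load_dataset

# Load the dataset listed in this card's metadata and inspect its predefined splits.
dataset = load_dataset("yuntian-deng/im2latex-100k")
print(dataset)  # prints the available splits and the number of examples in each
```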
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
BLEU (to measure how closely the generated LaTeX matches the reference) and CER (character error rate).
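As an illustration, both metrics can be computed with the Hugging Face `evaluate` library (the CER metric additionally requires `jiwer`); the prediction and reference lists below are placeholders, not the actual evaluation script.
```python
import evaluate

# Placeholder predictions and references; in practice these come from the test split.
predictions = [r"x ^ { 2 } + y ^ { 2 } = z ^ { 2 }"]
references = [r"x ^ { 2 } + y ^ { 2 } = z ^ { 2 }"]

bleu = evaluate.load("bleu")
cer = evaluate.load("cer")

print(bleu.compute(predictions=predictions, references=[[r] for r in references])["bleu"])
print(cer.compute(predictions=predictions, references=references))
```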
### Results
On the test data, the BLEU score is 0.8157 and the CER is 0.1601.