Norm/nougat-latex-base

vision-encoder-decoder

image-text-to-text

Norm committed on Oct 9, 2023

Commit f7bba7a · 1 Parent(s): 0ece95d

Update README.md

Files changed (1)

README.md +7 -3

@@ -12,8 +12,7 @@ pipeline_tag: image-to-text
 - **Repository:** [source code](https://github.com/NormXU/nougat-latex-ocr)
 Nougat-LaTeX-based is fine-tuned from [facebook/nougat-base](https://huggingface.co/facebook/nougat-base) with [im2latex-100k](https://zenodo.org/record/56198#.V2px0jXT6eA) to boost its proficiency in generating LaTeX code from images.
-The initial encoder input image size of nougat was unsuitable for equation image segments, leading to potential rescaling artifacts that degrade the generation quality of LaTeX code. To address this, Nougat-LaTeX-based adjusts the input resolution to a height of 224 and a width of 560.
-Additionally, an adaptive padding approach is used to ensure that equation image segments in the wild are resized to closely match the resolution of the training data.
 ### Evaluation
@@ -21,9 +20,14 @@ Evaluated on an image-equation pair dataset collected from Wikipedia, arXiv, and
 |model| token_acc ↑ | normed edit distance ↓ |
 | --- | --- | --- |
 |pix2tex*|0.60|0.10|
 |nougat-latex-based| **0.623850** | **0.06180** |
-pix2tex*: reported from [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR); nougat-latex-based is evaluated on results generated with beam-search strategy.
 ## Requirements
 ```text

 - **Repository:** [source code](https://github.com/NormXU/nougat-latex-ocr)
 Nougat-LaTeX-based is fine-tuned from [facebook/nougat-base](https://huggingface.co/facebook/nougat-base) with [im2latex-100k](https://zenodo.org/record/56198#.V2px0jXT6eA) to boost its proficiency in generating LaTeX code from images.
+The initial encoder input image size of nougat was unsuitable for equation image segments, leading to potential rescaling artifacts that degrade the generation quality of LaTeX code. To address this, Nougat-LaTeX-based adjusts the input resolution and uses an adaptive padding approach to ensure that equation image segments in the wild are resized to closely match the resolution of the training data.
 ### Evaluation
 |model| token_acc ↑ | normed edit distance ↓ |
 | --- | --- | --- |
+|pix2tex| 0.5346 | 0.10312 |
 |pix2tex*|0.60|0.10|
 |nougat-latex-based| **0.623850** | **0.06180** |
+pix2tex is a ResNet + ViT + Text Decoder architecture introduced in [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR).
+**pix2tex***: reported from [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR); **pix2tex**: my evaluation with the released [checkpoint](https://github.com/lukas-blecher/LaTeX-OCR/releases/tag/v0.0.1); **nougat-latex-based**: evaluated on results generated with beam-search strategy.
 ## Requirements
 ```text