Update README.md
README.md
CHANGED
@@ -12,8 +12,7 @@ pipeline_tag: image-to-text
- **Repository:** [source code](https://github.com/NormXU/nougat-latex-ocr)

Nougat-LaTeX-based is fine-tuned from [facebook/nougat-base](https://huggingface.co/facebook/nougat-base) with [im2latex-100k](https://zenodo.org/record/56198#.V2px0jXT6eA) to boost its proficiency in generating LaTeX code from images.
-The initial encoder input image size of Nougat was unsuitable for equation image segments, leading to potential rescaling artifacts that degrade the generation quality of LaTeX code. To address this, Nougat-LaTeX-based adjusts the input resolution.
-Additionally, an adaptive padding approach is used to ensure that equation image segments in the wild are resized to closely match the resolution of the training data.
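The adaptive padding described above could work roughly as follows. This is a minimal sketch, not the repository's implementation: `plan_adaptive_padding`, `PadPlan`, the 560x224 canvas, the never-upscale rule, and the centring are all assumptions made for illustration. It only computes the resize and padding geometry, which can then be applied with any image library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PadPlan:
    """Resize target and per-side padding, all in pixels (hypothetical helper type)."""
    new_width: int
    new_height: int
    pad_left: int
    pad_top: int
    pad_right: int
    pad_bottom: int


def plan_adaptive_padding(width, height, target_width, target_height):
    """Plan an aspect-preserving resize plus padding onto a fixed canvas."""
    if min(width, height, target_width, target_height) <= 0:
        raise ValueError("all dimensions must be positive")
    # Assumption: shrink only when the crop exceeds the canvas; never upscale.
    scale = min(1.0, target_width / width, target_height / height)
    new_width = max(1, int(width * scale))
    new_height = max(1, int(height * scale))
    # Assumption: split the leftover space evenly so the equation is centred.
    extra_w = target_width - new_width
    extra_h = target_height - new_height
    pad_left, pad_top = extra_w // 2, extra_h // 2
    return PadPlan(new_width, new_height, pad_left, pad_top,
                   extra_w - pad_left, extra_h - pad_top)


# Placeholder canvas size, not the model's actual input resolution.
small_crop = plan_adaptive_padding(300, 100, 560, 224)
wide_crop = plan_adaptive_padding(1120, 224, 560, 224)
```

Under these assumptions, a 300x100 crop keeps its original size and is padded evenly on all sides, while a 1120x224 crop is halved to fit the canvas width and padded only above and below.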

### Evaluation
@@ -21,9 +20,14 @@ Evaluated on an image-equation pair dataset collected from Wikipedia, arXiv, and
|model| token_acc ↑ | normed edit distance ↓ |
| --- | --- | --- |
|pix2tex*|0.60|0.10|
|nougat-latex-based| **0.623850** | **0.06180** |
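The README does not define the two metrics in the table, so the sketch below uses common definitions as assumptions: token accuracy as the share of reference positions matched by the prediction, and normed edit distance as Levenshtein distance divided by the longer sequence length. The function names are hypothetical, and the repository's evaluation script may compute these values differently.

```python
def token_accuracy(pred, ref):
    """Fraction of reference positions whose token the prediction matches."""
    if not ref:
        return 1.0 if not pred else 0.0
    matches = sum(p == r for p, r in zip(pred, ref))
    return matches / len(ref)


def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute or match
        prev = curr
    return prev[-1]


def normed_edit_distance(pred, ref):
    """Levenshtein distance scaled to [0, 1] by the longer sequence length."""
    longest = max(len(pred), len(ref))
    return levenshtein(pred, ref) / longest if longest else 0.0


# Whitespace-separated LaTeX tokens; the prediction gets one token wrong.
ref = r"\frac { a } { b }".split()
pred = r"\frac { a } { c }".split()
acc = token_accuracy(pred, ref)
dist = normed_edit_distance(pred, ref)
```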
-

## Requirements
```text
- **Repository:** [source code](https://github.com/NormXU/nougat-latex-ocr)

Nougat-LaTeX-based is fine-tuned from [facebook/nougat-base](https://huggingface.co/facebook/nougat-base) with [im2latex-100k](https://zenodo.org/record/56198#.V2px0jXT6eA) to boost its proficiency in generating LaTeX code from images.
+The initial encoder input image size of Nougat was unsuitable for equation image segments, leading to potential rescaling artifacts that degrade the generation quality of LaTeX code. To address this, Nougat-LaTeX-based adjusts the input resolution and uses an adaptive padding approach to ensure that equation image segments in the wild are resized to closely match the resolution of the training data.

### Evaluation
|model| token_acc ↑ | normed edit distance ↓ |
| --- | --- | --- |
+|pix2tex| 0.5346 | 0.10312 |
|pix2tex*|0.60|0.10|
|nougat-latex-based| **0.623850** | **0.06180** |
+
+pix2tex is a ResNet + ViT + Text Decoder architecture introduced in [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR).
+
+**pix2tex***: as reported by [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR); **pix2tex**: my evaluation with the released [checkpoint](https://github.com/lukas-blecher/LaTeX-OCR/releases/tag/v0.0.1); **nougat-latex-based**: evaluated on results generated with a beam-search strategy.
+

## Requirements
```text