Norm committed on
Commit f7bba7a • 1 Parent(s): 0ece95d

Update README.md

Files changed (1)
  1. README.md +7 -3
README.md CHANGED
@@ -12,8 +12,7 @@ pipeline_tag: image-to-text
  - **Repository:** [source code](https://github.com/NormXU/nougat-latex-ocr)

  Nougat-LaTeX-based is fine-tuned from [facebook/nougat-base](https://huggingface.co/facebook/nougat-base) with [im2latex-100k](https://zenodo.org/record/56198#.V2px0jXT6eA) to boost its proficiency in generating LaTeX code from images.
- Since the initial encoder input image size of nougat was unsuitable for equation image segments, leading to potential rescaling artifacts that degrades the generation quality of LaTeX code. To address this, Nougat-LaTeX-based adjusts the input resolution to a height of 224 and a width of 560.
- Additionally, an adaptive padding approach is used to ensure that equation image segments in the wild are resized to closely match the resolution of the training data.


  ### Evaluation
@@ -21,9 +20,14 @@ Evaluated on an image-equation pair dataset collected from Wikipedia, arXiv, and

  |model| token_acc ↑ | normed edit distance ↓ |
  | --- | --- | --- |
  |pix2tex*|0.60|0.10|
  |nougat-latex-based| **0.623850** | **0.06180** |
- pix2tex*: reported from [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR); nougat-latex-based is evaluated on results generated with beam-search strategy.

  ## Requirements
  ```text
 
  - **Repository:** [source code](https://github.com/NormXU/nougat-latex-ocr)

  Nougat-LaTeX-based is fine-tuned from [facebook/nougat-base](https://huggingface.co/facebook/nougat-base) with [im2latex-100k](https://zenodo.org/record/56198#.V2px0jXT6eA) to boost its proficiency in generating LaTeX code from images.
+ The initial encoder input image size of nougat was unsuitable for equation image segments, leading to potential rescaling artifacts that degrade the generation quality of LaTeX code. To address this, Nougat-LaTeX-based adjusts the input resolution and uses an adaptive padding approach to ensure that equation image segments in the wild are resized to closely match the resolution of the training data.


  ### Evaluation
 
  |model| token_acc ↑ | normed edit distance ↓ |
  | --- | --- | --- |
+ |pix2tex| 0.5346 | 0.10312 |
  |pix2tex*|0.60|0.10|
  |nougat-latex-based| **0.623850** | **0.06180** |
+
+ pix2tex is a ResNet + ViT + Text Decoder architecture introduced in [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR).
+
+ **pix2tex***: reported from [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR); **pix2tex**: my evaluation with the released [checkpoint](https://github.com/lukas-blecher/LaTeX-OCR/releases/tag/v0.0.1); **nougat-latex-based**: evaluated on results generated with beam-search strategy.
+

  ## Requirements
  ```text