Authors: *Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, …*
# Details of byt5-is-ocr-post-processing-modern-texts
*Note: This model is almost the same as [atlijas/byt5-is-ocr-post-processing-old-texts](https://huggingface.co/atlijas/byt5-is-ocr-post-processing-old-texts/). The only difference is the number of epochs trained.*

This model generates a revised version of a given Icelandic OCRed text. It was trained with [simpleT5](https://github.com/Shivanandroy/simpleT5) on 900,000 lines (~7,000,000 tokens), of which only 50,000 (~400,000 tokens) came from real OCRed texts; the rest were extracted from [The Icelandic Gigaword Corpus](https://clarin.is/en/resources/gigaword/) and augmented with artificial errors. Increasing the amount of real OCRed training data would likely improve the model significantly.
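The augmentation procedure itself is not detailed here, but character-level OCR noise injection of the kind described could look roughly like the sketch below. The confusion pairs and noise rate are illustrative assumptions, not the authors' actual setup:

```python
import random

# Illustrative character confusions resembling common OCR mistakes;
# these pairs are assumptions, not the authors' actual list.
CONFUSIONS = {"í": "i", "ð": "d", "n": "u", "t": "f"}

def add_ocr_noise(line: str, rate: float = 0.05, seed: int = 0) -> str:
    """Randomly replace characters with confusable ones to simulate OCR errors."""
    rng = random.Random(seed)
    return "".join(
        CONFUSIONS[ch] if ch in CONFUSIONS and rng.random() < rate else ch
        for ch in line
    )
```

Running such a function over clean corpus lines yields (noisy, clean) training pairs without needing more scanned material.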

For inference, it is recommended to feed the model one line at a time (the lines need not be whole sentences).
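Since the model works best on single lines, multi-line documents can go through a thin wrapper; a minimal sketch below, where `correct_line` stands in for the actual single-line inference call (a placeholder parameter, not part of the model's API):

```python
def postprocess_document(text: str, correct_line) -> str:
    """Split OCRed text into lines, correct each independently, and rejoin.

    `correct_line` is whatever single-line inference function you use
    (e.g. one wrapping the model's generate call); here it is a parameter.
    """
    return "\n".join(correct_line(line) for line in text.splitlines())

# Example with a trivial stand-in "correction" (uppercasing):
print(postprocess_document("fyrsta lína\nönnur lína", str.upper))
```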
# Usage
```python