Authors: *Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, …*
# Details of byt5-is-ocr-post-processing-modern-texts
*Note: This model is almost the same as [atlijas/byt5-is-ocr-post-processing-old-texts](https://huggingface.co/atlijas/byt5-is-ocr-post-processing-old-texts/). The only difference is the number of epochs trained.*

This model generates a revised version of a given Icelandic OCRed text. It was trained with [simpleT5](https://github.com/Shivanandroy/simpleT5) on 900,000 lines (~7,000,000 tokens), of which only 50,000 (~400,000 tokens) came from real OCRed texts; the rest were extracted from [The Icelandic Gigaword Corpus](https://clarin.is/en/resources/gigaword/) and augmented with artificial errors. Increasing the amount of real OCRed training data would likely improve the model significantly.
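The augmentation procedure itself is not detailed here, but character-level OCR noise injection of the kind described could look roughly like the sketch below. The confusion pairs and noise rate are illustrative assumptions, not the authors' actual setup:

```python
import random

# Illustrative character confusions resembling common OCR mistakes;
# these pairs are assumptions, not the authors' actual list.
CONFUSIONS = {"í": "i", "ð": "d", "n": "u", "t": "f"}

def add_ocr_noise(line: str, rate: float = 0.05, seed: int = 0) -> str:
    """Randomly replace characters with confusable ones to simulate OCR errors."""
    rng = random.Random(seed)
    return "".join(
        CONFUSIONS[ch] if ch in CONFUSIONS and rng.random() < rate else ch
        for ch in line
    )
```

Running such a function over clean corpus lines yields (noisy, clean) training pairs without needing more scanned material.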

For inference, it is recommended to feed the model one line at a time (the lines need not be whole sentences).
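Since the model works best on single lines, multi-line documents can go through a thin wrapper; a minimal sketch below, where `correct_line` stands in for the actual single-line inference call (a placeholder parameter, not part of the model's API):

```python
def postprocess_document(text: str, correct_line) -> str:
    """Split OCRed text into lines, correct each independently, and rejoin.

    `correct_line` is whatever single-line inference function you use
    (e.g. one wrapping the model's generate call); here it is a parameter.
    """
    return "\n".join(correct_line(line) for line in text.splitlines())

# Example with a trivial stand-in "correction" (uppercasing):
print(postprocess_document("fyrsta lína\nönnur lína", str.upper))
```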
# Usage
```python