Pclanglais committed
Commit 7018bf6, 1 Parent(s): 18a2b69
Update README.md

README.md CHANGED
@@ -12,7 +12,7 @@ language:

 OCRonos models are versatile tools supporting the correction of OCR errors, wrong word cuts/merges and overall broken text structures. The training data includes a highly diverse set of OCRized texts in multiple languages from PleIAs open pre-training corpus, drawn from cultural heritage sources (Common Corpus) and financial and administrative documents in open data (Finance Commons).

-This release currently features a model based on llama-3-8b that has been the most tested to date. Future releases will focus on smaller internal models that provide a better ratio of generation cost/quality.
+This release currently features a model based on llama-3-8b that has been the most tested to date. The model was trained using HPC resources from GENCI–IDRIS (Grant 2023-AD011014736) on Jean-Zay. Future releases will focus on smaller internal models that provide a better ratio of generation cost/quality.

 OCRonos is generally faithful to the original material, provides sensible restitution of deteriorated text and will rarely rewrite correct words. On highly deteriorated content, OCRonos can act as a synthetic rewriting tool rather than a strict correction tool.
