Full-text search
482 results
viklofg / swedish-ocr-correction
README.md
model
6 matches
ml6team / byt5-base-dutch-ocr-correction
README.md
model
4 matches
tags:
transformers, pytorch, t5, text2text-generation, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
# Dutch OCR Correction
This model is a fine-tuned ByT5 model that corrects OCR mistakes found in Dutch sentences. The [google/byt5-base](https://huggingface.co/google/byt5-base) model is fine-tuned on the Dutch section of the [OSCAR](https://huggingface.co/datasets/oscar) dataset.
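A minimal usage sketch (not taken from the model card; the noisy sample and helper name are invented): wrap the checkpoint in a `transformers` text2text pipeline. Because ByT5 works directly on bytes, no language-specific tokenizer setup is involved.

```python
MODEL_ID = "ml6team/byt5-base-dutch-ocr-correction"

def correct_ocr(text: str, corrector=None, max_length: int = 128) -> str:
    """Run noisy OCR output through the correction model.

    `corrector` is injectable for testing; by default the Hugging Face
    pipeline is built, which downloads the checkpoint on first use.
    """
    if corrector is None:
        from transformers import pipeline  # heavyweight, imported lazily
        corrector = pipeline("text2text-generation", model=MODEL_ID)
    return corrector(text, max_length=max_length)[0]["generated_text"]
```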
yelpfeast / byt5-base-english-ocr-correction
README.md
model
14 matches
tags:
transformers, pytorch, t5, text2text-generation, en, dataset:wikitext, arxiv:2105.13626, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
# ByT5 for OCR Correction
This model is a fine-tuned version of [byt5-base](https://huggingface.co/google/byt5-base) for OCR correction. ByT5 was introduced in [this paper](https://arxiv.org/abs/2105.13626); the idea and code for fine-tuning the model for OCR correction were taken from [here](https://blog.ml6.eu/ocr-correction-with-byt5-5994d1217c07).
PleIAs / OCRonos
README.md
model
10 matches
tags:
transformers, safetensors, llama, text-generation, conversational, fr, en, de, es, it, license:apache-2.0, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
OCRonos is a series of models for the correction of badly digitized texts, as part of the **Bad Data Toolbox**.
OCRonos models are versatile tools supporting the correction of OCR errors, incorrect word splits and merges, and otherwise broken text structures. The training data includes a highly diverse set of OCRized texts in multiple languages from PleIAs' open pre-training corpus, drawn from cultural heritage sources (Common Corpus) and financial and administrative documents in open data (Finance Commons).
This release currently features a model based on Llama 3 8B, which has been the most extensively tested to date. The model was trained using HPC resources from GENCI–IDRIS (Grant 2023-AD011014736) on Jean-Zay. Future releases will focus on smaller internal models that provide a better generation cost/quality ratio.
pykale / bart-base-ocr
README.md
model
7 matches
tags:
transformers, safetensors, bart, text2text-generation, en, license:mit, autotrain_compatible, endpoints_compatible, region:us
This model accompanies the paper [Leveraging LLMs for Post-OCR Correction of Historical Newspapers](https://aclanthology.org/2024.lt4hala-1.14/) and is designed to correct OCR text. [BART-base](https://huggingface.co/facebook/bart-base) is fine-tuned for post-OCR correction of historical English, using [BLN600](https://aclanthology.org/2024.lrec-main.219/), a parallel corpus of 19th-century newspaper machine/human transcriptions.
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# Sketch completing the truncated snippet; model id taken from this listing
tokenizer = AutoTokenizer.from_pretrained("pykale/bart-base-ocr")
model = AutoModelForSeq2SeqLM.from_pretrained("pykale/bart-base-ocr")
corrector = pipeline("text2text-generation", model=model, tokenizer=tokenizer)
```
pykale / bart-large-ocr
README.md
model
7 matches
tags:
transformers, safetensors, bart, text2text-generation, en, license:mit, autotrain_compatible, endpoints_compatible, region:us
This model accompanies the paper [Leveraging LLMs for Post-OCR Correction of Historical Newspapers](https://aclanthology.org/2024.lt4hala-1.14/) and is designed to correct OCR text. [BART-large](https://huggingface.co/facebook/bart-large) is fine-tuned for post-OCR correction of historical English, using [BLN600](https://aclanthology.org/2024.lrec-main.219/), a parallel corpus of 19th-century newspaper machine/human transcriptions.
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# Sketch completing the truncated snippet; model id taken from this listing
tokenizer = AutoTokenizer.from_pretrained("pykale/bart-large-ocr")
model = AutoModelForSeq2SeqLM.from_pretrained("pykale/bart-large-ocr")
corrector = pipeline("text2text-generation", model=model, tokenizer=tokenizer)
```
pykale / llama-2-7b-ocr
README.md
model
8 matches
tags:
peft, safetensors, en, base_model:meta-llama/Llama-2-7b-hf, license:mit, region:us
This model accompanies the paper [Leveraging LLMs for Post-OCR Correction of Historical Newspapers](https://aclanthology.org/2024.lt4hala-1.14/) and is designed to correct OCR text. [Llama 2 7B](https://huggingface.co/meta-llama/Llama-2-7b-hf) is instruction-tuned for post-OCR correction of historical English, using [BLN600](https://aclanthology.org/2024.lrec-main.219/), a parallel corpus of 19th-century newspaper machine/human transcriptions.
## Usage
```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Sketch completing the truncated snippet; loading the adapter assumes
# access to the gated Llama 2 base weights
model = AutoPeftModelForCausalLM.from_pretrained("pykale/llama-2-7b-ocr")
tokenizer = AutoTokenizer.from_pretrained("pykale/llama-2-7b-ocr")
```
pykale / llama-2-13b-ocr
README.md
model
8 matches
tags:
peft, safetensors, en, base_model:meta-llama/Llama-2-13b-hf, license:mit, region:us
This model accompanies the paper [Leveraging LLMs for Post-OCR Correction of Historical Newspapers](https://aclanthology.org/2024.lt4hala-1.14/) and is designed to correct OCR text. [Llama 2 13B](https://huggingface.co/meta-llama/Llama-2-13b-hf) is instruction-tuned for post-OCR correction of historical English, using [BLN600](https://aclanthology.org/2024.lrec-main.219/), a parallel corpus of 19th-century newspaper machine/human transcriptions.
## Usage
```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Sketch completing the truncated snippet; loading the adapter assumes
# access to the gated Llama 2 base weights
model = AutoPeftModelForCausalLM.from_pretrained("pykale/llama-2-13b-ocr")
tokenizer = AutoTokenizer.from_pretrained("pykale/llama-2-13b-ocr")
```
versae / filiberto-7B-instruct-exp1
README.md
model
3 matches
jvdzwaan / ocrpostcorrection-task-1
README.md
model
10 matches
tags:
transformers, pytorch, bert, token-classification, post-ocr correction, ocr postcorrection, bg, cs, de, en, es, fi, fr, nl, pl, sl, multilingual, autotrain_compatible, endpoints_compatible, region:us
# OCR postcorrection task 1
This is a BertForTokenClassification model that predicts whether a token is an OCR
mistake or not. It is based on [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased)
and finetuned on the dataset of the
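A sketch of driving such a token-classification checkpoint to flag suspect tokens (the helper name is invented, and the label names the model emits are whatever its card defines):

```python
MODEL_ID = "jvdzwaan/ocrpostcorrection-task-1"

def flag_ocr_mistakes(text: str, detector=None):
    """Return (token, label, score) triples for each prediction on `text`.

    `detector` is injectable for testing; by default a Hugging Face
    token-classification pipeline is built (downloads the checkpoint).
    """
    if detector is None:
        from transformers import pipeline  # heavyweight, imported lazily
        detector = pipeline("token-classification", model=MODEL_ID)
    return [(p["word"], p["entity"], p["score"]) for p in detector(text)]
```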
DeepMount00 / OCR_corrector
README.md
model
8 matches
tags:
transformers, safetensors, t5, text2text-generation, it, license:apache-2.0, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
# Italian OCR Error Correction Sequence-to-Sequence Model
## Model Details
This model is the first version of an experimental sequence-to-sequence architecture designed specifically for Italian. It aims to correct approximately 93% of the errors generated by low-quality Optical Character Recognition (OCR) systems, which tend to perform poorly on Italian text. Given raw OCR-scanned text as input, the model outputs a corrected version, significantly reducing errors and improving readability and accuracy.
manu / ocr_correction
README.md
model
2 matches
Var3n / hmByT5_anno
README.md
model
2 matches
tags:
transformers, pytorch, t5, text2text-generation, ByT5, historical, ocr-correction, de, license:mit, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
The model was fine-tuned to correct OCR mistakes. The `max_length` was set to 350.
## Performance
```
SacreBLEU eval dataset: 10.83
```
PleIAs / Segmentext
README.md
model
2 matches
Thang203 / general_nlp_research_paper
README.md
model
10 matches
tags:
bertopic, text-classification, region:us
| 57 | gec - grammatical error - grammatical error correction - error correction - correction | 40 | 57_gec_grammatical error_grammatical error correction_error correction |
| 58 | intent - intent detection - slot - slot filling - filling | 40 | 58_intent_intent detection_slot_slot filling |
| 59 | temporal - events - temporal relations - expressions - temporal relation | 39 | 59_temporal_events_temporal relations_expressions |
| 60 | adaptation - domain - domain adaptation - indomain - translation | 37 | 60_adaptation_domain_domain adaptation_indomain |
| 61 | stance - stance detection - detection - tweets - veracity | 37 | 61_stance_stance detection_detection_tweets |
slone / canine-c-bashkir-gec-v1
README.md
model
4 matches
tags:
transformers, pytorch, canine, token-classification, grammatical error correction, ba, license:apache-2.0, autotrain_compatible, endpoints_compatible, region:us
# Bashkir Spelling Correction v1
This model is a version of [google/canine-c](https://huggingface.co/google/canine-c) fine-tuned to fix corrupted texts.
It was trained on a mixture of two parallel datasets in the Bashkir language:
- sentences post-edited by humans after OCR
pszemraj / grammar-synthesis-large
README.md
model
11 matches
tags:
transformers, pytorch, safetensors, t5, text2text-generation, grammar, spelling, punctuation, error-correction, grammar synthesis, dataset:jfleg, arxiv:2107.06751, license:cc-by-nc-sa-4.0, license:apache-2.0, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
A model fine-tuned for grammar correction on an expanded version of the [JFLEG](https://paperswithcode.com/dataset/jfleg) dataset.
Usage in Python (after `pip install transformers`):
```python
from transformers import pipeline

# Sketch completing the truncated snippet; model id taken from this listing
corrector = pipeline("text2text-generation", model="pszemraj/grammar-synthesis-large")
```
pszemraj / grammar-synthesis-base
README.md
model
10 matches
tags:
transformers, pytorch, safetensors, t5, text2text-generation, grammar, spelling, punctuation, error-correction, grammar synthesis, dataset:jfleg, arxiv:2107.06751, license:cc-by-nc-sa-4.0, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
A model fine-tuned for grammar correction on an expanded version of the [JFLEG](https://paperswithcode.com/dataset/jfleg) dataset. Check out a [demo notebook on Colab here](https://colab.research.google.com/gist/pszemraj/91abb08aa99a14d9fdc59e851e8aed66/demo-for-grammar-synthesis-base.ipynb).
Usage in Python (after `pip install transformers`):
```python
from transformers import pipeline

# Sketch completing the truncated snippet; model id taken from this listing
corrector = pipeline("text2text-generation", model="pszemraj/grammar-synthesis-base")
```
pszemraj / grammar-synthesis-small
README.md
model
10 matches
tags:
transformers, pytorch, onnx, safetensors, t5, text2text-generation, grammar, spelling, punctuation, error-correction, grammar synthesis, FLAN, dataset:jfleg, arxiv:2107.06751, license:cc-by-nc-sa-4.0, license:apache-2.0, autotrain_compatible, endpoints_compatible, text-generation-inference, region:us
A model fine-tuned for grammar correction on an expanded version of the [JFLEG](https://paperswithcode.com/dataset/jfleg) dataset.
Usage in Python (after `pip install transformers`):
```python
from transformers import pipeline

# Sketch completing the truncated snippet; model id taken from this listing
corrector = pipeline("text2text-generation", model="pszemraj/grammar-synthesis-small")
```
pszemraj / flan-t5-large-grammar-synthesis
README.md
model
12 matches
tags:
transformers, pytorch, onnx, safetensors, t5, text2text-generation, grammar, spelling, punctuation, error-correction, grammar synthesis, FLAN, dataset:jfleg, arxiv:2107.06751, doi:10.57967/hf/0138, license:cc-by-nc-sa-4.0, license:apache-2.0, autotrain_compatible, endpoints_compatible, text-generation-inference, region:us
A model fine-tuned for grammar correction on an expanded version of the [JFLEG](https://paperswithcode.com/dataset/jfleg) dataset. [Demo](https://huggingface.co/spaces/pszemraj/FLAN-grammar-correction) on HF Spaces.
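For completeness, a sketch of running the correction without the pipeline helper, calling the seq2seq API directly (the helper name and sample text are invented; the model id is taken from this listing):

```python
def correct_grammar(
    text: str,
    model_id: str = "pszemraj/flan-t5-large-grammar-synthesis",
    max_length: int = 64,
) -> str:
    """Correct `text` with a direct tokenize -> generate -> decode round trip.

    Downloads the checkpoint on first use; imports are lazy so the
    function can be defined without transformers loaded.
    """
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=max_length)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```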
## Example
![example](https://i.imgur.com/PIhrc7E.png)