# ByT5 Dutch OCR correction 

This model is a fine-tuned ByT5 model that corrects OCR mistakes in Dutch sentences. It was obtained by fine-tuning [google/byt5-base](https://huggingface.co/google/byt5-base) on the Dutch section of the [OSCAR](https://huggingface.co/datasets/oscar) dataset.


## Usage

```python
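from transformers import AutoTokenizer, T5ForConditionalGeneration

# Sentence containing typical OCR errors ("inte11i9entie", "vkijwe1")
example_sentence = "Een algoritme dat op basis van kunstmatige inte11i9entie vkijwe1 geautomatiseerd een Nederlandstalige tekst samenstelt."

# Load the tokenizer and the fine-tuned model from the Hugging Face Hub
model_name = 'ml6team/byt5-small-dutch-ocr-correction'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# ByT5 tokenizes at the byte level, so max_length counts bytes, not words
model_inputs = tokenizer(example_sentence, max_length=128, truncation=True, return_tensors="pt")
outputs = model.generate(**model_inputs, max_length=128)

# Decode the first generated sequence, dropping padding and end-of-sequence tokens
print(tokenizer.decode(outputs[0], skip_special_tokens=True))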
```
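
Because ByT5 works on UTF-8 bytes rather than subword tokens, `max_length=128` with `truncation=True` limits the input to roughly 127 bytes plus the end-of-sequence token, so longer text is cut off. The following is a minimal sketch of one way to split longer text on whitespace into pieces that fit that budget. `chunk_by_bytes` is a hypothetical helper written for illustration; it is not part of `transformers` or of this model's repository, and the default of 127 is an assumption derived from the `max_length` used above.

```python
def chunk_by_bytes(text: str, max_bytes: int = 127) -> list[str]:
    """Split text on whitespace into chunks of at most max_bytes UTF-8 bytes.

    Hypothetical helper: the default of 127 assumes max_length=128 with one
    position reserved for the end-of-sequence token added by the tokenizer.
    A single word longer than max_bytes is kept whole and would still be
    truncated by the tokenizer.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        # Start a new chunk when adding this word would exceed the byte budget
        if current and len(candidate.encode("utf-8")) > max_bytes:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed through the tokenizer and `model.generate` as in the example above, and the decoded outputs joined with spaces. Splitting on whitespace may cut a sentence in the middle, which could reduce the context available to the model; splitting on sentence boundaries first is likely a better choice where that is possible.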