ByT5 Dutch OCR Correction

This model is a fine-tuned ByT5 model that corrects OCR mistakes in Dutch sentences. It was obtained by fine-tuning google/byt5-base on the Dutch section of the OSCAR dataset.

Usage

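from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = 'ml6team/byt5-base-dutch-ocr-correction'

# A Dutch sentence containing typical OCR errors (e.g. "8" for "s", "1" for "l").
example_sentence = "Ben algoritme dat op ba8i8 van kunstmatige inte11i9entie vkijwel geautomatiseerd een tekst herstelt met OCR fuuten."

# Load the tokenizer and the fine-tuned model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# ByT5 works on raw bytes, so max_length counts bytes rather than words.
model_inputs = tokenizer(example_sentence, max_length=128, truncation=True, return_tensors="pt")
outputs = model.generate(**model_inputs, max_length=128)

# Decode the corrected sentence and drop special tokens such as </s>.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))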
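
Because the usage example truncates inputs at max_length=128 and ByT5 counts bytes rather than words, longer texts are cut off. One possible workaround, sketched below, is to split the text into chunks that fit the limit and correct each chunk separately. The helper names split_into_byte_chunks and correct_long_text are hypothetical and not part of this model or of transformers, and the budget of 127 bytes assumes the tokenizer appends a single end-of-sequence token to each input.

```python
def split_into_byte_chunks(text, max_bytes=127):
    """Greedily pack whitespace-separated words into chunks of at most
    max_bytes UTF-8 bytes. A single word longer than max_bytes is kept
    as its own (oversized) chunk and would be truncated by the tokenizer."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate.encode("utf-8")) <= max_bytes or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def correct_long_text(text, tokenizer, model, max_length=128):
    """Hypothetical helper: correct each chunk independently and rejoin.
    Assumes tokenizer and model were loaded as in the usage example."""
    corrected = []
    # Reserve one position for the end-of-sequence token (assumption).
    for chunk in split_into_byte_chunks(text, max_bytes=max_length - 1):
        inputs = tokenizer(chunk, max_length=max_length, truncation=True, return_tensors="pt")
        outputs = model.generate(**inputs, max_length=max_length)
        corrected.append(tokenizer.decode(outputs[0], skip_special_tokens=True))
    return " ".join(corrected)
```

Calling correct_long_text(long_text, tokenizer, model) with the objects from the usage example would return the rejoined corrections. Chunk boundaries fall on whitespace, not on sentence boundaries, so the model sees less context near each boundary; splitting on sentences first may give better results.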