ml6team/byt5-base-dutch-ocr-correction

Last commit: simondg, "update README" (c86f565), almost 3 years ago

# ByT5 Dutch OCR Correction

This model is a fine-tuned ByT5 model that corrects OCR mistakes in Dutch sentences. It was obtained by fine-tuning [google/byt5-base](https://huggingface.co/google/byt5-base) on the Dutch section of the [OSCAR](https://huggingface.co/datasets/oscar) dataset.


## Usage

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "ml6team/byt5-base-dutch-ocr-correction"

# Dutch sentence containing typical OCR errors
example_sentence = "Ben algoritme dat op ba8i8 van kunstmatige inte11i9entie vkijwel geautomatiseerd een tekst herstelt met OCR fuuten."

# Load the byte-level tokenizer and the fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# ByT5 tokenizes into UTF-8 bytes, so max_length limits bytes, not words
model_inputs = tokenizer(example_sentence, max_length=128, truncation=True, return_tensors="pt")
outputs = model.generate(**model_inputs, max_length=128)

# Decode the corrected sentence, dropping special tokens such as </s>
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
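
Because the usage example truncates inputs at `max_length=128` and ByT5 operates on UTF-8 bytes, longer text is cut off rather than corrected. The sketch below shows one possible way to pre-split text into pieces that fit. The helper `split_by_byte_budget` is hypothetical and not part of this model or of `transformers`; the default budget of 127 bytes assumes one token is reserved for the end-of-sequence marker.

```python
def split_by_byte_budget(text: str, max_bytes: int = 127) -> list[str]:
    """Greedily pack whitespace-separated words into chunks of at most
    max_bytes UTF-8 bytes (hypothetical helper, for illustration only)."""
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single word longer than the budget is kept whole and
            # would still be truncated by the tokenizer.
            current = word
    if current:
        chunks.append(current)
    return chunks


long_text = "Dit is een langere tekst met enkele OCR fouten. " * 10
for chunk in split_by_byte_budget(long_text):
    # Each chunk can be passed to the tokenizer as in the usage example.
    print(len(chunk.encode("utf-8")), chunk)
```

Note that accented characters such as `é` occupy two bytes, so the budget is counted on the encoded bytes rather than on `len(text)`. Splitting on whitespace may separate words that belong to the same sentence, so splitting on sentence boundaries first may give the model more useful context.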