A newer version of this model is available: ybracke/transnormer-19c-beta-v02

Transnormer 19th century (beta v01)

This model normalizes spelling variants in historical German text to modern spelling. It is a fine-tuned version of google/byt5-small, trained on a modified version of the DTA EvalCorpus (1780-1901).

Demo Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers.generation import GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("ybracke/transnormer-19c-beta-v01")
model = AutoModelForSeq2SeqLM.from_pretrained("ybracke/transnormer-19c-beta-v01")

gen_cfg = GenerationConfig.from_model_config(model.config)
gen_cfg.max_new_tokens = 512

sentence = "Der Officier mußte ſich dazu setzen, man trank und ließ ſich’s wohl ſeyn."
inputs = tokenizer(sentence, return_tensors="pt")
outputs = model.generate(**inputs, generation_config=gen_cfg)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
# >>> ['Der Offizier musste sich dazusetzen, man trank und ließ sich es wohl sein.']

Here is how to use this model with the pipeline API:

from transformers import pipeline

transnormer = pipeline('text2text-generation', model='ybracke/transnormer-19c-beta-v01')
sentence = "Der Officier mußte ſich dazu setzen, man trank und ließ ſich’s wohl ſeyn."
print(transnormer(sentence))
# >>> [{'generated_text': 'Der Offizier musste sich dazusetzen, man trank und ließ sich es wohl sein.'}]
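The pipeline also accepts a list of sentences, which is convenient for normalizing several lines at once. The sketch below is illustrative: the second sentence is an invented example (not from the corpus), and max_new_tokens=512 simply mirrors the generation setting used above.

from transformers import pipeline

transnormer = pipeline('text2text-generation', model='ybracke/transnormer-19c-beta-v01')

# Two historical sentences; the second one is only an illustrative example.
sentences = [
    "Der Officier mußte ſich dazu setzen, man trank und ließ ſich’s wohl ſeyn.",
    "Es war ein ſchöner Tag im Monath May.",
]

for output in transnormer(sentences, max_new_tokens=512):
    print(output['generated_text'])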

Training hyperparameters

The following hyperparameters were used during training (a hedged reconstruction as Seq2SeqTrainingArguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.76
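For reference, these settings correspond roughly to the Seq2SeqTrainingArguments sketched below. This is a reconstruction, not the original training script: output_dir is a placeholder and any arguments not listed above (warmup, weight decay, saving strategy, etc.) are left at their library defaults.

from transformers import Seq2SeqTrainingArguments

# Hedged reconstruction of the reported hyperparameters.
training_args = Seq2SeqTrainingArguments(
    output_dir="transnormer-19c-beta-v01",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=3.76,
)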

Framework versions

  • Transformers 4.31.0
  • Pytorch 2.1.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.13.3
