Transnormer models: byte-level seq2seq models that can normalize historical German spellings.
This model normalizes spelling variants in historical German text to modern spelling. It is a fine-tuned version of google/byt5-small, trained on a modified version of the DTA EvalCorpus (1780-1901). Here is how to use this model with the generate API:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers.generation import GenerationConfig
tokenizer = AutoTokenizer.from_pretrained("ybracke/transnormer-19c-beta-v01")
model = AutoModelForSeq2SeqLM.from_pretrained("ybracke/transnormer-19c-beta-v01")
gen_cfg = GenerationConfig.from_model_config(model.config)
gen_cfg.max_new_tokens = 512
sentence = "Der Officier mußte ſich dazu setzen, man trank und ließ ſich’s wohl ſeyn."
inputs = tokenizer(sentence, return_tensors="pt")
outputs = model.generate(**inputs, generation_config=gen_cfg)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
# >>> ['Der Offizier musste sich dazusetzen, man trank und ließ sich es wohl sein.']
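Because ByT5 works directly on UTF-8 bytes rather than a subword vocabulary, historical glyphs such as the long s (ſ) in the example above never become out-of-vocabulary tokens. A minimal sketch of the byte-to-id mapping, assuming the ByT5 convention of id = byte value + 3 (ids 0-2 are reserved for pad/eos/unk; the helper name `byt5_ids` is illustrative, not part of the transformers library):

```python
# Sketch of ByT5's byte-level tokenization (assumption: id = byte + 3,
# with ids 0-2 reserved for <pad>/</s>/<unk>; `byt5_ids` is an illustrative helper).
def byt5_ids(text: str) -> list[int]:
    ids = [b + 3 for b in text.encode("utf-8")]
    return ids + [1]  # append </s> (id 1)

# The long s (U+017F) is two UTF-8 bytes, so it yields two token ids.
print(byt5_ids("ſeyn"))  # [200, 194, 104, 124, 113, 1]
```

This is why the model can consume ſ, ’s, and other historical typography directly, without any special vocabulary handling.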
Here is how to use this model with the pipeline API:
from transformers import pipeline
transnormer = pipeline('text2text-generation', model='ybracke/transnormer-19c-beta-v01')
sentence = "Der Officier mußte ſich dazu setzen, man trank und ließ ſich’s wohl ſeyn."
print(transnormer(sentence))
# >>> [{'generated_text': 'Der Offizier musste sich dazusetzen, man trank und ließ sich es wohl sein.'}]
The following hyperparameters were used during training: