German GPT2-XL (1.5B)

  • trained with BigScience's DeepSpeed-Megatron-LM code base
  • word embeddings initialized with WECHSEL; all other weights taken from the English gpt2-xl
  • ~ 3 days on 16xA100 GPUs (~ 80 TFLOPs / GPU)
  • stopped after 100k steps
  • 26.2B tokens
  • less than a single epoch on oscar_unshuffled_deduplicated_de (excluding validation set; original model was trained for 75 epochs on less data)
  • bf16
  • ZeRO stage 0
  • tensor parallelism / pipeline parallelism (tp/pp) = 1
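
As a rough sanity check, the training figures listed above can be related to each other with a few lines of arithmetic. This is only a sketch: the 6 × parameters × tokens rule for training FLOPs is a common approximation and an assumption here, not something the card states, and it ignores attention cost, any activation recomputation, and wall-clock overhead such as checkpointing.

```python
# Figures stated in the card
params = 1.5e9          # model parameters
tokens = 26.2e9         # training tokens
steps = 100_000         # optimizer steps
num_gpus = 16
tflops_per_gpu = 80     # reported throughput per GPU

# Average number of tokens consumed per optimizer step
tokens_per_step = tokens / steps

# Assumption: total training compute ~ 6 * N * D (rule of thumb, not from the card)
total_flops = 6 * params * tokens
cluster_flops_per_second = num_gpus * tflops_per_gpu * 1e12
est_days = total_flops / cluster_flops_per_second / 86_400

print(f"tokens per step: {tokens_per_step:,.0f}")
print(f"estimated pure compute time: {est_days:.1f} days")
```

The tokens-per-step value of roughly 262k would be consistent with, for example, a global batch of 256 sequences of 1,024 tokens (262,144 tokens), although the card states neither the batch size nor the sequence length. The estimate of about two days of pure compute is of the same order as the reported ~ 3 days.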

How to use

You can use this model directly with a pipeline for text generation. Because generation involves random sampling, we set a seed for reproducibility:

>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='malteos/gpt2-xl-wechsel-german')
>>> set_seed(42)
>>> generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)

[{'generated_text': "Hello, I'm a language model, a language for thinking, a language for expressing thoughts."},
 {'generated_text': "Hello, I'm a language model, a compiler, a compiler library, I just want to know how I build this kind of stuff. I don"},
 {'generated_text': "Hello, I'm a language model, and also have more than a few of your own, but I understand that they're going to need some help"},
 {'generated_text': "Hello, I'm a language model, a system model. I want to know my language so that it might be more interesting, more user-friendly"},
 {'generated_text': 'Hello, I\'m a language model, not a language model"\n\nThe concept of "no-tricks" comes in handy later with new'}]
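
The pipeline returns a list of dictionaries, each with a `generated_text` key that contains the prompt followed by the continuation. A minimal sketch for keeping only the continuation is shown here; the helper name `continuations` is hypothetical and not part of transformers, and the sample data is copied from the output above so the snippet runs without downloading the model.

```python
def continuations(outputs, prompt):
    """Return the generated text of each result with the leading prompt removed."""
    results = []
    for item in outputs:
        text = item["generated_text"]
        # Text-generation output normally starts with the prompt itself
        if text.startswith(prompt):
            text = text[len(prompt):]
        results.append(text.strip())
    return results


prompt = "Hello, I'm a language model,"
# One entry copied from the example output above
sample_outputs = [
    {"generated_text": "Hello, I'm a language model, a language for thinking, "
                       "a language for expressing thoughts."},
]
print(continuations(sample_outputs, prompt))
```

With a real pipeline object, the list returned by `generator(...)` can be passed in place of `sample_outputs`.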

Here is how to use this model to get the features of a given text in PyTorch:

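from transformers import GPT2Tokenizer, GPT2Model

# Load the tokenizer and the base model (without the language-modeling head)
tokenizer = GPT2Tokenizer.from_pretrained('malteos/gpt2-xl-wechsel-german')
model = GPT2Model.from_pretrained('malteos/gpt2-xl-wechsel-german')
text = "Replace me by any text you'd like."
# Tokenize the text and return PyTorch tensors
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
# output.last_hidden_state holds one feature vector per input token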

Evaluation

Model (size)                             PPL
gpt2-xl-wechsel-german (1.5B)            14.5
gpt2-wechsel-german-ds-meg (117M)        26.4
gpt2-wechsel-german (117M)               26.8
gpt2 (retrained from scratch) (117M)     27.63
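
PPL is perplexity, where lower is better. The card does not say which dataset or evaluation procedure produced these numbers, so the following is only a sketch of the standard definition: the exponential of the mean negative log-likelihood per token. The function name and the example values are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Perplexity as exp of the mean negative log-likelihood (natural log) per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical example: a model that assigns probability 1/14.5 to every token
# has a perplexity close to 14.5 (up to floating-point rounding).
example_log_probs = [math.log(1 / 14.5)] * 4
print(perplexity(example_log_probs))
```

Read this way, a perplexity of 14.5 means the model is, on average, about as uncertain as if it were choosing uniformly among 14.5 tokens at each position.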

Other German language models

License

MIT
