Replication of gpt2-wechsel-german

  • trained with BigScience's DeepSpeed-Megatron-LM codebase
  • 22 hours on 4× A100 GPUs (~80 TFLOPS per GPU)
  • stopped after 100k steps
  • less than a single epoch on oscar_unshuffled_deduplicated_de (excluding the validation set; the original model was trained for 75 epochs on less data)
  • bf16 precision
  • ZeRO stage 1 (see the config sketch after this list)
  • no tensor or pipeline parallelism (tp/pp = 1)
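For reference, a minimal sketch of a DeepSpeed configuration matching the settings above (bf16, ZeRO stage 1). The actual config used for this run is not published here, so the batch-size values below are placeholders, not the real ones. With Megatron-DeepSpeed, tp/pp = 1 corresponds to launching with `--tensor-model-parallel-size 1 --pipeline-model-parallel-size 1` (flag names as in upstream Megatron-LM).

```python
# Hypothetical DeepSpeed config mirroring the training notes above.
ds_config = {
    "bf16": {"enabled": True},            # train in bfloat16
    "zero_optimization": {"stage": 1},    # ZeRO stage 1: shard optimizer states
    "train_micro_batch_size_per_gpu": 4,  # placeholder, actual value unknown
    "gradient_accumulation_steps": 8,     # placeholder, actual value unknown
}
```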

Evaluation

Model                            PPL
gpt2-wechsel-german-ds-meg       26.4
gpt2-wechsel-german              26.8
gpt2 (retrained from scratch)    27.63
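The exact evaluation setup is not spelled out here; as a rough illustration, the sketch below shows one standard way to measure causal-LM perplexity with transformers. The sample text is arbitrary; the reported numbers were presumably computed on the held-out validation split of oscar_unshuffled_deduplicated_de.

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "malteos/gpt2-wechsel-german-ds-meg"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

# Any German text works for a quick check; the card's numbers come
# from a proper validation set, not a single sentence.
text = "Berlin ist die Hauptstadt der Bundesrepublik Deutschland."
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels == input_ids, the model returns the mean
    # next-token cross-entropy; its exponential is the perplexity.
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"perplexity: {math.exp(loss.item()):.2f}")
```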

License

MIT

