
Replication of gpt2-wechsel-german

  • trained with BigScience's DeepSpeed-Megatron-LM codebase
  • ~22 hours on 4x A100 GPUs (~80 TFLOPs / GPU)
  • stopped after 100k steps
  • less than a single epoch on oscar_unshuffled_deduplicated_de (excluding the validation set; the original model was trained for 75 epochs on less data)
  • bf16 mixed precision
  • ZeRO stage 1
  • tensor/pipeline parallelism = 1 (tp/pp = 1); see the config sketch below this list
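
The bf16 / ZeRO stage 1 / tp=pp=1 settings above roughly correspond to a DeepSpeed configuration like the following. This is a minimal sketch, not the exact config used for this run; the batch-size values are placeholders.

```python
# Minimal sketch of a DeepSpeed config matching the settings above
# (bf16, ZeRO stage 1). Batch sizes are placeholders, not the values
# actually used for this training run.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,   # placeholder
    "gradient_accumulation_steps": 8,      # placeholder
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 1},
}

# With Megatron-DeepSpeed, tensor/pipeline parallelism of 1 is selected
# via launcher flags such as:
#   --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1
```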

Evaluation

Model                          PPL
gpt2-wechsel-german-ds-meg     26.4
gpt2-wechsel-german            26.8
gpt2 (retrained from scratch)  27.63
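
The table above reports perplexity on the held-out OSCAR validation set. As a rough illustration, a perplexity number can be computed with the standard causal-LM loss as in the sketch below; the Hub ID is an assumption (adjust to the actual repository), and the short German sample stands in for the full validation split, so the result will not match the table exactly.

```python
# Rough sketch of a perplexity computation; the Hub ID below is an
# assumption -- replace it with the actual repository name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "malteos/gpt2-wechsel-german-ds-meg"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

# Small German sample standing in for the OSCAR validation split.
text = ("Die Würde des Menschen ist unantastbar. Sie zu achten und zu "
        "schützen ist Verpflichtung aller staatlichen Gewalt.")
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels == input_ids makes the model return the mean
    # cross-entropy loss; exponentiating it gives perplexity.
    out = model(**enc, labels=enc["input_ids"])

print(f"perplexity: {torch.exp(out.loss).item():.2f}")
```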

License

MIT
