Replication of gpt2-wechsel-german
- trained with BigScience's DeepSpeed-Megatron-LM code base
- 22 hours on 4x A100 GPUs (~80 TFLOP/s per GPU)
- stopped after 100k steps
- less than a single epoch on oscar_unshuffled_deduplicated_de (excluding validation set; original model was trained for 75 epochs on less data)
- bf16
- ZeRO stage 1
- tp/pp = 1 (no tensor or pipeline parallelism)
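
The precision and ZeRO settings listed above map onto a DeepSpeed JSON config roughly as follows. This is a minimal sketch, not the config used for this run: the `bf16` and `zero_optimization` keys are standard DeepSpeed config fields, but the helper `build_ds_config` and the micro batch size are hypothetical, since the card does not state a batch size. The tp/pp sizes are set on the Megatron launcher side and are not part of this file.

```python
import json


def build_ds_config(micro_batch_size: int) -> dict:
    """Build a DeepSpeed config dict with bf16 and ZeRO stage 1.

    `micro_batch_size` is a placeholder: the model card does not
    report the batch size that was actually used.
    """
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "bf16": {"enabled": True},          # bf16 mixed precision
        "zero_optimization": {"stage": 1},  # shard optimizer states only
    }


# Assumed placeholder value, not taken from the card.
config = build_ds_config(micro_batch_size=1)
config_json = json.dumps(config, indent=2)
print(config_json)
```

ZeRO stage 1 partitions only the optimizer states across the data-parallel ranks, which fits a model small enough to run with tp/pp = 1.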
Evaluation
| Model | PPL |
|---|---|
| gpt2-wechsel-german-ds-meg | 26.4 |
| gpt2-wechsel-german | 26.8 |
| gpt2 (retrained from scratch) | 27.63 |
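
For reference, perplexity is conventionally the exponential of the mean negative log-likelihood per token. The sketch below uses that standard definition; the card does not say how the PPL values were computed (tokenization, context length, stride), so this is an assumption rather than the evaluation script. The function names are illustrative.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities of each token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def relative_reduction(baseline, new):
    """Fractional perplexity reduction of `new` relative to `baseline`."""
    return (baseline - new) / baseline


# Sanity check: a model that assigns probability 1/4 to every token
# has a perplexity of 4 (up to floating-point error).
uniform_ppl = perplexity([math.log(0.25)] * 10)

# Values from the table above.
vs_original = relative_reduction(26.8, 26.4)
vs_scratch = relative_reduction(27.63, 26.4)
print(f"{uniform_ppl:.2f} {vs_original:.2%} {vs_scratch:.2%}")
```

On the reported numbers, the replication is about 1.5% lower in perplexity than the original gpt2-wechsel-german and about 4.5% lower than gpt2 retrained from scratch.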
License
MIT