jonabur committed on
Commit 4118a0f
1 Parent(s): 564a58e

update note about GAS

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -59,7 +59,7 @@ model = transformers.AutoModelForCausalLM.from_pretrained(
 
 ## Training
 
-Poro was trained on the LUMI supercomputer, using 512 AMD MI250X GPUs. Each MI250X GPU has two Graphics Complex Dies (GCDs) for a world size of 1024 during training, using activation checkpointing, a micro batch size of 1, and a 3D parallelism strategy of TP=2, PP=4, DP=128.
+Poro was trained on the LUMI supercomputer, using 512 AMD MI250X GPUs. Each MI250X GPU has two Graphics Complex Dies (GCDs) for a world size of 1024 during training, using activation checkpointing, a micro batch size of 1, gradient accumulation of 16, and a 3D parallelism strategy of TP=2, PP=4, DP=128.
 
 Training began in September 2023 using a [custom fork](https://github.com/TurkuNLP/Megatron-DeepSpeed) of the Megatron-Deepspeed framework.
 
@@ -117,4 +117,4 @@ Poro is an advanced language model, primarily optimized for English, Finnish and
 
 ## License
 
-Poro is released under the Apache 2.0 license.
+Poro is released under the Apache 2.0 license.
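
For context on the note this commit updates (GAS here reads as gradient accumulation steps): a minimal sketch of how the stated figures fit together, using the standard Megatron-DeepSpeed relations. The world size of 1024 matches the README; the global batch size below is derived from those relations, not stated in the diff.

```python
# Sketch: how the parallelism numbers in the updated note fit together.
# Formulas are the standard Megatron-DeepSpeed relations; the global
# batch size is derived here, not stated in the README itself.

TP, PP, DP = 2, 4, 128        # tensor / pipeline / data parallel degrees
MICRO_BATCH_SIZE = 1          # per-GCD micro batch size
GRAD_ACCUM_STEPS = 16         # the gradient accumulation this commit documents

# Each model replica spans TP * PP devices, replicated DP times.
world_size = TP * PP * DP
assert world_size == 1024     # 512 MI250X GPUs x 2 GCDs, as in the README

# Effective global batch size per optimizer step.
global_batch_size = MICRO_BATCH_SIZE * GRAD_ACCUM_STEPS * DP
print(world_size, global_batch_size)  # 1024 2048
```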