Update README.md
README.md
@@ -69,7 +69,7 @@ model = transformers.AutoModelForCausalLM.from_pretrained(
 
 Poro was trained on the LUMI supercomputer, using 512 AMD MI250X GPUs. Each MI250X GPU has two Graphics Complex Dies (GCDs), for a world size of 1024 during training. Training used activation checkpointing, a micro-batch size of 1, gradient accumulation of 16, and a 3D parallelism strategy of TP=2, PP=4, DP=128.
 
-Training began in September 2023 using a
+Training began in September 2023 using a custom fork of the Megatron-DeepSpeed framework. The code is available [here](https://github.com/TurkuNLP/Megatron-DeepSpeed).
 
 ## Training Hyperparameters
 
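The figures in the paragraph above compose arithmetically: the TP/PP/DP split must cover the full world size, and the effective global batch size follows from the micro-batch size, gradient accumulation steps, and data-parallel degree. The sketch below is illustrative only, with hypothetical variable names rather than Poro's actual launch code, and the derived global batch of 2048 sequences is arithmetic, not a figure stated in the README.

```python
# Sketch of how Poro's stated training configuration composes.
# Illustrative only; not the actual Poro launch script.

num_gpus = 512          # AMD MI250X modules on LUMI
gcds_per_gpu = 2        # each MI250X exposes two Graphics Complex Dies
world_size = num_gpus * gcds_per_gpu  # 1024 training ranks

# 3D parallelism: tensor * pipeline * data-parallel ranks must equal the world size.
tp, pp, dp = 2, 4, 128
assert tp * pp * dp == world_size  # 2 * 4 * 128 == 1024

# Effective global batch (in sequences) per optimizer step (derived, not stated):
micro_batch_size = 1
grad_accum_steps = 16
global_batch_size = micro_batch_size * grad_accum_steps * dp
print(global_batch_size)  # 2048
```

In Megatron-style launchers such as Megatron-DeepSpeed, these values typically correspond to flags like `--tensor-model-parallel-size`, `--pipeline-model-parallel-size`, `--micro-batch-size`, and `--global-batch-size`, with activation checkpointing enabled via `--checkpoint-activations`.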