talman-fi committed
Commit e24b393
1 Parent(s): e9441f3

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -69,7 +69,7 @@ model = transformers.AutoModelForCausalLM.from_pretrained(
 
 Poro was trained on the LUMI supercomputer, using 512 AMD MI250X GPUs. Each MI250X GPU has two Graphics Complex Dies (GCDs) for a world size of 1024 during training, using activation checkpointing, a micro batch size of 1, gradient accumulation of 16, and a 3D parallelism strategy of TP=2, PP=4, DP=128.
 
-Training began in September 2023 using a [custom fork](https://github.com/TurkuNLP/Megatron-DeepSpeed) of the Megatron-Deepspeed framework.
+Training began in September 2023 using a custom fork of the Megatron-Deepspeed framework. The code is available [here](https://github.com/TurkuNLP/Megatron-DeepSpeed).
 
 ## Training Hyperparameters
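As a sanity check, the figures in the context paragraph are internally consistent: 512 MI250X GPUs with two GCDs each give 1024 workers, which matches TP × PP × DP = 2 × 4 × 128. A minimal sketch of that arithmetic in plain Python (variable names are illustrative, not taken from the training code, and the derived global batch size is an inference from the usual Megatron-style formula, not stated in the README):

```python
# Check the parallelism layout described in the README (illustrative only).
gpus = 512           # AMD MI250X GPUs on LUMI
gcds_per_gpu = 2     # each MI250X exposes two Graphics Complex Dies
world_size = gpus * gcds_per_gpu        # 1024 workers

tp, pp, dp = 2, 4, 128                  # tensor, pipeline, data parallel degrees
assert world_size == tp * pp * dp       # 2 * 4 * 128 == 1024

micro_batch, grad_accum = 1, 16
# Global batch in sequences = micro batch * gradient accumulation * DP degree.
global_batch = micro_batch * grad_accum * dp  # 2048 sequences per step
print(world_size, global_batch)
```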