talman-fi committed
Commit
8cd74a7
1 Parent(s): e24b393

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -69,7 +69,7 @@ model = transformers.AutoModelForCausalLM.from_pretrained(
 
 Poro was trained on the LUMI supercomputer, using 512 AMD MI250X GPUs. Each MI250X GPU has two Graphics Compute Dies (GCDs), for a world size of 1024 during training. Training used activation checkpointing, a micro-batch size of 1, gradient accumulation of 16, and a 3D parallelism strategy of TP=2, PP=4, DP=128.
 
-Training began in September 2023 using a custom fork of the Megatron-DeepSpeed framework. The code is available [here](https://github.com/TurkuNLP/Megatron-DeepSpeed).
+Training began in September 2023 using a custom fork of the Megatron-DeepSpeed framework. Our code is available [here](https://github.com/TurkuNLP/Megatron-DeepSpeed).
 
 ## Training Hyperparameters
 
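As a sanity check on the configuration described in the hunk above, the parallelism arithmetic works out as follows. This is a minimal sketch: the input figures (512 GPUs, two GCDs per GPU, TP=2, PP=4, micro-batch 1, 16 accumulation steps) are quoted from the README paragraph, while the data-parallel degree and global batch size are derived arithmetic, not values quoted from the model card.

```python
# Parallelism arithmetic for the Poro training setup described above.
# Input figures come from the README text; derived values are computed.

n_gpus = 512          # AMD MI250X GPUs on LUMI
gcds_per_gpu = 2      # each MI250X exposes two Graphics Compute Dies (GCDs)
world_size = n_gpus * gcds_per_gpu           # 1024 ranks

tp, pp = 2, 4                                # tensor / pipeline parallel sizes
dp = world_size // (tp * pp)                 # data parallel degree: 1024 / 8 = 128
assert dp == 128                             # matches the DP=128 stated in the README

micro_batch = 1
grad_accum = 16
global_batch = micro_batch * grad_accum * dp # sequences per optimizer step

print(f"world size {world_size}, DP {dp}, global batch {global_batch}")
# world size 1024, DP 128, global batch 2048
```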