WeightsnWizardry
committed on
Commit 73bed7a
1 Parent(s): feb92bd
Update README.md
README.md
CHANGED
@@ -96,7 +96,7 @@ Alfred-40B-0723 was trained on a mixture of publicly available and in-house cura
 
 ### Training Procedure
 
-`Alfred-40B-0723` was trained on 128 A100 40GB GPUs, using a 3D parallelism strategy (TP=8, PP=4, DP=4) combined with ZeRO.
+`Alfred-40B-0723` was trained on 128 A100 40GB GPUs, using a 3D parallelism strategy (TP=8, PP=4, DP=4) combined with ZeRO. The value model is initialized from the reward model and does not have any shared parameters with the policy network.
 
 #### Preprocessing
 
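
Review note: the parallelism figures in the changed line are internally consistent, since the tensor-, pipeline-, and data-parallel degrees multiply out to the stated GPU count. The sketch below is only a sanity check of that arithmetic. The `ParallelLayout` class and its field names are hypothetical and are not taken from the Alfred training code; only the values TP=8, PP=4, DP=4 and the 128-GPU total come from the README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical container for a 3D parallelism configuration."""

    tensor_parallel: int    # TP: GPUs that split each layer's tensors
    pipeline_parallel: int  # PP: number of pipeline stages
    data_parallel: int      # DP: number of model replicas

    @property
    def gpus_per_replica(self) -> int:
        # One full model copy spans TP * PP devices.
        return self.tensor_parallel * self.pipeline_parallel

    @property
    def world_size(self) -> int:
        # Total devices = devices per replica * number of replicas.
        return self.gpus_per_replica * self.data_parallel


# Values quoted in the README line under review.
layout = ParallelLayout(tensor_parallel=8, pipeline_parallel=4, data_parallel=4)
stated_gpu_count = 128

assert layout.world_size == stated_gpu_count
```

Under these assumptions each model replica spans 32 GPUs, and the four data-parallel replicas account for all 128 devices.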