- **License:** llama2
- **Context window length:** 4096 tokens

### Training details

For training, we apply the standard recipe: learning rate 1e-5, per-GPU batch size 6, and the AdamW optimizer without weight decay. The model is trained with ZeRO-3 on a cluster of 64 H100 GPUs.
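As a rough illustration of the recipe above, the following sketches what a DeepSpeed-style ZeRO-3 configuration with these hyperparameters could look like. The actual configuration file has not been released, so everything beyond the stated values (lr 1e-5, per-GPU batch size 6, AdamW without weight decay, ZeRO-3, 64 GPUs) — including the bf16 setting and the absence of gradient accumulation — is an assumption.

```python
# Illustrative DeepSpeed-style ZeRO-3 config mirroring the hyperparameters
# stated in the README; not the released GOAT-70B-Storytelling config.

NUM_GPUS = 64       # 64 x H100 cluster
PER_GPU_BATCH = 6   # micro-batch size per GPU

ds_config = {
    "train_micro_batch_size_per_gpu": PER_GPU_BATCH,
    # Effective global batch: 6 * 64 = 384 (assumes no gradient accumulation).
    "train_batch_size": PER_GPU_BATCH * NUM_GPUS,
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": 1e-5,
            "weight_decay": 0.0,  # AdamW without weight decay, as stated
        },
    },
    "zero_optimization": {
        # ZeRO stage 3: shard parameters, gradients, and optimizer states
        # across all data-parallel ranks.
        "stage": 3,
    },
    "bf16": {"enabled": True},  # assumption: bf16 is typical on H100
}

print(ds_config["train_batch_size"])  # → 384
```

With ZeRO-3, the 70B parameter, gradient, and optimizer-state tensors are partitioned across the 64 ranks rather than replicated, which is what makes full fine-tuning of a model this size feasible on a single cluster.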

### Learn more

- **Blog:** TBA