forrest-gradient committed on
Commit 32295b6
1 Parent(s): e77d64c

Update README.md

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -42,13 +42,12 @@ For training data, we generate long contexts by augmenting [SlimPajama](https://
 | Initialize From | Llama-3-70B-Instruct | 65K | 262K | 524K |
 | Sequence Length 2^N | 16 | 18 | 19 | 20 |
 | RoPE theta | 15296098 | 207112184 | 1062356830 | 3580165449 |
-| Batch Size | 1 | 1 | 1 | 1 |
+| Batch Size | 64 | 16 | 8 | 1 |
 | Gradient Accumulation Steps | 1 | 1 | 2 | 4 |
 | Steps | 20 | 25 | 25 | 8 |
 | Total Tokens | 83886080 | 104857600 | 209715200 | 33554432 |
 | Learning rate | 2.00E-05 | 2.00E-05 | 2.00E-05 | 2.00E-05 |
 | # GPUs | 512 | 512 | 512 | 128 |
-| Ring parallelism | 64 | 16 | 8 | 1 |
 | GPU Type | NVIDIA L40S | NVIDIA L40S | NVIDIA L40S | NVIDIA L40S |
 | Minutes to Train (Wall)| 100 | 170 | 284 | 516 |
 
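A quick way to sanity-check the updated table is to recompute Total Tokens from the other rows. The sketch below assumes (this is our reading, not something the README states) that Total Tokens = 2^N x Batch Size x Gradient Accumulation Steps x Steps. With the new Batch Size row this reproduces every Total Tokens value, whereas the old Batch Size of 1 would give only 1,310,720 tokens for the first column. The helper names `tokens_processed` and `tokens_per_gpu` are hypothetical, and the per-GPU figure additionally assumes each step's sequences are split evenly across all GPUs.

```python
# Values copied from the updated README table, keyed by N in "Sequence Length 2^N".
STAGES = {
    16: {"batch": 64, "grad_accum": 1, "steps": 20, "gpus": 512, "total_tokens": 83886080},
    18: {"batch": 16, "grad_accum": 1, "steps": 25, "gpus": 512, "total_tokens": 104857600},
    19: {"batch": 8, "grad_accum": 2, "steps": 25, "gpus": 512, "total_tokens": 209715200},
    20: {"batch": 1, "grad_accum": 4, "steps": 8, "gpus": 128, "total_tokens": 33554432},
}


def tokens_processed(log2_seq: int, batch: int, grad_accum: int, steps: int) -> int:
    """Assumed relation: sequence length x batch size x accumulation steps x optimizer steps."""
    return (2 ** log2_seq) * batch * grad_accum * steps


def tokens_per_gpu(log2_seq: int, batch: int, gpus: int) -> int:
    """Hypothetical: tokens per GPU if one micro-batch is sharded evenly over all GPUs."""
    return (2 ** log2_seq) * batch // gpus


for n, row in STAGES.items():
    total = tokens_processed(n, row["batch"], row["grad_accum"], row["steps"])
    # Fails loudly if the assumed relation does not match the table.
    assert total == row["total_tokens"], (n, total)
    print(f"2^{n}: total tokens {total}, tokens per GPU {tokens_per_gpu(n, row['batch'], row['gpus'])}")
```

Under these assumptions every column works out to 8,192 tokens per GPU (for example, 2^16 x 64 / 512), which is consistent with the removed "Ring parallelism" values having been relabeled as Batch Size rather than dropped.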