Commit b8050a4 (1 parent: 42476c1)
Committed by leo-pekelis-gradient

Update README.md

Files changed (1)
  1. README.md +14 -13
@@ -37,19 +37,20 @@ For training data, we generate long contexts by augmenting [SlimPajama](https://
 
 **Progressive Training Details:**
 
-| Initialize From | 65K | 262K |
-|-------------------------|----------------------|------------|
-| Sequence Length 2^N | 16 | 18 |
-| RoPE Theta | 15,296,098 | 207,112,184|
-| Batch Size | 1 | 1 |
-| Gradient Accumulation Steps | 1 | 1 |
-| Steps | 20 | 25 |
-| Total Tokens | 83,886,080 | 104,857,600|
-| Learning Rate | 0.00002 | 0.00002 |
-| # GPUs | 512 | 512 |
-| Ring Parallelism | 64 | 16 |
-| GPU Type | NVIDIA L40S | NVIDIA L40S|
-| Minutes to Train (Wall) | 100 | 170 |
+| | 65K | 262K |
+|--------------------------|-----------------|-----------------|
+| Initialize From | Llama-3-70B-Instruct | 65K |
+| Sequence Length 2^N | 16 | 18 |
+| RoPE theta | 15,296,098 | 207,112,184 |
+| Batch Size | 1 | 1 |
+| Gradient Accumulation Steps | 1 | 1 |
+| Steps | 20 | 25 |
+| Total Tokens | 83,886,080 | 104,857,600 |
+| Learning rate | 0.00002 | 0.00002 |
+| # GPUs | 512 | 512 |
+| Ring parallelism | 64 | 16 |
+| GPU Type | NVIDIA L40S | NVIDIA L40S |
+| Minutes to Train (Wall) | 100 | 170 |
 
 **Evaluation Details:**
 
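
The figures in the "Progressive Training Details" table can be cross-checked with a little arithmetic. The sketch below is only a sanity check and is not part of the repository: the `stages` dictionary and the `summarize` helper are illustrative names, and the only inputs are the "Sequence Length 2^N", "Steps", and "Total Tokens" rows copied from the table.

```python
# Values copied from the "Progressive Training Details" table in the diff.
# The dictionary layout and helper name are illustrative assumptions,
# not code from the repository.
stages = {
    "65K": {"seq_len_log2": 16, "steps": 20, "total_tokens": 83_886_080},
    "262K": {"seq_len_log2": 18, "steps": 25, "total_tokens": 104_857_600},
}


def summarize(stage):
    """Derive per-step quantities from one column of the table."""
    seq_len = 2 ** stage["seq_len_log2"]

    # Tokens consumed per optimizer step; must divide evenly.
    tokens_per_step, remainder = divmod(stage["total_tokens"], stage["steps"])
    if remainder:
        raise ValueError("total tokens is not a multiple of steps")

    # Full-length sequences that fit in one step; must also divide evenly.
    sequences_per_step, remainder = divmod(tokens_per_step, seq_len)
    if remainder:
        raise ValueError("tokens per step is not a multiple of sequence length")

    return {
        "seq_len": seq_len,
        "tokens_per_step": tokens_per_step,
        "sequences_per_step": sequences_per_step,
    }


for name, stage in stages.items():
    print(name, summarize(stage))
```

Both stages come out at 4,194,304 tokens per step, which corresponds to 64 sequences of 65,536 tokens in the 65K stage and 16 sequences of 262,144 tokens in the 262K stage. Those per-step sequence counts equal the values in the "Ring parallelism" row. This is an observation about the numbers only, not a statement of how the run was configured.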