michaelfeil committed on
Commit
990790f
1 Parent(s): bdf8c7d

Update README.md

Files changed (1)
  1. README.md +11 -12
README.md CHANGED
@@ -30,18 +30,17 @@ For training data, we generate long contexts by augmenting [SlimPajama](https://
  **Progressive Training Details:**

- | Parameter | 65K | 262K |
- |-----------------------------|------------|------------|
- | Initialize From | LLaMA-3 8B | 65K |
- | Sequence Length | 2^16 | 2^18 |
- | RoPE theta | 15.3 M | 207.1 M |
- | Batch Size | 1 | 1 |
- | Gradient Accumulation Steps | 32 | 16 |
- | Steps | 30 | 24 |
- | Total Tokens | 63 M | 101 M |
- | Learning Rate | 2.00E-05 | 2.00E-05 |
- | # GPUs | 32 | 32 |
- | GPU Type | NVIDIA L40S| NVIDIA L40S|
+ | Parameter | 65K | 262K |
+ |-----------------------------|----------------|------------|
+ | Initialize From | LLaMA-3-8B-Inst| 65K |
+ | Sequence Length | 2^16 | 2^18 |
+ | RoPE theta | 15.3 M | 207.1 M |
+ | Batch Size (Tokens / Step) | 2M | 4M |
+ | Steps | 30 | 24 |
+ | Total Tokens | 63 M | 101 M |
+ | Learning Rate | 2.00E-05 | 2.00E-05 |
+ | # GPUs | 32 | 32 |
+ | GPU Type | NVIDIA L40S | NVIDIA L40S|

  ## The Gradient AI Team
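Note on the changed rows: the new `Batch Size (Tokens / Step)` values appear to follow from the removed `Batch Size` and `Gradient Accumulation Steps` rows. The sketch below checks that arithmetic. It assumes that tokens per step equals sequence length times batch size times gradient accumulation steps, which the commit does not state. The names `STAGES`, `tokens_per_step`, and `total_tokens` are illustrative and do not come from the repository.

```python
# Sanity check of the table's arithmetic. The numbers are copied from the
# old and new versions of the table. The formula
#   tokens per step = sequence length * batch size * gradient accumulation
# is an assumption, not something the commit confirms.

STAGES = {
    "65K": {"seq_len": 2**16, "batch_size": 1, "grad_accum": 32, "steps": 30},
    "262K": {"seq_len": 2**18, "batch_size": 1, "grad_accum": 16, "steps": 24},
}


def tokens_per_step(seq_len: int, batch_size: int, grad_accum: int) -> int:
    """Tokens consumed by one optimizer step under the assumed formula."""
    return seq_len * batch_size * grad_accum


def total_tokens(stage: str) -> int:
    """Tokens consumed over all optimizer steps of one training stage."""
    s = STAGES[stage]
    per_step = tokens_per_step(s["seq_len"], s["batch_size"], s["grad_accum"])
    return per_step * s["steps"]


if __name__ == "__main__":
    for name, s in STAGES.items():
        per_step = tokens_per_step(s["seq_len"], s["batch_size"], s["grad_accum"])
        print(f"{name}: {per_step:,} tokens/step, {total_tokens(name):,} tokens total")
```

Under that assumption, the per-step counts are 2 × 2^20 and 4 × 2^20 tokens, which would match the `2M` and `4M` entries if `M` is read as 2^20, and the totals round to the 63 M and 101 M shown in the `Total Tokens` row.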