leo-pekelis-gradient committed on
Commit 7484062 • 1 Parent(s): ea25e66
Update README.md

README.md CHANGED
@@ -37,21 +37,19 @@ For training data, we generate long contexts by augmenting [SlimPajama](https://
Before:

**Progressive Training Details:**

-…
-| Minutes to Train (Wall)| 100 | 170 | 284 |
-…

**Evaluation Details:**
After:

**Progressive Training Details:**

+| Initialize From | 65K | 262K |
+|-------------------------|----------------------|------------|
+| Sequence Length 2^N | 16 | 18 |
+| RoPE theta | 15,296,098 | 207,112,184|
+| Batch Size | 1 | 1 |
+| Gradient Accumulation Steps | 1 | 1 |
+| Steps | 20 | 25 |
+| Total Tokens | 83,886,080 | 104,857,600|
+| Learning rate | 0.00002 | 0.00002 |
+| # GPUs | 512 | 512 |
+| Ring parallelism | 64 | 16 |
+| GPU Type | NVIDIA L40S | NVIDIA L40S|
+| Minutes to Train (Wall) | 100 | 170 |

**Evaluation Details:**
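The figures in the added table can be cross-checked with simple arithmetic. The sketch below is illustrative only: the `STAGES` dictionary, the `derive` helper, and its output field names are hypothetical and do not come from the README. It shows that both columns imply the same 4,194,304 (2^22) tokens per optimizer step, and it converts wall-clock minutes into GPU-hours. It also assumes that "Ring parallelism" means the number of GPUs sharing one sequence, so that `# GPUs / Ring parallelism` counts independent rings. The README does not define that term, so treat `ring_groups` as an assumption.

```python
# Values copied from the added ("+") table; one entry per column (stage).
STAGES = {
    "65K": {"log2_seq_len": 16, "steps": 20, "total_tokens": 83_886_080,
            "gpus": 512, "ring_parallelism": 64, "wall_minutes": 100},
    "262K": {"log2_seq_len": 18, "steps": 25, "total_tokens": 104_857_600,
             "gpus": 512, "ring_parallelism": 16, "wall_minutes": 170},
}


def derive(stage: dict) -> dict:
    """Return quantities implied by one column (hypothetical helper)."""
    seq_len = 2 ** stage["log2_seq_len"]          # "Sequence Length 2^N"
    tokens_per_step, rem = divmod(stage["total_tokens"], stage["steps"])
    if rem != 0:
        raise ValueError("Total Tokens is not a whole multiple of Steps")
    return {
        "seq_len": seq_len,
        "tokens_per_step": tokens_per_step,
        # Full-length sequences consumed per optimizer step.
        "seqs_per_step": tokens_per_step // seq_len,
        # ASSUMPTION: ring parallelism = GPUs per ring, so this counts rings.
        "ring_groups": stage["gpus"] // stage["ring_parallelism"],
        # GPU-hours = GPUs x wall-clock minutes / 60.
        "gpu_hours": stage["gpus"] * stage["wall_minutes"] / 60,
    }


if __name__ == "__main__":
    for name, stage in STAGES.items():
        print(name, derive(stage))
```

When run as a script, it prints one line of derived values per column. For example, the 65K column works out to 64 sequences of 65,536 tokens per step and roughly 853 GPU-hours, and the 262K column works out to 16 sequences of 262,144 tokens per step and roughly 1,451 GPU-hours.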