Update README.md
README.md CHANGED

```diff
@@ -56,6 +56,7 @@ The following table shows the hyper-parameters we used in our training process.
 | Peak Learning Rate | 5e-5 |
 | Batch Size | 4M |
 | Weight Decay | 0.1 |
+| Context Length | 2k |
 
 **Second phase**: We further adjusted the training corpus ratio, incorporating more domain-specific datasets (e.g., Math, Coding), and continued training for 50B tokens.
 
@@ -66,6 +67,7 @@ The following table shows the hyper-parameters we used in our training process.
 | Peak Learning Rate | 5e-6 |
 | Batch Size | 4M |
 | Weight Decay | 0.01 |
+| Context Length | 4k |
 
 ## Performance Evaluation Results
 
```
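The two hunks above can be summarized as a per-phase configuration. The following is an illustrative sketch only: the repository does not publish its training script, so the dict layout, key names, and the `get_phase_config` helper are hypothetical, and "4M" / "2k" / "4k" are assumed to mean 4,000,000 batch tokens and 2048 / 4096 context tokens. Only the numeric values come from the tables in this commit.

```python
# Hypothetical config sketch of the two training phases described in the
# README diff. Values are from the commit; structure and names are assumed.
PHASES = {
    "first_phase": {
        "peak_learning_rate": 5e-5,
        "batch_size_tokens": 4_000_000,  # "4M" tokens per batch (assumed)
        "weight_decay": 0.1,
        "context_length": 2048,          # "2k", added in this commit (assumed 2048)
    },
    "second_phase": {                    # continued training for 50B tokens
        "peak_learning_rate": 5e-6,
        "batch_size_tokens": 4_000_000,
        "weight_decay": 0.01,
        "context_length": 4096,          # "4k", added in this commit (assumed 4096)
    },
}

def get_phase_config(phase: str) -> dict:
    """Return the hyper-parameter dict for a named training phase."""
    return PHASES[phase]
```

Note how the second phase lowers both the peak learning rate (5e-5 → 5e-6) and the weight decay (0.1 → 0.01) while doubling the context length, which is consistent with a gentler continued-training stage on domain-specific data.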