sedrickkeh
committed on
Commit bca90b5
1 Parent(s): f945251
Update README.md
README.md
CHANGED
@@ -100,7 +100,6 @@ We follow their training recipe and release our version of Mamba-7B.
 | Optimizer | AdamW |
 | Learning rate | 3e-4 |
 | LR cooldown end | 1e-5 |
-| QK-norm | False |
 | Warmup steps | 2000 |
 | Z-loss | 1e-4 |
 | Batch size | 2M |