Update README.md
Also next morning I realized that the Mamba module doesn't come with built-in normalization and residual as I thought, which explains why the BF16 attempts failed. Oops.
README.md
CHANGED
@@ -32,3 +32,7 @@ rather than last as if it was with [1, 1, 1, 0, 1, 1, 0, 0]. Intuition with that
 * I tried to use BF16 originally, but the model went to NaN (with the default big LR), or the gradients were so small the weights didn't change (with a smaller LR). I switched back to F32; however, some layers still initialize weights with a factor of x0.001, as I hoped it
 would stop the model from going to NaN.
 
+--------
+
+Also next morning I realized that the Mamba module doesn't come with built-in normalization and residual as I thought,
+which explains why the BF16 attempts failed. Oops.
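
The realization in the added lines is the key point: the `Mamba` layer from the `mamba-ssm` package is only the sequence mixer, so normalization and the residual connection have to be added around it. A minimal pre-norm sketch, assuming the `mamba-ssm` API and PyTorch >= 2.4 for `nn.RMSNorm` (the class name and structure are illustrative, not this repo's actual block):

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumes the mamba-ssm package is installed

class ResidualMambaBlock(nn.Module):
    """Pre-norm residual wrapper: the Mamba layer itself applies
    neither normalization nor a residual connection."""

    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.RMSNorm(d_model)    # nn.LayerNorm also works on older PyTorch
        self.mixer = Mamba(d_model=d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model). The pre-norm residual keeps the
        # activation scale roughly constant across layers; without it,
        # the scale can compound layer over layer and overflow BF16's
        # range long before it troubles F32.
        return x + self.mixer(self.norm(x))
```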
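
On the x0.001 initialization mentioned in the context lines: the diff doesn't show which layers get the rescale, so the following is only a hypothetical sketch of the general pattern, with the `scale_init_` name and the Linear-only filter being my assumptions rather than this repo's code:

```python
import torch
import torch.nn as nn

def scale_init_(module: nn.Module, factor: float = 1e-3) -> None:
    # Hypothetical helper: shrink the initial weights of every Linear
    # submodule by `factor` (the "x0.001" from the README). Which layers
    # actually get this treatment is not shown, so targeting only
    # nn.Linear here is an assumption.
    with torch.no_grad():
        for m in module.modules():
            if isinstance(m, nn.Linear):
                m.weight.mul_(factor)
```

Called once after model construction (`scale_init_(model)`), this shrinks each layer's initial output scale, which can delay overflow-driven NaNs, though it doesn't substitute for the missing normalization.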