Maykeye committed
Commit a981b7e
1 Parent(s): d35fedc

Update README.md


Also, the next morning I realized that the Mamba module doesn't come with built-in normalization and a residual connection as I thought,
which explains why the BF16 attempts failed. Oops.
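
For context, here is a minimal sketch of the kind of wrapper each Mamba layer ends up needing, a pre-norm plus an explicit residual (this assumes the `mamba_ssm` package and a hypothetical `MambaBlock` name; it is not the repository's actual code):

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumes the mamba_ssm package is available

class MambaBlock(nn.Module):
    """Pre-norm residual wrapper: the norm and the skip connection are NOT
    built into the Mamba module itself, so they have to be added around it."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)    # RMSNorm is also commonly used here
        self.mamba = Mamba(d_model=dim)  # the raw Mamba mixer, no norm/residual inside

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # residual connection around the normalized Mamba output
        return x + self.mamba(self.norm(x))
```

Without that normalization and skip path, activations can drift much more easily, which would make BF16 training far more prone to the NaNs described above.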

Files changed (1)
  1. README.md +4 -0
README.md CHANGED
@@ -32,3 +32,7 @@ rather than last as if it was with [1, 1, 1, 0, 1, 1, 0, 0]. Intuition with that
  * I tried to use BF16 originally, but model went into nan (with default big LR) or gradients were so small weights didn't change(smaller LR). I switched back to F32, however some layers still initialize weight with factor x0.001 as I hoped it
  would stop model from going to nan.

+ --------
+
+ Also next morning I realized that Mamba module doesn't come with built-in normalization and residual as I thought,
+ which explains why BF16 attempts failed. Oops.
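
As an aside, the x0.001 weight-scaling mentioned in the unchanged README lines above could look roughly like the following hypothetical helper (an illustrative sketch, not the repository's actual initialization code):

```python
import torch
import torch.nn as nn

def scale_down_init(layer: nn.Linear, factor: float = 0.001) -> None:
    """Shrink a layer's freshly initialized weights by `factor` so its early
    outputs stay small, in the hope of keeping the loss from reaching NaN."""
    with torch.no_grad():
        layer.weight.mul_(factor)

# Hypothetical usage on some output projection of the model:
# scale_down_init(model.lm_head)
```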