Commit 46797f2
Parent(s): 5191834

Update Report/REPORT.md

Report/REPORT.md: +1 -1
Report/REPORT.md (changed)

@@ -40,7 +40,7 @@ Together, this reduced the dictionary size to 28 by eliminating tokens for '0',

 This more efficient representation allowed for longer games within the context size and more efficient training.

-## 3.
+## 3. Model Configuration Experiments and Scaling Analysis

 In the course of developing the Mamba 50M model, a series of experiments were conducted to determine the optimal configuration of d_state, d_model, and layer count. These experiments were critical in understanding the architecture's performance and informed the development of three versions of the Mamba 50M model.
