update readme
README.md
CHANGED
@@ -57,8 +57,6 @@ Each MoA and MoE layer has 8 expert, and 2 experts are activated for each input
It has 8 billion parameters in total and 2.2B active parameters.
JetMoE-8B is trained on 1.25T tokens from publicly available datasets, with a learning rate of 5.0 x 10<sup>-4</sup> and a global batch-size of 4M tokens.

-**Model Developers** JetMoE is developed by Yikang Shen and MyShell.
-
**Input** Models input text only.

**Output** Models generate text only.
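The hunk context above notes the architecture this card describes: each MoA and MoE layer has 8 experts, and 2 experts are activated for each input token. As a rough illustration of that top-2 routing only (this is not JetMoE's actual code; the `Top2MoE` class name, layer sizes, and expert MLP shape are hypothetical), a minimal PyTorch sketch:

```python
# Minimal sketch of top-2 expert routing (8 experts, 2 active per token),
# illustrating the idea referenced in the README hunk above.
# Not JetMoE's implementation; all names and sizes here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Top2MoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router scores every token against every expert.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is a small feed-forward block (hypothetical shape).
        self.experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(d_model, d_hidden),
                    nn.GELU(),
                    nn.Linear(d_hidden, d_model),
                )
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.router(x)                              # (num_tokens, num_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)  # keep the 2 best experts per token
        weights = F.softmax(top_vals, dim=-1)                # normalize over the chosen 2
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e                 # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = Top2MoE(d_model=64, d_hidden=256)
    tokens = torch.randn(10, 64)
    print(layer(tokens).shape)  # torch.Size([10, 64])
```

Only 2 of the 8 expert MLPs run for any given token, which is why the model's 8B total parameters correspond to roughly 2.2B active parameters per forward pass.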