YikangS committed
Commit b94f300 • 1 parent: 64210c3

update readme

Files changed (1): README.md (+6 -7)
README.md CHANGED
@@ -13,16 +13,15 @@ AutoConfig.register("moduleformer", ModuleFormerConfig)
  AutoModelForCausalLM.register(ModuleFormerConfig, ModuleFormerForCausalLM)
  AutoModelForSequenceClassification.register(ModuleFormerConfig, ModuleFormerForSequenceClassification)

- tokenizer = AutoTokenizer.from_pretrained('ibm/MoLM-350M-4B')
- model = AutoModelForCausalLM.from_pretrained('ibm/MoLM-350M-4B')
+ tokenizer = AutoTokenizer.from_pretrained('ibm/MoLM-700M-8B')
+ model = AutoModelForCausalLM.from_pretrained('ibm/MoLM-700M-8B')
  ```

  **Model Details**
- MoLM-350M-4B is a MoE-based language models. It has 4 billion parameters, but each input token only use 350M parameteres during its inference. Thus, it's computationally equivelant to a 350M dense model.
- MoLM-700M-4B has 4 billion parameters and computationally equivelant to a 700M dense model.
- MoLM-700M-8B has 8 billion parameters and computationally equivelant to a 700M dense model.
- Both models are trained on 300 billion tokens from publicly available sources, with a learning rate of 3.0 x 10<sup>-4</sup> and a global batch-size of 3M tokens.
-
+ MoLM-350M-4B is a MoE-based language model. It has 4 billion parameters, but each input token only activates 350M parameters. Thus, it's computationally equivalent to a 350M dense model.
+ MoLM-700M-4B has 4 billion parameters and is computationally equivalent to a 700M dense model.
+ MoLM-700M-8B has 8 billion parameters and is computationally equivalent to a 700M dense model.
+ All models are trained on 300 billion tokens from publicly available sources, with a learning rate of 3.0 x 10<sup>-4</sup> and a global batch-size of 3M tokens.

  **Model Developers** IBM

  **Variations** MoLM comes in two different parameter sizes — 4B and 8B. The 4B model has two variants with different computation costs — 350M and 700M.
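
For reference, the full loading snippet as the README reads after this commit might look like the following. This is a minimal sketch, not the canonical README code: it assumes the ModuleFormer classes are importable from a local `moduleformer` package (the import lines sit outside this diff hunk), and the final generation call is only an illustrative sanity check.

```python
# Minimal usage sketch reflecting the README after this commit.
# Assumption: ModuleFormerConfig, ModuleFormerForCausalLM and
# ModuleFormerForSequenceClassification come from a local `moduleformer`
# package; the actual import path is not shown in the hunk above.
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)
from moduleformer import (
    ModuleFormerConfig,
    ModuleFormerForCausalLM,
    ModuleFormerForSequenceClassification,
)

# Register the custom architecture with the Auto* factories,
# as in the unchanged context lines of the diff.
AutoConfig.register("moduleformer", ModuleFormerConfig)
AutoModelForCausalLM.register(ModuleFormerConfig, ModuleFormerForCausalLM)
AutoModelForSequenceClassification.register(ModuleFormerConfig, ModuleFormerForSequenceClassification)

# Load the checkpoint referenced by the updated README lines.
tokenizer = AutoTokenizer.from_pretrained('ibm/MoLM-700M-8B')
model = AutoModelForCausalLM.from_pretrained('ibm/MoLM-700M-8B')

# Generate a short completion to sanity-check the setup (illustrative only).
inputs = tokenizer("The Mixture-of-Experts architecture", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```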