YikangS committed on
Commit e780836
Parent(s): f66a060

update config and readme

Files changed (2):
  1. README.md +3 -2
  2. config.json +3 -3
README.md CHANGED
@@ -79,7 +79,6 @@ AutoModelForSequenceClassification.register(JetMoEConfig, JetMoEForSequenceClass
 tokenizer = AutoTokenizer.from_pretrained('jetmoe/jetmoe-8b')
 model = AutoModelForCausalLM.from_pretrained('jetmoe/jetmoe-8b')
 ```
-The MoE code is based on the [ScatterMoE](https://github.com/shawntan/scattermoe). The code is still under active development, we are happy to receive any feedback or suggestions.
 
 ## Model Details
 JetMoE-8B has 24 blocks.
@@ -111,7 +110,9 @@ For more details, please refer to the JetMoE Technical Report (Coming Soon).
 ## JetMoE Model Index
 |Model|Index|
 |---|---|
-|JetMoE-8B| [Link](https://huggingface.co/jetmoe/jetmoe-8B) |
+|JetMoE-8B-Base| [Link](https://huggingface.co/jetmoe/jetmoe-8B) |
+|JetMoE-8B-SFT| [Link](https://huggingface.co/jetmoe/jetmoe-8B-sft) |
+|JetMoE-8B-Chat| [Link](https://huggingface.co/jetmoe/jetmoe-8B-chat) |
 
 ## Acknowledgement
 We express our gratitude to [Shengding Hu](https://shengdinghu.github.io/) for his valuable advice on the Phase 2 data mixture. We also express our gratitude to [Exabits](https://www.exabits.ai/) for their assistance in setting up the GPU clusters, and to [Lepton AI](https://www.lepton.ai/) for their support in setting up the chat demo.
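For context, the hunk above is the tail of the README's model-loading snippet. Below is a minimal sketch of the full flow it belongs to, assuming the `jetmoe` package exposes `JetMoEConfig`, `JetMoEForCausalLM`, and `JetMoEForSequenceClassification` (class names inferred from the truncated `register(...)` call in the hunk header, not confirmed by this diff):

```python
# Hypothetical end-to-end loading sketch; the jetmoe class names are
# inferred from the truncated register(...) call in the diff context.
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)
from jetmoe import (  # assumption: package layout not shown in the diff
    JetMoEConfig,
    JetMoEForCausalLM,
    JetMoEForSequenceClassification,
)

# Register the custom architecture so the Auto* factories can resolve
# the "jetmoe" model type declared in the hub repository's config.json.
AutoConfig.register("jetmoe", JetMoEConfig)
AutoModelForCausalLM.register(JetMoEConfig, JetMoEForCausalLM)
AutoModelForSequenceClassification.register(JetMoEConfig, JetMoEForSequenceClassification)

tokenizer = AutoTokenizer.from_pretrained("jetmoe/jetmoe-8b")
model = AutoModelForCausalLM.from_pretrained("jetmoe/jetmoe-8b")
```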
config.json CHANGED
@@ -12,10 +12,10 @@
 "length_penalty": 1.0,
 "moe_num_experts": 8,
 "moe_top_k": 2,
-"n_embd": 2048,
-"n_layer": 24,
+"hidden_size": 2048,
+"num_hidden_layers": 24,
 "n_positions": 4096,
-"n_head": 16,
+"num_attention_heads": 32,
 "num_key_value_heads": 16,
 "num_layers": 24,
 "rms_norm_eps": 1e-05,