JosephusCheung commited on
Commit
e3ce4d1
1 Parent(s): 23acf09

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +4 -2
README.md CHANGED
@@ -2,8 +2,10 @@
2
  license: gpl-3.0
3
  ---
4
 
5
- Only intended for conceptual validation, however the expert models do not seem to be working as expected.
6
 
7
  There are 8 completely different expert models based on Qwen-7B / CausalLM, six of which are specific domain models that have seen 50~100 billion tokens, including: a Toolformer/Agent expert model, a multilingual translation expert model, a mathematics expert model, a visual expert model, a coding and computer expert model, and an unreviewed knowledge model — together forming the MoE model along with Qwen-Chat and Qwen-Base.
8
 
9
- The initialization of the gate is based on the hidden state of the few-shot prompt input from each expert model and undergoes simple alignment training.
 
 
 
2
  license: gpl-3.0
3
  ---
4
 
5
+ Only intended for conceptual validation, however the expert models do not seem to be working as expected. The model could output text and complete the conversation normally, but the performance of the expert model was not significant.
6
 
7
  There are 8 completely different expert models based on Qwen-7B / CausalLM, six of which are specific domain models that have seen 50~100 billion tokens, including: a Toolformer/Agent expert model, a multilingual translation expert model, a mathematics expert model, a visual expert model, a coding and computer expert model, and an unreviewed knowledge model — together forming the MoE model along with Qwen-Chat and Qwen-Base.
8
 
9
+ The initialization of the gate is based on the hidden state of the few-shot prompt input from each expert model and undergoes simple alignment training.
10
+
11
+ Prompt format: ChatML