CausalLM
/

8x7B-MoE-test-NOT-MIXTRAL

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

JosephusCheung commited on Dec 16, 2023

Commit

755d331

•

1 Parent(s): f2cbd96

Update README.md

Files changed (1) hide show

README.md +3 -1

README.md CHANGED Viewed

@@ -8,4 +8,6 @@ There are 8 completely different expert models based on Qwen-7B / CausalLM, six
 The initialization of the gate is based on the hidden state of the few-shot prompt input from each expert model and undergoes simple alignment training.
-Prompt format: ChatML

 The initialization of the gate is based on the hidden state of the few-shot prompt input from each expert model and undergoes simple alignment training.
+Prompt format: ChatML
+A simple verification found that the expert model occasionally had routing errors, resulting in suboptimal results and required further fine-tuning.