Commit 755d331
Parent(s): f2cbd96
Update README.md
README.md CHANGED
@@ -8,4 +8,6 @@ There are 8 completely different expert models based on Qwen-7B / CausalLM, six
 
 The initialization of the gate is based on the hidden state of the few-shot prompt input from each expert model and undergoes simple alignment training.
 
-Prompt format: ChatML
+Prompt format: ChatML
+
+A simple verification found that the expert model occasionally had routing errors, resulting in suboptimal results and required further fine-tuning.
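
The README text in this diff does not show how the gate is actually built, so the following is only a minimal sketch of one plausible reading: each expert's gate row is set to the normalized mean of hidden states collected from that expert on its few-shot prompt, and routing picks the experts with the highest dot-product score. Mean pooling, the normalization, the dot-product scoring, the toy vectors, and all function names (`init_gate`, `route`, `mean_vector`, `normalize`) are assumptions for illustration, not the author's code; the alignment training step is omitted.

```python
import math

def mean_vector(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def normalize(v):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def init_gate(expert_prompt_states):
    """Build one gate row per expert (assumed scheme: normalized mean hidden state).

    expert_prompt_states holds one entry per expert; each entry is a list of
    hidden-state vectors taken from that expert on its few-shot prompt.
    """
    return [normalize(mean_vector(states)) for states in expert_prompt_states]

def route(gate, hidden, top_k=2):
    """Return the indices of the top_k experts ranked by gate logit (dot product)."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in gate]
    ranked = sorted(range(len(gate)), key=lambda i: logits[i], reverse=True)
    return ranked[:top_k]

# Toy data: 3 hypothetical experts with 4-dimensional hidden states.
toy_states = [
    [[1.0, 0.1, 0.0, 0.0], [0.9, 0.0, 0.1, 0.0]],
    [[0.0, 1.0, 0.1, 0.0], [0.1, 0.9, 0.0, 0.0]],
    [[0.0, 0.0, 0.2, 1.0], [0.0, 0.1, 0.0, 0.8]],
]
gate = init_gate(toy_states)

# A hidden state that resembles the second expert's prompts.
chosen = route(gate, [0.05, 0.95, 0.0, 0.0], top_k=1)
```

Under this sketch, a routing error of the kind the added README line mentions would show up as `route` returning an expert other than the one whose prompts the input resembles; comparing `chosen` against an expected expert index over a small labeled set is one simple way to run such a check.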