Update README.md
Browse files
README.md
CHANGED
@@ -49,7 +49,7 @@ If all our tokens are sent to just a few popular experts, that will make trainin
|
|
49 |
|
50 |
|
51 |
## "Wait...but you called this a frankenMoE?"
|
52 |
-
The difference between MoE and "frankenMoE" lies in the fact that the router layer in a model like the one on this repo is not trained simultaneously. There are rumors about someone developing a way for us to unscuff these frankenMoE models by training the router layer simultaneously. For now, frankenMoE remains psychotic.
|
53 |
|
54 |
## "Are there at least any datasets or plans for this model, in any way?"
|
55 |
There are many datasets included as a result of merging four models...for one, Silicon Maid is a merge of xDan which is trained on the [OpenOrca Dataset](https://huggingface.co/datasets/Open-Orca/OpenOrca) and the [OpenOrca DPO pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs). Loyal-Macaroni-Maid uses OpenChat-3.5, Starling and NeuralChat which has so many datasets I'm not going to list them all here. Dolphin 2.6 Mistral also has a large variety of datasets. Panda-7B-v0.1 was fine tuned by the person collaborating on this project with me using a base mistral and a private dataset. Panda gives the model the creativity it has while the rest act as support.
|
|
|
49 |
|
50 |
|
51 |
## "Wait...but you called this a frankenMoE?"
|
52 |
+
The difference between MoE and "frankenMoE" lies in the fact that the router layer in a model like the one on this repo is not trained simultaneously. There are rumors about someone developing a way for us to unscuff these frankenMoE models by training the router layer simultaneously. For now, frankenMoE remains psychotic. This model does exceedingly well by FrankenMoE standards, however.
|
53 |
|
54 |
## "Are there at least any datasets or plans for this model, in any way?"
|
55 |
There are many datasets included as a result of merging four models...for one, Silicon Maid is a merge of xDan which is trained on the [OpenOrca Dataset](https://huggingface.co/datasets/Open-Orca/OpenOrca) and the [OpenOrca DPO pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs). Loyal-Macaroni-Maid uses OpenChat-3.5, Starling and NeuralChat which has so many datasets I'm not going to list them all here. Dolphin 2.6 Mistral also has a large variety of datasets. Panda-7B-v0.1 was fine tuned by the person collaborating on this project with me using a base mistral and a private dataset. Panda gives the model the creativity it has while the rest act as support.
|