Kquant03
/

CognitiveFusion-4x7B-bf16-MoE

Text Generation

Mixture of Experts

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

Kquant03 commited on Jan 4

Commit

b704158

•

1 Parent(s): b376a35

Update README.md

Files changed (1) hide show

README.md +1 -1

README.md CHANGED Viewed

@@ -49,7 +49,7 @@ If all our tokens are sent to just a few popular experts, that will make trainin
 ## "Wait...but you called this a frankenMoE?"
-The difference between MoE and "frankenMoE" lies in the fact that the router layer in a model like the one on this repo is not trained simultaneously. There are rumors about someone developing a way for us to unscuff these frankenMoE models by training the router layer simultaneously. For now, frankenMoE remains psychotic. Raiden does improve upon the base heegyu/WizardVicuna-Uncensored-3B-0719, though.
 ## "Are there at least any datasets or plans for this model, in any way?"
 There are many datasets included as a result of merging four models...for one, Silicon Maid is a merge of xDan which is trained on the [OpenOrca Dataset](https://huggingface.co/datasets/Open-Orca/OpenOrca) and the [OpenOrca DPO pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs). Loyal-Macaroni-Maid uses OpenChat-3.5, Starling and NeuralChat which has so many datasets I'm not going to list them all here. Dolphin 2.6 Mistral also has a large variety of datasets. Panda-7B-v0.1 was fine tuned by the person collaborating on this project with me using a base mistral and a private dataset. Panda gives the model the creativity it has while the rest act as support.

 ## "Wait...but you called this a frankenMoE?"
+The difference between MoE and "frankenMoE" lies in the fact that the router layer in a model like the one on this repo is not trained simultaneously. There are rumors about someone developing a way for us to unscuff these frankenMoE models by training the router layer simultaneously. For now, frankenMoE remains psychotic. This model does exceedingly well by FrankenMoE standards, however.
 ## "Are there at least any datasets or plans for this model, in any way?"
 There are many datasets included as a result of merging four models...for one, Silicon Maid is a merge of xDan which is trained on the [OpenOrca Dataset](https://huggingface.co/datasets/Open-Orca/OpenOrca) and the [OpenOrca DPO pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs). Loyal-Macaroni-Maid uses OpenChat-3.5, Starling and NeuralChat which has so many datasets I'm not going to list them all here. Dolphin 2.6 Mistral also has a large variety of datasets. Panda-7B-v0.1 was fine tuned by the person collaborating on this project with me using a base mistral and a private dataset. Panda gives the model the creativity it has while the rest act as support.