More model info

Opened by vimota

Curious what this model actually is? Sounds like a blend of Starling on top of Mixtral?

It is a Mixtral-style MoE with eight 7B experts, composed entirely of Starling-LM 7B (Mistral) layers. I just finished uploading the GGUF files.
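For anyone curious what that means structurally, here is a minimal sketch (not the author's actual merge script) of how you could inspect such a checkpoint with Hugging Face transformers; the repo id is a hypothetical placeholder:

```python
# Sketch: inspect a Mixtral-style MoE checkpoint built from Starling-LM 7B experts.
# The repo id below is hypothetical -- substitute the actual model repo.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("your-username/Starling-8x7B-MoE")

print(config.model_type)           # expected: "mixtral"
print(config.num_local_experts)    # expected: 8 experts per MoE layer
print(config.num_experts_per_tok)  # how many experts the router activates per token
```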

Oh interesting! Pardon my ignorance, but are all the Starling-LM 7B Mistral layers exactly the same? Would MoE make a difference in that case, or is it further trained after that?

In my tests of the 11B Starling, which is basically the first version of this idea, I noticed improved outputs. Based on those tests I made a MoE model that does the same thing, and this one shows even better generations. It can also be fine-tuned for further improvements. For now I just left it as an extension of the base layers, so as not to lose the hard work from Berkeley.
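If someone wants to try the fine-tuning step, here is a minimal sketch of how a LoRA adapter could be attached on top of the merged model with transformers + peft, assuming it is published as a standard Mixtral checkpoint (the repo id is hypothetical, and the target modules are just one reasonable choice):

```python
# Sketch: LoRA fine-tuning setup for a Mixtral-style MoE made of duplicated
# Starling-LM 7B experts. Repo id is hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "your-username/Starling-8x7B-MoE"  # hypothetical placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Target the attention projections plus the MoE router ("gate") so the
# initially identical experts have a chance to specialize during training.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```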
