More model info

Opened by vimota

Curious what this model actually is? Sounds like a blend of Starling on top of Mixtral?

It is a Mixtral-style MoE with eight 7B experts, composed entirely of Starling-LM 7B (Mistral) layers. I just finished uploading the GGUF files.
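For anyone curious what that means structurally, here is a minimal sketch (not the author's actual merge script) of how you could inspect such a checkpoint with Hugging Face transformers; the repo id is a hypothetical placeholder:

```python
# Sketch: inspect a Mixtral-style MoE checkpoint built from Starling-LM 7B experts.
# The repo id below is hypothetical -- substitute the actual model repo.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("your-username/Starling-8x7B-MoE")

print(config.model_type)           # expected: "mixtral"
print(config.num_local_experts)    # expected: 8 experts per MoE layer
print(config.num_experts_per_tok)  # how many experts the router activates per token
```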

Oh interesting! Pardon my ignorance, but are all the Starling-LM 7B Mistral layers exactly the same? Would MoE make a difference in that case, or is it further trained after that?

In my tests of the 11B Starling, which is basically the first version of this idea, I noticed improved outputs. Based on those tests I made a MoE model that does the same thing, and this one shows even better generations. It can also be fine-tuned for further improvements. For now I just left it as an extension of the base layers, so as not to lose the hard work from Berkeley.
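If someone wants to try the fine-tuning step, here is a minimal sketch of how a LoRA adapter could be attached on top of the merged model with transformers + peft, assuming it is published as a standard Mixtral checkpoint (the repo id is hypothetical, and the target modules are just one reasonable choice):

```python
# Sketch: LoRA fine-tuning setup for a Mixtral-style MoE made of duplicated
# Starling-LM 7B experts. Repo id is hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "your-username/Starling-8x7B-MoE"  # hypothetical placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Target the attention projections plus the MoE router ("gate") so the
# initially identical experts have a chance to specialize during training.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```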
