What model is this?

by rjmehta

I see this is attracting a lot of attention. Is this a new Mistral architecture release, or a new model fine-tuned on a different dataset?

You mean Mixtral in general, or v2 specifically?

v2 is a fine-tune by DiscoResearch.

Mixtral is an 8x7B MoE model that MistralAI stealth-released on Friday via BitTorrent. I assume they will upload it officially soon. In the meantime they provided just the weights with no code, which meant it only worked with one specific inference framework and not with standard HF Transformers. But DiscoResearch did the work to get it running with Hugging Face before that, and I quantised that in my other Mixtral repo.

Then Disco did a fine-tune of it, which is this v2 model here.
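If you want to try the Disco conversion through Transformers, something like this sketch should work. The repo id below is just a placeholder (substitute whichever DiscoResearch or quantised repo you're actually using), and `trust_remote_code` reflects that the architecture wasn't merged into Transformers yet at the time:

```python
# Minimal sketch of loading a Mixtral-style model via Hugging Face Transformers.
# The repo id is hypothetical; replace it with the real repo you want.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DiscoResearch/mixtral-example-v2"  # hypothetical placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # spread the expert layers across available GPUs
    trust_remote_code=True,  # needed while the model code ships with the repo,
                             # not with the Transformers library itself
)

# Quick generation test
inputs = tokenizer("Hello, Mixtral!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```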

Much appreciated for all you do, @TheBloke.
