8x22B?

#4
by saishf - opened

I don't know if Mistral will release it on HF because of this release, but there's a new model announced on their Twitter? An 8x22B-parameter model.

Announced nowhere else though?
Mystery model I don't have the compute for :3
Edit - messed up formatting

Yeah, it's over 100 billion params, so that kinda takes it off the table for me, unless I run a very small imatrix quant.

176B, I'm interested in how it ranks on the LMSYS leaderboard!

It's something like 40B active params, I believe, but you need enough memory for all of the params to run it. I don't think there's a way to load only the active ones.

I'd guess a minimum of 256 GB of RAM + GPU VRAM for local running, probably more once context is included.
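
For anyone trying to size this, here's a rough back-of-envelope sketch. The 176B total / 40B active figures are just the guesses from this thread, not official numbers:

```python
# Weight-memory estimate for an MoE model at different quant levels.
# Figures below are the thread's guesses, not confirmed specs.
TOTAL_PARAMS = 176e9   # every expert must be resident in RAM/VRAM
ACTIVE_PARAMS = 40e9   # params actually used per token via MoE routing

print(f"~{ACTIVE_PARAMS / 1e9:.0f}B params fire per token, but the router "
      "can pick any expert, so all weights must stay loaded:")
for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4), ("Q2", 2)]:
    weight_gb = TOTAL_PARAMS * bits / 8 / 1e9
    print(f"{name:>4}: ~{weight_gb:,.0f} GB for weights alone (KV cache extra)")
```

At FP16 that's ~352 GB just for weights, so even a Q4 imatrix quant (~88 GB) is a tall order for most local setups, which lines up with the 256 GB RAM guess above once you leave headroom for context.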

Unsure if you've seen it, but you may be able to run this.
I don't know how it performs, though:
Vezora/Mistral-22B-v0.1
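
If anyone wants to give it a go, something like this should work with transformers + bitsandbytes in 4-bit. Untested sketch, and it assumes the repo is a standard Mistral-style causal LM:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Vezora/Mistral-22B-v0.1"

# 4-bit quantization to squeeze a 22B model into consumer VRAM
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # spills layers to CPU if VRAM runs out
)

inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```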
