We are working on creating a single 22b from this model

#5
by rombodawg - opened

Currently a friend and I are attempting to extract one 22b expert from this Mixtral model, hoping to turn it into its own standalone Mistral 22b-parameter model. If we succeed, would you like us to upload the weights to this HF account?

We already have code to do something similar, but we just need to adjust it slightly
https://github.com/MeNicefellow/Mixtral-Expert-Trimmer

Do you think it's possible to maybe create a 6x22b or 4x22b model to make it fit into 2x24gb cards better?

@CyberTimon Unfortunately this is not possible without severely degrading performance. The resulting model would basically be useless without fully retraining the router and possibly the entire model. So we are hoping that by extracting only 1 expert and using it by itself, it would work well as a standalone model without MoE.
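For anyone wondering what "extracting 1 expert" could look like mechanically, here is a rough sketch of the key remapping. It is not the code from the repo linked above. It assumes the Hugging Face Mixtral checkpoint naming (`block_sparse_moe.experts.N.w1/w2/w3`, `block_sparse_moe.gate`) and the dense Mistral naming (`mlp.gate_proj/down_proj/up_proj`); the function name `extract_expert` is made up for illustration. It only renames weights and says nothing about whether the result generates sensible text.

```python
import re

# Assumed Hugging Face Mixtral checkpoint layout (not confirmed in this thread):
#   model.layers.{L}.block_sparse_moe.experts.{E}.w1/w2/w3.weight
#   model.layers.{L}.block_sparse_moe.gate.weight   (the router)
# Assumed dense Mistral layout:
#   model.layers.{L}.mlp.gate_proj/down_proj/up_proj.weight
EXPERT_KEY = re.compile(
    r"^(model\.layers\.\d+)\.block_sparse_moe\.experts\.(\d+)\.(w[123])\.weight$"
)
DENSE_NAME = {"w1": "gate_proj", "w2": "down_proj", "w3": "up_proj"}


def extract_expert(state_dict, expert_idx):
    """Return a dense-style state dict that keeps only one expert's MLP weights."""
    dense = {}
    for key, tensor in state_dict.items():
        if ".block_sparse_moe.gate." in key:
            continue  # drop the router: a single expert has nothing to route to
        match = EXPERT_KEY.match(key)
        if match is None:
            dense[key] = tensor  # attention, norms, embeddings are shared; copy as-is
            continue
        layer_prefix, idx, proj = match.groups()
        if int(idx) == expert_idx:
            dense[f"{layer_prefix}.mlp.{DENSE_NAME[proj]}.weight"] = tensor
    return dense
```

The values can be real tensors or anything else, since the function never touches them. The remapped dict would still need a matching dense config before it could be loaded.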

Ah, that's unfortunate. But as far as I understand megablocks / MoE, your experiment will also not work. One "expert" learns, for example, sentence positions, or has more activations when asked about history-related facts, etc. So how are you planning to extract a "working" 22b model?
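To make the concern concrete, here is a toy sketch of per-token routing. It assumes top-2 routing over 8 experts with softmax weights renormalised over the selected pair, which is how Mixtral is usually described; the logits are invented numbers and `top_k_route` is a hypothetical helper, not a library function.

```python
import math


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalise their softmax weights."""
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


# Invented router logits for one token over 8 experts.
weights = top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3])
for expert, weight in weights.items():
    print(f"expert {expert}: weight {weight:.2f}")
```

Each token's MLP output is a weighted mix of the chosen experts, and the choice changes from token to token and layer to layer, so any one expert was only ever trained on the slice of tokens routed to it.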

"how are you planning to extract a "working" 22b model?"

With a lot of hope and prayer

Hi

A newbie question: Mistral has released the 8x22B model, which is 260 GB (on torrent). So how can this be used for inference? Does it require the entire model to be loaded into memory, and therefore more than 260 GB of RAM? Or is this model supposed to be used to create smaller models that can then be used on normal desktops with a decent GPU/RAM?
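A back-of-the-envelope way to think about the sizes: the sketch below scales the 260 GB figure by bits per weight. It assumes the torrent holds 16-bit weights, and it ignores quantization metadata, the KV cache and activations, so real usage would be higher. The helper name `scaled_checkpoint_gb` is made up.

```python
def scaled_checkpoint_gb(size_gb, from_bits, to_bits):
    """Scale a checkpoint size by the ratio of bits per weight (weights only)."""
    return size_gb * to_bits / from_bits


full_gb = 260.0  # size quoted in this thread; assumed to be 16-bit weights
four_bit_gb = scaled_checkpoint_gb(full_gb, from_bits=16, to_bits=4)
two_cards_gb = 2 * 24  # the 2x24gb setup mentioned earlier in the thread

print(f"4-bit weights: about {four_bit_gb:.0f} GB")
print(f"fits on 2x24 GB: {four_bit_gb <= two_cards_gb}")
```

Under these assumptions even the 4-bit weights are larger than two 24 GB cards combined, which is why smaller variants keep coming up.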

Mistral Community org

You can use the BnB 4bit quantized version:

https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1-4bit
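A minimal sketch of loading that checkpoint with `transformers`, assuming `transformers`, `accelerate` and `bitsandbytes` are installed and that there is enough combined GPU/CPU memory for the 4-bit weights. The helper names are made up, and nothing is downloaded until `generate` is actually called.

```python
MODEL_ID = "mistral-community/Mixtral-8x22B-v0.1-4bit"  # repo linked above


def load_4bit_model(model_id=MODEL_ID):
    """Load the pre-quantized checkpoint; the quantization config ships with the repo."""
    # Imported lazily so this file can be read or imported without the heavy deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" lets accelerate spread layers across the available devices.
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return tokenizer, model


def generate(prompt, max_new_tokens=32):
    """Run a short completion; this triggers the (very large) download."""
    tokenizer, model = load_4bit_model()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Hello, my name is")` should return the prompt followed by a continuation, since this is a base model rather than an instruction-tuned one.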

If you manage to grab 1 expert, why not all 8? Maybe some kind of merge would make them more useful from there? (Or less useful!)

Mistral Community org

great idea

Mistral Community org

"Currently me and a friend are attempting to remove 1 22b expert from this mixtral model to hopefully create its own mistral 22b parameter standalone model. If we succeed, would you like us to upload the weights to this hf account?"

Just FYI, the author of MergeKit did something similar with Mixtral 8x7B, and the individual experts didn’t generate comprehensible text (see DeMixtral). Merging the experts together didn’t work either, so you might need to fine-tune quite a bit to fix it.

@mrfakename I was able to find DeMixtral, but couldn't find any reports of merging all the experts together. Can you help me find the source on the failed expert merge? Thanks in advance.

Mistral Community org

Unfortunately also uninterpretable garbage. :( Maybe there's a merge technique that would make something work, but I haven't found one yet.

Thank you! (for future reference that was said by cg in this GH issue thread)

Looks like someone did it, but the model seems to lack knowledge
https://huggingface.co/Vezora/Mistral-22B-v0.1

Mistral Community org

"Looks like someone did it, but the model seems to lack knowledge
https://huggingface.co/Vezora/Mistral-22B-v0.1"

The model generated incomprehensible text so they QLoRA'd it and it became a usable model
