Error loading Q5_K_M with ctransformers

#1
by george713 - opened

Hey @TheBloke ,

I ran into a problem using "mixtral-8x7b-instruct-v0.1-limarp-zloss-dare-ties.Q5_K_M.gguf" with ctransformers, resulting in this error:

RuntimeError: Failed to create LLM 'mistral' from 'mixtral-8x7b-instruct-v0.1-limarp-zloss-dare-ties.Q5_K_M.gguf'.
Setting model_type to 'mixtral' instead of 'mistral' did not resolve this.
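For reference, the load call was roughly along these lines (a minimal sketch; the local file path and the call itself are assumptions, not my exact script):

```python
# Rough sketch of the failing load via the ctransformers API;
# the GGUF path is a placeholder for wherever the file was downloaded.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "mixtral-8x7b-instruct-v0.1-limarp-zloss-dare-ties.Q5_K_M.gguf",  # local GGUF file
    model_type="mistral",  # also tried "mixtral"; both raise the RuntimeError above
)
```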

Could this be similar to the previous error reported here: https://huggingface.co/TheBloke/CausalLM-7B-GGUF/discussions/3 ?

Best
George

@george713 I don't think ctransformers supports Mixtral yet, so use llama.cpp or llama-cpp-python instead, since those support it and are considerably faster.
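A minimal llama-cpp-python sketch (the file path, context size, and GPU offload settings are assumptions; adjust to your setup):

```python
# Load a Mixtral GGUF with llama-cpp-python; values below are placeholders,
# not a tested configuration.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct-v0.1-limarp-zloss-dare-ties.Q5_K_M.gguf",
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers to GPU if it fits; 0 for CPU-only
)

out = llm("[INST] Write a haiku about llamas. [/INST]", max_tokens=128)
print(out["choices"][0]["text"])
```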

I successfully used a dolphin-2.0-mistral model with ctransformers. Is that different from the 'mixtral' models?

@george713 Yes, very different.
Mixtral models use a completely different architecture than Mistral models.

Mixtral is an MoE (mixture of experts): there are 8 experts (each 7B in size), and for each token, 2 of them are chosen to run inference.
This gives roughly the speed of a 13B model, but it takes about as much VRAM as a 46B model.

This is unlike Mistral and Llama models, where there is just one model instead of 8 different expert models.

That's why it says 8x7B: the original Mixtral has 8 experts, each 7B in size.
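To make the routing idea concrete, here's a toy numpy sketch of top-2 expert routing (just an illustration of the concept, not Mixtral's actual implementation):

```python
# Toy top-2 mixture-of-experts routing: 8 "experts", a router scores them
# per token, and only the 2 highest-scoring experts actually run.
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, num_experts, top_k = 16, 8, 2

# Stand-in experts: each is just a random linear map here.
experts = [rng.standard_normal((hidden_dim, hidden_dim)) for _ in range(num_experts)]
router = rng.standard_normal((hidden_dim, num_experts))

def moe_forward(token_vec):
    logits = token_vec @ router
    top = np.argsort(logits)[-top_k:]                          # pick the 2 best experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the chosen 2
    # Only the 2 selected experts compute, which is why inference is ~13B-fast,
    # yet all 8 experts' weights (~46B total) still have to sit in memory.
    return sum(w * (token_vec @ experts[i]) for w, i in zip(weights, top))

out = moe_forward(rng.standard_normal(hidden_dim))
print(out.shape)  # (16,)
```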
