Error loading Q5_K_M with ctransformers
Hey @TheBloke ,
I ran into a problem using "mixtral-8x7b-instruct-v0.1-limarp-zloss-dare-ties.Q5_K_M.gguf" with ctransformers, which results in this error:
RuntimeError: Failed to create LLM 'mistral' from 'mixtral-8x7b-instruct-v0.1-limarp-zloss-dare-ties.Q5_K_M.gguf'.
Setting model_type to 'mixtral' instead of 'mistral' did not resolve it.
Could this be similar to the earlier issue here: https://huggingface.co/TheBloke/CausalLM-7B-GGUF/discussions/3 ?
Best
George
@george713 I don't think ctransformers supports Mixtral yet, so use llama.cpp or llama-cpp-python instead, since those support it and are considerably faster.
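To illustrate the llama-cpp-python route: a minimal sketch, assuming `pip install llama-cpp-python` and the GGUF file from the original post sitting in the working directory. The context size and prompt are placeholder values, not taken from the model card.

```python
# Hedged sketch of loading the same GGUF with llama-cpp-python
# instead of ctransformers (assumed setup: pip install llama-cpp-python).
MODEL_PATH = "mixtral-8x7b-instruct-v0.1-limarp-zloss-dare-ties.Q5_K_M.gguf"

if __name__ == "__main__":
    from llama_cpp import Llama

    llm = Llama(
        model_path=MODEL_PATH,
        n_ctx=4096,       # context window (placeholder value)
        n_gpu_layers=-1,  # offload all layers if built with CUDA/Metal
    )
    out = llm("[INST] Say hello. [/INST]", max_tokens=64)
    print(out["choices"][0]["text"])
```

llama-cpp-python reads the architecture from the GGUF metadata, so no model_type argument is needed at all.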
I successfully used a dolphin-2.0-mistral
model with ctransformers. Is that different from the 'mixtral' models?
@george713
Yes, very different.
Mixtral models are a completely different architecture than Mistral models.
Mixtral is a MoE (mixture of experts): there are 8 experts (each 7B in size), and for each token, 2 of them are chosen to run inference.
This gives roughly 13B-class speed but takes the same VRAM as a ~46B model, unlike Mistral and Llama models, where it's just 1 model instead of 8 different expert models.
That's why it's called 8x7B: the original Mixtral had 8 experts, each 7B in size.
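The top-2-of-8 routing described above can be sketched in a few lines. This is a toy illustration with made-up dimensions and random weights, not Mixtral's actual implementation: a router scores all 8 experts per token, but only the 2 highest-scoring experts actually run.

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts, d = 8, 16  # 8 experts, toy hidden size (illustrative only)
gate_w = rng.normal(size=(d, n_experts))          # router weights
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

def moe_forward(x):
    """Route one token through the top 2 of 8 experts (toy sketch)."""
    logits = x @ gate_w                  # one score per expert
    top2 = np.argsort(logits)[-2:]       # pick the 2 best experts
    weights = np.exp(logits[top2])
    weights /= weights.sum()             # softmax over the chosen 2
    # Only 2 of the 8 expert matmuls run -> ~13B-class compute per token,
    # but all 8 experts must stay in memory -> ~46B-class VRAM.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top2))

y = moe_forward(rng.normal(size=d))
print(y.shape)  # same shape as the input hidden state
```

The compute/memory asymmetry in the comment is exactly why Mixtral runs at roughly 13B speed while needing the VRAM of a much larger dense model.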