Responses somewhat related to the prompt but still gibberish
I just tried out the Q3_K_M GGUF with llama.cpp. The responses are somewhat related to the prompt, but still gibberish.
E.g.:
What was John Holt's vision on education?
"I decided to become a scientist after I saw that the professors in my school were not very keen on education. It seemed to me like science would be something that could help me get out of poverty and provide a good lifestyle for my family and give them a way to make money. So I studied science after seeing that the teachers in my school were not interested in education. That is what was my vision on education as I saw it in other schools. "
GGUF from here: https://huggingface.co/TheBloke/Mixtral-Fusion-4x7B-Instruct-v0.1-GGUF
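For reference, this is roughly how I reproduced it (a minimal sketch using the llama-cpp-python bindings; the local file name, context size, and sampling settings are my own assumptions, not taken from the repo):

```python
# Minimal repro sketch, assuming the llama-cpp-python bindings.
# The file name, context size, and sampling settings below are
# assumptions for illustration, not taken from TheBloke's repo.
from llama_cpp import Llama

llm = Llama(
    model_path="./mixtral-fusion-4x7b-instruct-v0.1.Q3_K_M.gguf",  # hypothetical local path
    n_ctx=2048,
)

# Mistral-style [INST] prompt format, as used by the Instruct base model.
prompt = "[INST] What was John Holt's vision on education? [/INST]"
out = llm(prompt, max_tokens=256, temperature=0.7)
print(out["choices"][0]["text"])
```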
There is another report here; is this related to the model itself or to the quantization process? https://huggingface.co/TheBloke/Mixtral-Fusion-4x7B-Instruct-v0.1-GGUF/discussions/1
Unfortunately, this model hallucinates heavily. Sorry.
In the GGUF, the newline token is printed literally; I think that is because I initially forgot to include tokenizer.model, so the model was converted in BPE mode.
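To illustrate why the missing file matters: the converter decides how to handle the vocabulary based on which tokenizer files are present, roughly along these lines (an illustrative sketch of the idea only, not llama.cpp's actual convert.py code; the function name is mine):

```python
# Illustrative sketch of SPM-vs-BPE vocab selection during conversion.
# This simplifies the idea; it is not llama.cpp's actual convert.py logic.
from pathlib import Path

def pick_vocab_type(model_dir: str) -> str:
    """Prefer SentencePiece ('spm') when tokenizer.model exists;
    otherwise fall back to BPE ('bpe') built from vocab.json."""
    d = Path(model_dir)
    if (d / "tokenizer.model").exists():
        # SPM mode: byte tokens such as <0x0A> are decoded to real bytes,
        # so newlines come out as newlines.
        return "spm"
    if (d / "vocab.json").exists():
        # BPE mode: <0x0A> can end up treated as an ordinary token string,
        # so the newline "code" gets printed literally in the output.
        return "bpe"
    raise FileNotFoundError(f"no tokenizer files found in {model_dir}")

print(pick_vocab_type("./Mixtral-Fusion-4x7B-Instruct-v0.1"))
```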
Another method is to convert by extracting the experts into a 4x model -> mmnga/Mixtral-Extraction-4x7B-Instruct-v0.1