
Odd repeating behaviour on Q8 GGUF

#3
by FiditeNemini - opened
Cognitive Computations org

Hi,
I've noticed that the model's output often repeats the last sentence indefinitely, or degenerates into random words in the final sentence, roughly every one or two interactions. This behaviour seems to be specific to the Dolphin 2.6 Mixtral version. I'm using both 2.5 and 2.6, both Q8_0 GGUF quants from Tom, running in LM Studio on macOS with exactly the same config. The system prompt is the default suggested in this repo (save the kittens makes me laugh every time), with the ChatML prompt format, temp 0.8, tokens -1, top_k 40, repeat_penalty 1.1 (tried up to 1.5, no difference), min_p 0.05, top_p 0.95, n_ctx 32768. I've noticed this behaviour on other Mixtral-based models in the past, but Dolphin 2.5 Mixtral seemed to fix it. Any ideas on how to fix this repeating/random-gibberish problem? Any help/suggestions appreciated.
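For reference, here's roughly how the same sampler settings would map onto llama-cpp-python (a sketch only: the model path and prompts are placeholders, and I'm actually running LM Studio rather than this library, so this just documents the parameters):

```python
from llama_cpp import Llama

# Placeholder path; substitute your local Q8_0 GGUF file.
llm = Llama(
    model_path="./dolphin-2.6-mixtral-8x7b.Q8_0.gguf",
    n_ctx=32768,           # full 32k context, as in the LM Studio config
    chat_format="chatml",  # Dolphin uses the ChatML prompt format
)

out = llm.create_chat_completion(
    messages=[
        # Placeholder; use the default system prompt suggested in this repo.
        {"role": "system", "content": "<default system prompt from this repo>"},
        {"role": "user", "content": "Hello!"},
    ],
    temperature=0.8,
    top_k=40,
    top_p=0.95,
    min_p=0.05,
    repeat_penalty=1.1,
    max_tokens=-1,  # <= 0 means unlimited, like "tokens -1" in LM Studio
)
print(out["choices"][0]["message"]["content"])
```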
Thanks,
Will.

I noticed a similar issue with the Q5 and Q4_K_M quants; in fact, every GGUF quant I tried ended up repeating lines after only a few generations.
But I had no such problems with the GPTQ quants; with those I actually had to tone down the repetition penalty quite a bit.
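For context on what that knob actually does: llama.cpp's repeat_penalty applies the standard CTRL-style penalty, which rescales the logits of already-generated tokens before sampling. A minimal sketch of the idea (my own illustration, not code from llama.cpp):

```python
import torch

def apply_repetition_penalty(logits: torch.Tensor,
                             generated_ids: torch.Tensor,
                             penalty: float = 1.1) -> torch.Tensor:
    """CTRL-style repetition penalty over a 1-D logits tensor.

    Positive logits of previously generated tokens are divided by
    `penalty`, negative ones multiplied, so in both cases a token the
    model has already emitted becomes less likely to be sampled again.
    """
    scores = logits.clone()
    prev = scores[generated_ids]
    scores[generated_ids] = torch.where(prev > 0, prev / penalty, prev * penalty)
    return scores
```

Since the penalty only rescales tokens already in the recent window, it tends to break exact token loops but does nothing about a degenerate underlying distribution, which may be why raising it from 1.1 to 1.5 made no difference in the original report.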

I've had this problem with every GGUF quant of a Mixtral finetune I've tried, including Dolphin 2.5. I would be thrilled if anyone has insights.

I've heard it mentioned somewhere that MoE models are considerably more sensitive to learning rate. Could that be a contributing factor?
