Endless generation
Hi there, these GGUF models (used through llama.cpp via LM Studio 0.2.20) keep generating endlessly. Each answer is immediately followed by 'assistant', and the model then continues the conversation all by itself. I'm using the official Llama 3 prompt preset:
{
  "name": "Llama 3",
  "inference_params": {
    "input_prefix": "<|start_header_id|>user<|end_header_id|>\n\n",
    "input_suffix": "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "pre_prompt": "You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.",
    "pre_prompt_prefix": "<|start_header_id|>system<|end_header_id|>\n\n",
    "pre_prompt_suffix": "<|eot_id|>",
    "antiprompt": [
      "<|start_header_id|>",
      "<|eot_id|>"
    ]
  }
}
Any idea why this happens, and how to fix it?
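For context on what the `antiprompt` entries above are supposed to do: they act as stop strings, so the runtime should cut generation off as soon as one of them appears in the output. A minimal sketch of that behavior in plain Python (a hypothetical helper for illustration, not LM Studio's or llama.cpp's actual implementation):

```python
# Sketch of "antiprompt"/stop-string handling: generation should be
# truncated at the earliest occurrence of any stop string. If the
# runtime never matches <|eot_id|>, the raw output just keeps going
# into a new "assistant" turn, which matches the symptom described.

STOP_STRINGS = ["<|eot_id|>", "<|start_header_id|>"]

def truncate_at_stop(text: str, stops=STOP_STRINGS) -> str:
    """Cut `text` at the earliest occurrence of any stop string."""
    cut = len(text)
    for s in stops:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

raw = ("Paris is the capital.<|eot_id|>"
       "<|start_header_id|>assistant<|end_header_id|>\n\nParis is...")
print(truncate_at_stop(raw))  # -> "Paris is the capital."
```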
I can confirm that it's not just the GGUF models; it also affects all the quantized exl2 versions. The agent/model just continues rambling endlessly. I've tested the 3, 4, 4.65, and 5 bpw quants and all behave exactly the same way. I'll test other people's similarly quantized models too, but I suspect the result will be the same.
Getting the same behavior here. It seems to answer well, but then endlessly repeats the answer.
The GGUF models may need to be regenerated with the latest llama.cpp changes. The exl2 quants might just need a config change. I'll update when I get a chance.
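If it helps while waiting for the requants: the likely root cause is that the model's config declares only `<|end_of_text|>` as the EOS token, while the Llama 3 chat template ends each turn with `<|eot_id|>`, so the runtime never sees a stop signal. For transformers-based quants such as exl2, the config change is probably something along these lines in `generation_config.json` (the token ids here are my assumption, based on the fix later published for the official repos; verify them against the model's tokenizer before relying on this):

```json
{
  "bos_token_id": 128000,
  "eos_token_id": [128001, 128009]
}
```

Listing both ids means generation stops on either `<|end_of_text|>` (128001) or `<|eot_id|>` (128009), so the model no longer runs past the end of its turn.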