Endless generation
Hi there, these GGUF models (used through llama.cpp via LM Studio 0.2.20) keep generating endlessly. Each answer is immediately followed by 'assistant', and the model then continues the conversation all by itself. I'm using the official Llama 3 prompt preset:
{
  "name": "Llama 3",
  "inference_params": {
    "input_prefix": "<|start_header_id|>user<|end_header_id|>\n\n",
    "input_suffix": "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "pre_prompt": "You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.",
    "pre_prompt_prefix": "<|start_header_id|>system<|end_header_id|>\n\n",
    "pre_prompt_suffix": "<|eot_id|>",
    "antiprompt": [
      "<|start_header_id|>",
      "<|eot_id|>"
    ]
  }
}
Any idea why this happens, and how to fix it?
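For context on what the `antiprompt` entries above are supposed to do: they act as stop strings, so the runtime should cut generation off as soon as one of them appears in the output. A minimal sketch of that behavior in plain Python (a hypothetical helper for illustration, not LM Studio's or llama.cpp's actual implementation):

```python
# Sketch of "antiprompt"/stop-string handling: generation should be
# truncated at the earliest occurrence of any stop string. If the
# runtime never matches <|eot_id|>, the raw output just keeps going
# into a new "assistant" turn, which matches the symptom described.

STOP_STRINGS = ["<|eot_id|>", "<|start_header_id|>"]

def truncate_at_stop(text: str, stops=STOP_STRINGS) -> str:
    """Cut `text` at the earliest occurrence of any stop string."""
    cut = len(text)
    for s in stops:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

raw = ("Paris is the capital.<|eot_id|>"
       "<|start_header_id|>assistant<|end_header_id|>\n\nParis is...")
print(truncate_at_stop(raw))  # -> "Paris is the capital."
```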
I can confirm that it's not just the GGUF models; it also affects all the quantized exl2 versions. The agent/model just continues rambling endlessly. I've tested the 3, 4, 4.65, and 5 bpw quants and all behave exactly the same way. I'll test other people's similarly quantized models too, but I suspect the result will be the same.
Getting the same behavior here. It seems to answer well, but then endlessly repeats the answer.
The GGUF models may need to be regenerated with the latest llama.cpp changes. The exl2 quants might just need a config change. I'll update when I get a chance.
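If it helps while waiting for the requants: the likely root cause is that the model's config declares only `<|end_of_text|>` as the EOS token, while the Llama 3 chat template ends each turn with `<|eot_id|>`, so the runtime never sees a stop signal. For transformers-based quants such as exl2, the config change is probably something along these lines in `generation_config.json` (the token ids here are my assumption, based on the fix later published for the official repos; verify them against the model's tokenizer before relying on this):

```json
{
  "bos_token_id": 128000,
  "eos_token_id": [128001, 128009]
}
```

Listing both ids means generation stops on either `<|end_of_text|>` (128001) or `<|eot_id|>` (128009), so the model no longer runs past the end of its turn.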