Repetition issues?

#1
by autobots - opened

Is there something wrong with the GGUF quants of this model? I have downloaded both Q4_K_M and Q5_K_M, and the model starts repeating the same outputs at around 3k context. The longer the session goes, the harder the repetition is to break. I have to change sampling parameters and edit the context, and it still tries to pull up old sentences verbatim.
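For anyone who wants to reproduce it, here is a minimal sketch using llama-cpp-python; the model path and prompt file are placeholders, not actual files from this repo:

```python
# Minimal repro sketch (pip install llama-cpp-python).
# "model.Q4_K_M.gguf" and "long_prompt.txt" are placeholder names.
from llama_cpp import Llama

# Load the GGUF quant with a context window large enough to cross
# the ~3k-token mark where the repetition reportedly starts.
llm = Llama(model_path="model.Q4_K_M.gguf", n_ctx=4096, verbose=False)

# Feed a long prompt, then generate and inspect the tail for
# verbatim repeats of earlier sentences.
prompt = open("long_prompt.txt").read()
out = llm(prompt, max_tokens=1024, temperature=0.8)
print(out["choices"][0]["text"])
```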

I also downloaded the GPTQ quant, and the same issue is not present there. Most of my other llama.cpp models were downloaded as GGML and converted to GGUF via the conversion script, not quantized to GGUF directly, and this issue never occurs with them. I know there were commits to the llama.cpp scripts after this conversion was done, but I'm not sure whether that matters.

I thought it could be the samplers, but again, I don't have this problem with other llama.cpp models of the same size that were converted GGML->GGUF, and I use the same settings.
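These are the repetition-related knobs I've been adjusting; a hedged sketch with llama-cpp-python, with illustrative values rather than my exact settings:

```python
# Sketch of the repetition-related sampler settings in llama-cpp-python.
# Values here are illustrative defaults-ish numbers, not a known fix.
from llama_cpp import Llama

llm = Llama(model_path="model.Q4_K_M.gguf", n_ctx=4096, verbose=False)

out = llm(
    "Once upon a time",
    max_tokens=512,
    temperature=0.8,
    top_p=0.95,
    repeat_penalty=1.15,  # >1.0 penalizes tokens seen in the recent window
)
print(out["choices"][0]["text"])
```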

I am now tempted to grab the GGML files and re-convert them myself to see whether the issue persists, but that's another 40GB of downloading for an unknown result.
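If I do try it, the plan would be something like the following; the conversion script's name and flags changed across llama.cpp commits, so treat these as assumptions to check against your checkout:

```python
# Hedged sketch of re-running the GGML -> GGUF conversion via llama.cpp's
# converter script. The script name and flag spellings varied across
# commits (it was convert-llama-ggml-to-gguf.py at one point), so verify
# them locally before running. File names below are placeholders.
import subprocess

subprocess.run(
    [
        "python",
        "convert-llama-ggml-to-gguf.py",
        "--input", "model.ggmlv3.q4_K_M.bin",  # placeholder GGML input
        "--output", "model.Q4_K_M.gguf",       # placeholder GGUF output
    ],
    check=True,
)
```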
