k-quant models?

#4
by Mykee - opened

I see that the k-quant models have been deleted. Will there be a new version or Llama2 release?

Yeah, that confused me too. You can still get them from an older version of the repo (here).

I deleted them a while ago because they were at risk of producing garbage output. This model uses a non-standard vocab size (32001) and for a while that broke k-quants. The issue was resolved a few weeks ago, but I've not had a chance to go back and re-make k-quants for this or some other older models.

Are you saying that the k-quants I deleted do in fact work? I think it may be the case that they sometimes produce garbage output, as a change was required in llama.cpp to produce valid k-quants for models like this, changing how certain layers of the model were quantised. But maybe they work most of the time?
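For anyone curious why a vocab size of 32001 mattered: as I understand it, k-quants pack weights into super-blocks of 256 values (the `QK_K` constant in llama.cpp), so a tensor dimension that isn't a multiple of 256 couldn't be split cleanly into k-quant blocks until llama.cpp learned to fall back to a different format for those layers. This is just a toy divisibility check of my own to illustrate the idea, not actual llama.cpp code:

```python
# Illustration only (my own sketch, not llama.cpp's logic):
# k-quants group weights into super-blocks of 256 (QK_K in llama.cpp).
QK_K = 256

def kquant_compatible(dim: int) -> bool:
    """True if a tensor dimension divides evenly into k-quant super-blocks."""
    return dim % QK_K == 0

print(kquant_compatible(32000))  # standard Llama vocab size -> True
print(kquant_compatible(32001))  # this model's extended vocab -> False
```

This is consistent with the fix described above: rather than forcing such layers into k-quant blocks, llama.cpp now quantises the incompatible layers differently.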

There hasn't been a Llama 2 Chronos Hermes yet, but there is a Llama 2 Chronos, and a whole bunch of Chronos merges which I quantised over the last 48 hours. So there's a lot of Llama 2 choice now.

In my testing with q6_K in Oobabooga, the model works as intended (tested with the Simple-1 and Mirostat presets), though I haven't tested over longer sessions.

I would think it's worth re-quantising, just in case. I've heard reports of Llama 2 models having repetition issues, including Chronos-Hermes 2, so for some of us LLaMA 1-based models are still a viable option.

If not, we still have the original quants. Either way, keep up the great work.

I noticed TheBloke is requantizing some older models, so a GGUF version has been released with updated k-quants. Thank you!
TheBloke/chronos-hermes-13B-GGUF
