Are you making a k-quant series of this model?

#1
by mancub - opened

Might wait for q6_K if so, as that compares nicely to the q4, q5 or q8 quants and usually has a good perplexity score.

I will when I can, but those new k-quants are exclusive to llama.cpp at the moment:

[pytorch2] ubuntu@h100:/workspace/process $ /workspace/git/ggml/build/bin/starcoder-quantize -h
usage: /workspace/git/ggml/build/bin/starcoder-quantize model-f32.bin model-quant.bin type
  type = "q4_0" or 2
  type = "q4_1" or 3
  type = "q5_0" or 8
  type = "q5_1" or 9
  type = "q8_0" or 7
[pytorch2] ubuntu@h100:/workspace/process $
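
For anyone following along, a run based on that usage text would look roughly like this for one of the currently supported types (the filenames here are just placeholders):

/workspace/git/ggml/build/bin/starcoder-quantize model-f32.bin model-q5_1.bin q5_1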

My bad, I guess: I was reading the model card and it mentioned 2, 3, 4, 5, 6 and 8-bit versions, so somehow I equated the 6 with q6_K. Duh.

I'll give the GPTQ model a try instead, since that'll probably provide the best speed.

No rush otherwise. I have no idea how you even accomplish everything you do in just 24 hours a day. :)

Oh yeah, sorry, it did say that - I've edited it now. I have a standard GGML template that assumes the Llama k-quants, and I've not yet got to the point of implementing different README templates for non-Llama models.

I've fixed that now
