Help me quantize my model

#57
by Alsebay - opened

Long time no see. Could you help me quantize my model?
https://huggingface.co/Alsebay/FinalFintetuning-XVIII-2x8B
LLAMA 3 is hard to quantize (for now).

Absolutely, it's in the queue. Let's see what happens.

As long as the pretokenizer is supported by llama.cpp, it should be straightforward to quantize at the moment:

convert-hf-to-gguf.py --use-temp-file --outfile model.gguf model-directory
quantize model.gguf model.Q4_K_S.gguf Q4_K_S

The "trick" is to use convert-hf-to-gguf.py, and, for large models that don't fit into RAM, --use-temp-file.

But it's such a hassle. And doing imatrix training is then even more hassle. So feel free to worry less and ask me (or anybody else) to quantize it :)
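
For the curious, the imatrix step roughly looks like the sketch below, assuming llama.cpp's imatrix and quantize tools; calibration.txt is just a placeholder for whatever calibration text you feed it, and exact flags can differ between llama.cpp versions:

imatrix -m model.gguf -f calibration.txt -o model.imatrix
quantize --imatrix model.imatrix model.gguf model.IQ4_XS.gguf IQ4_XS

The first command runs the calibration text through the model to collect importance statistics; the second uses those statistics to produce an imatrix quant such as IQ4_XS.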

Unfortunately, the model overflowed during imatrix training, so only static quants will be done. And they are done: https://huggingface.co/Alsebay/FinalFintetuning-XVIII-2x8B

mradermacher changed discussion status to closed

I see, thank you for helping me. Static quants are enough in most cases. Thanks a lot. :D

The more worrisome aspect is that, since imatrix training just feeds a long prompt to the model, when it overflows, it indicates a problem with the model, as it would overflow with at least that prompt (and probably others). This happens with merges.

It's quite a common problem, and doesn't mean the model is broken for all inputs (or not useful), but it's not a good sign.

Maybe that's because my MoE merging process has errors in some tensor layers that I don't know about. I will investigate the model further to see what happened. Thank you for informing me about that. :)

Seems it's because one of the source models has a layer error. Thank you again.

Wow, cool. It's normally not easy to find the root cause - it could just be a few weights that, when e.g. averaged together, cause intermediate results to overflow when originally they didn't. In any case, as always, I'll be happy to quantize anything you come up with :)

Ah no, I just thought my model was the worst of the source models, so I concentrated on checking it. I fine-tuned it on raw text completion (mostly like continued pretraining) on specific content that my friend asked me for. But Google Colab's specs aren't good enough, so almost every model failed (32 was the maximum LoRA rank and alpha, so it hadn't learned all the content). Then I found Together AI and made 4 prototype fine-tuned models (but they didn't give me any logs about the fine-tuning, so maybe all of those models are overfit or broken, and they also didn't let me configure the training freely). One of those prototypes is in this MoE merge, which is why I concentrated on checking it first.
Anyway, thank you very much again for helping me; you have taught me a lot. Maybe I will pause fine-tuning LLMs until I have good specs (maybe rent a GPU to "continue pretraining" on those novels).

I wish these things wouldn't take as many resources as they do. Sigh.

Yeah, but sadly it does. AI is already 'cheaper' nowadays, so I hope I can complete my friend's request in the future. I'm still merging, which is easier than fine-tuning for me.

... and lots of legendary models (goliath!) were "just" merges.

Ummm, sorry, but... could you quantize this model for me? https://huggingface.co/Alsebay/FinalFintetuning-XVIII-v1.1-2x8B
This model just has a minor change, but my specs can't load the full version, so I need to load it in a quantized version.

Absolutely no problem, it's on the way :)

imatrix was generated without an issue this time, imatrix-quants are incoming

Thanks so much!

You are right; although my model still 'seems overfit', with a different model merge recipe it is not broken in some ways. Thanks.
