imatrix problem
I tried your ggml-causallm-34b-beta-iq3_xxs.gguf in LM Studio (based on llama.cpp), and it works great! Thanks.
But when I make my own IQ3_XXS and use it in LM Studio, the GGUF I made doesn't work (the Qx_x_x quants I make myself do work).
So I would like to ask: did I make any mistakes? If you have time to answer.
Here's the command I'm using:
imatrix -m F:\gguf_out\CausalLM-34b-beta-Q8_0.gguf -f F:\OpenSource-Datasets\groups_merged.txt -o F:\gguf_out\ggml-causallm-34b-beta-q8_0-imatrix.dat -t 12 -ngl 40 -b 128 -c 32 --chunks 1000
quantize --allow-requantize --imatrix F:\gguf_out\ggml-causallm-34b-beta-q8_0-imatrix.dat F:\gguf_out\CausalLM-34b-beta-Q8_0.gguf F:\gguf_out\CausalLM-34b-beta-IQ3_XXS.gguf IQ3_XXS
For imatrix
I only use -ngl 40
and keep the default 512 context/batch lengths (didn't see anything stating this was a bad approach).
For quantize
you need to remove --allow-requantize
and quantize from the F16 GGUF - don't quantize from the Q8_0.
And I think that should do the trick.
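Putting those changes together, the corrected commands would look something like the sketch below. The F16 GGUF filename is an assumption (adjust it to whatever your conversion step actually produced); the other paths mirror the ones quoted above.

```shell
:: Build the importance matrix from the F16 GGUF, keeping the default
:: 512 context/batch sizes (so no -b / -c overrides).
:: The F16 filename here is assumed - use your actual converted file.
imatrix -m F:\gguf_out\CausalLM-34b-beta-F16.gguf ^
        -f F:\OpenSource-Datasets\groups_merged.txt ^
        -o F:\gguf_out\causallm-34b-beta-imatrix.dat ^
        -t 12 -ngl 40 --chunks 1000

:: Quantize from the F16 GGUF (not the Q8_0), so --allow-requantize
:: is no longer needed.
quantize --imatrix F:\gguf_out\causallm-34b-beta-imatrix.dat ^
         F:\gguf_out\CausalLM-34b-beta-F16.gguf ^
         F:\gguf_out\CausalLM-34b-beta-IQ3_XXS.gguf IQ3_XXS
```

Note the quant type at the end is written `IQ3_XXS` with no internal space.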
Thanks, I will try this.
I found that the problem is coming from LM Studio (loading the GGUF with main.exe from llama.cpp works). Maybe the llama.cpp version built into LM Studio isn't compatible with mine.