other quants available?

#1
by veryVANYA - opened

ty for uploading the experimental quants so quickly, curious if you've got the Q4, Q5, Q6, and Q8 versions as well

Owner

Yes, I'm just uploading them now.

got them, ty. How did you load them in? I combined the split files but I'm not able to load them in LM Studio

Will there be IQ quants? My potato server really needs them 😭. Anyway, thanks for all your work!

Owner

> got them, ty. How did you load them in? I combined the split files but I'm not able to load them in LM Studio

I built the llama.cpp fork mentioned in the README and used it for inference. The combined weights won't work in LM Studio because its bundled llama.cpp version doesn't support them.
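For reference, a minimal sketch of what loading looks like against llama.cpp's C API, assuming the fork builds and links like upstream. The file name is just an example; for split GGUFs, pointing at the first shard should be enough since llama.cpp picks up the remaining shards automatically:

```cpp
// Minimal loading sketch against llama.cpp's C API (assumes the fork
// exposes the same interface as upstream llama.cpp).
// The model path is illustrative, not the exact file name in this repo.
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99;  // offload as many layers as fit in VRAM

    llama_model * model = llama_load_model_from_file(
        "c4ai-command-r-plus-Q2_K-00001-of-00002.gguf", mparams);
    if (!model) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 4096;  // context length; larger values cost more memory
    llama_context * ctx = llama_new_context_with_model(model, cparams);

    // ... tokenize the prompt, call llama_decode() in a loop, sample tokens ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```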

> Will there be IQ quants? My potato server really needs them 😭. Anyway, thanks for all your work!

If you're looking for IQ quants, you can check this repo: https://huggingface.co/dranger003/c4ai-command-r-plus-iMat.GGUF

Shouldn't the Q2_K version be around 25 GB? Why is it 40 GB?

Owner

This is an effect of how Q2_K quantization works in llama.cpp: not all tensors are stored at Q2_K precision; some are kept at higher-precision types. The image below shows what this looks like for the first block of the Q2_K model.
[image.png: quantization types of the tensors in the first block of the Q2_K model]
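Some rough arithmetic shows the effect, assuming Command R+'s roughly 104B parameters (and using Q2_K's 84 bytes per 256-weight superblock, i.e. 2.625 bits per weight):

$$
\underbrace{\frac{104 \times 10^{9}\ \text{weights} \times 2.625\ \text{bits}}{8 \times 10^{9}\ \text{bits/GB}}}_{\text{pure Q2\_K}} \approx 34\ \text{GB}
\qquad
\underbrace{\frac{40 \times 10^{9}\ \text{bytes} \times 8}{104 \times 10^{9}\ \text{weights}}}_{\text{this file}} \approx 3.1\ \text{bits/weight}
$$

So even a hypothetical all-Q2_K file would already be above the 25 GB estimate, and the higher-precision tensors in the mix push the average past 3 bits per weight.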

So 32 GB (2×16 GB) of VRAM isn't enough without offloading some layers to RAM.

Unfortunately, yes. If you want to keep all layers on the GPUs, check the imatrix quants (such as IQ2_XXS) from the dranger003/c4ai-command-r-plus-iMat.GGUF repo; they're smaller than 32 GB.
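If you do end up splitting the model between VRAM and RAM, here's a sketch of how partial offload looks with the same C API; the layer count of 40 is illustrative, not a tuned value:

```cpp
#include "llama.h"

// Sketch: partial offload when the file is bigger than total VRAM.
// The value 40 is a made-up example, not a measured optimum.
llama_model_params make_offload_params() {
    llama_model_params p = llama_model_default_params();
    p.n_gpu_layers = 40;                      // layers beyond this stay in system RAM
    p.split_mode   = LLAMA_SPLIT_MODE_LAYER;  // spread offloaded layers across both GPUs
    return p;
}
```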

Thanks for the pointer.

pmysl changed discussion status to closed
