other quants available?

#1
by veryVANYA - opened

ty for uploading the experimental quants so quickly, curious if you've got the Q4, Q5, Q6, and Q8 versions as well

Owner

Yes, I'm just uploading them now.

got them, ty. How did you load them in? I combined the split files but I'm not able to load them in LM Studio

Will there be IQ quants? My potato server really needs them 😭. Anyway, thanks for all your work!

Owner

> got them, ty. How did you load them in? I combined the split files but I'm not able to load them in LM Studio

I built the llama.cpp fork mentioned in the README and used it for inference. The combined weights won't work in LM Studio because its bundled llama.cpp version doesn't support them.
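For reference, a minimal sketch of what loading looks like against llama.cpp's C API, assuming the fork builds and links like upstream. The file name is just an example; for split GGUFs, pointing at the first shard should be enough since llama.cpp picks up the remaining shards automatically:

```cpp
// Minimal loading sketch against llama.cpp's C API (assumes the fork
// exposes the same interface as upstream llama.cpp).
// The model path is illustrative, not the exact file name in this repo.
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99;  // offload as many layers as fit in VRAM

    llama_model * model = llama_load_model_from_file(
        "c4ai-command-r-plus-Q2_K-00001-of-00002.gguf", mparams);
    if (!model) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 4096;  // context length; larger values cost more memory
    llama_context * ctx = llama_new_context_with_model(model, cparams);

    // ... tokenize the prompt, call llama_decode() in a loop, sample tokens ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```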

> Will there be IQ quants? My potato server really needs them 😭. Anyway, thanks for all your work!

If you're looking for IQ quants, you can check this repo: https://huggingface.co/dranger003/c4ai-command-r-plus-iMat.GGUF

Shouldn't the Q2_K version be around 25 GB? Why is it 40 GB?

Owner

This is an effect of how Q2_K quantization works in llama.cpp: not all tensors are stored at Q2_K precision; some are kept at higher-precision types. The image below shows what this looks like for the first block of the Q2_K model.
[image.png: quantization types of the tensors in the first block of the Q2_K model]
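Some rough arithmetic shows the effect, assuming Command R+'s roughly 104B parameters (and using Q2_K's 84 bytes per 256-weight superblock, i.e. 2.625 bits per weight):

$$
\underbrace{\frac{104 \times 10^{9}\ \text{weights} \times 2.625\ \text{bits}}{8 \times 10^{9}\ \text{bits/GB}}}_{\text{pure Q2\_K}} \approx 34\ \text{GB}
\qquad
\underbrace{\frac{40 \times 10^{9}\ \text{bytes} \times 8}{104 \times 10^{9}\ \text{weights}}}_{\text{this file}} \approx 3.1\ \text{bits/weight}
$$

So even a hypothetical all-Q2_K file would already be above the 25 GB estimate, and the higher-precision tensors in the mix push the average past 3 bits per weight.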

So 32 GB (2×16 GB) of VRAM isn't enough without offloading some layers to RAM.

Unfortunately, yes. If you want to keep all layers on the GPUs, check the imatrix quants (such as IQ2_XXS) from the dranger003/c4ai-command-r-plus-iMat.GGUF repo; they're smaller than 32 GB.
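If you do end up splitting the model between VRAM and RAM, here's a sketch of how partial offload looks with the same C API; the layer count of 40 is illustrative, not a tuned value:

```cpp
#include "llama.h"

// Sketch: partial offload when the file is bigger than total VRAM.
// The value 40 is a made-up example, not a measured optimum.
llama_model_params make_offload_params() {
    llama_model_params p = llama_model_default_params();
    p.n_gpu_layers = 40;                      // layers beyond this stay in system RAM
    p.split_mode   = LLAMA_SPLIT_MODE_LAYER;  // spread offloaded layers across both GPUs
    return p;
}
```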

Thanks for the pointer.

pmysl changed discussion status to closed
