4 bit quantized version?

#5
by Djsiwnckfusnwbducke - opened

Llama 3.2-Vision has a GGUF quantized version in Q4_K_M that works well on consumer video cards. I'm not sure how it was made, since the llama.cpp quantization tools don't support vision models. Will you / can you release a 4-bit quantized version of this model? I'd really like to put it to work on my local LLM machine. Thanks!

You can just follow the Hugging Face transformers documentation to learn how to load a model in 4-bit.
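For reference, here's a minimal sketch of loading a model in 4-bit with transformers and bitsandbytes, using `BitsAndBytesConfig`. The model id is a placeholder, not this repo's actual id, so substitute it accordingly:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization config: NF4 quant type with bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "org/model-name"  # placeholder -- replace with the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPUs / CPU
)
```

This quantizes on load rather than producing a GGUF file, so it needs `bitsandbytes` installed and a CUDA-capable GPU, but it gets the memory footprint down to roughly what a Q4 GGUF would use.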
