4 bit quantized version?
#5
by Djsiwnckfusnwbducke · opened
Llama 3.2-vision has a GGUF quantized version in Q4_K_M that works well on consumer video cards. I'm not sure how they made it, since the llama.cpp quantization tools don't support vision models. Will you / can you release a 4-bit quantized version of this model? I'd really like to put it to work on my local LLM computer. Thanks!
You can just follow the Hugging Face transformers documentation to learn how to load a model in 4-bit.
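
For reference, a minimal sketch of what that looks like with transformers + bitsandbytes. The repo id `org/model-name` is a placeholder for this model, and depending on the architecture you may need a different Auto class (e.g. a vision-to-text class instead of `AutoModelForCausalLM`):

```python
# Sketch: load a model in 4-bit with transformers + bitsandbytes
# (requires `pip install transformers accelerate bitsandbytes`)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4 bit at load time
    bnb_4bit_quant_type="nf4",              # NormalFloat4, the usual default
    bnb_4bit_compute_dtype=torch.bfloat16,  # run the matmuls in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "org/model-name",                  # placeholder -- substitute this repo's id
    quantization_config=quant_config,
    device_map="auto",                 # spread layers across available GPUs/CPU
)
tokenizer = AutoTokenizer.from_pretrained("org/model-name")
```

Note this quantizes on the fly at load time (you still download the full-precision weights), which is different from a prequantized GGUF file, but the VRAM usage once loaded is comparable.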