4 bit quantized version?

#5
by Djsiwnckfusnwbducke - opened

Llama 3.2-Vision has a GGUF quantized version in Q4_K_M that works well on consumer video cards. I'm not sure how it was made, since the llama.cpp quantization tools don't support vision models. Will you / can you release a 4-bit quantized version of this model? I'd really like to put it to work on my local LLM machine. Thanks!

You can just follow the Hugging Face transformers documentation to learn how to load a model in 4-bit.
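For reference, here's a minimal sketch of loading a model in 4-bit with transformers and bitsandbytes, using `BitsAndBytesConfig`. The model id is a placeholder, not this repo's actual id, so substitute it accordingly:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization config: NF4 quant type with bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "org/model-name"  # placeholder -- replace with the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPUs / CPU
)
```

This quantizes on load rather than producing a GGUF file, so it needs `bitsandbytes` installed and a CUDA-capable GPU, but it gets the memory footprint down to roughly what a Q4 GGUF would use.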
