Quant method

#1
by gcapnias - opened

Just a question,

Why only 3-bit and 5-bit quantized models? Usually, models start with 4-bit quantization.

I was looking to run the model under Ollama. Usually, the 4-bit models are used because they are lightweight.

George J.

Institute for Language and Speech Processing org

We chose to share the Q5_K_M model because it offers better quality for a small increase in memory requirements, along with the Q3 version, which is lower quality but can run on lower-end GPUs.
If you are interested in a 4-bit version of the model, you can find an AWQ one here: https://huggingface.co/ilsp/Meltemi-7B-Instruct-v1-AWQ.

For Ollama, we have uploaded a 4-bit version here: https://ollama.com/ilsp/meltemi-instruct:q4.1
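Assuming a local Ollama installation, pulling and running that 4-bit version from the link above is a two-command sketch (the model download is several GB, so the first step may take a while):

```shell
# Download the 4-bit quantized Meltemi model from the Ollama registry
ollama pull ilsp/meltemi-instruct:q4.1

# Start an interactive chat session with the model
ollama run ilsp/meltemi-instruct:q4.1
```

`ollama run` will also pull the model automatically if it is not present locally, so the explicit `pull` step is optional.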

Great,

Thanks a lot!

soksof changed discussion status to closed
