GGUF 8bit model not recommended.

#11 opened by marksverdhei

Hi! Thanks a lot for making these models accessible and fully open source!
I was looking at the different quantized models, and in the table you describe the
8-bit variant as "very large, extremely low quality loss - not recommended".
First of all, where can I find the results showing that the model has extremely low quality loss? Could the use of "extremely" here be an exaggeration?
Second, why is it that this variant is not recommended?
Given you have the compute, wouldn't you prefer this over the smaller models?
I understand that the smaller models could be a better fit for some, but I don't get why this one is simply marked "not recommended" without any explanation.

If the described model size and quality loss are relative to the other models in the table, then perhaps just call it "largest" and "least quality loss"?

Norwegian Large Language Models org

We have not tested the quantized models yet. The descriptions are taken from llama.cpp; they are its recommendations for the Llama and Mistral 7B models.

Norwegian Large Language Models org

I deleted the "recommendations"; they depend on how much VRAM you have available and seem quite misleading. Thanks!
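
As a rough illustration of the VRAM point (not something we have benchmarked): a fully offloaded GGUF model needs roughly its on-disk file size in VRAM, plus some headroom for the KV cache and activations, so a quick size check is usually enough to pick a quant. The helper name, the placeholder file name, and the headroom figure below are illustrative assumptions, not measured values.

```python
import os

def fits_in_vram(gguf_path: str, vram_gib: float, overhead_gib: float = 1.5) -> bool:
    """Back-of-the-envelope check for full GPU offload.

    Assumes the model needs about its on-disk size in VRAM, plus some
    headroom for the KV cache and activations (overhead_gib is an assumed
    value; it grows with context length).
    """
    size_gib = os.path.getsize(gguf_path) / 1024**3
    return size_gib + overhead_gib <= vram_gib

# Example (placeholder file name): an 8-bit quant of a ~7B model is roughly
# 7-8 GiB on disk, so it fits on a 12 GiB card, while a 4-bit quant (~4 GiB)
# is the safer choice on an 8 GiB card.
print(fits_in_vram("model.Q8_0.gguf", vram_gib=12))
```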
