Add quantized versions?

#2
by PoignardAzur - opened

GGML is compatible with multiple quantization formats.

Some of them have extremely good size-reduction / quality-loss tradeoffs. For instance, a model quantized with Q5_K_M is roughly three times smaller than the base FP16 version, for only about a 2% perplexity increase.
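As a back-of-envelope check of the size claim (assuming FP16 stores 16 bits per weight and Q5_K_M averages roughly 5.5 bits per weight; the exact bits-per-weight figure varies slightly by model architecture):

```python
# Rough size-ratio estimate: FP16 vs. Q5_K_M quantization.
# 5.5 bits/weight is an approximation for Q5_K_M, not an exact figure.
FP16_BITS_PER_WEIGHT = 16
Q5_K_M_BITS_PER_WEIGHT = 5.5  # approximate average

ratio = FP16_BITS_PER_WEIGHT / Q5_K_M_BITS_PER_WEIGHT
print(f"FP16 is about {ratio:.1f}x larger than Q5_K_M")
```

which comes out to roughly a 3x reduction, consistent with the figure above.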

It might be nice to provide some of these quantization formats out of the box.
