Add quantized versions?
#2
opened by PoignardAzur
GGML supports multiple quantization formats.
Some of them offer very good tradeoffs between size reduction and quality loss. For instance, a Q5_K_M model is roughly three times smaller than the base FP16 version, for only about a 2% increase in perplexity.
It might be nice to provide some of these quantization formats out of the box.
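For a rough sense of where the "three times smaller" figure comes from, here is a minimal back-of-the-envelope sketch. The bits-per-weight values are approximations (Q5_K_M averages around 5.5 bits per weight in practice, not exactly 5), so the numbers are illustrative only:

```python
# Rough on-disk size comparison between FP16 and Q5_K_M quantization.
# Bits-per-weight figures are approximate; Q5_K_M mixes 5- and 6-bit
# blocks plus per-block scales, landing near ~5.5 bpw on average.
FP16_BPW = 16.0
Q5_K_M_BPW = 5.5  # approximate average bits per weight

def model_size_gib(n_params: float, bpw: float) -> float:
    """Approximate model size in GiB given parameter count and bits per weight."""
    return n_params * bpw / 8 / 2**30

params = 7e9  # e.g. a 7B-parameter model
size_fp16 = model_size_gib(params, FP16_BPW)    # ~13 GiB
size_q5km = model_size_gib(params, Q5_K_M_BPW)  # ~4.5 GiB
ratio = size_fp16 / size_q5km                   # ~2.9x smaller
```

So a ~13 GiB FP16 7B model shrinks to roughly 4.5 GiB at Q5_K_M, close to the 3x reduction mentioned above.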