Add quantized versions?
#2
opened by PoignardAzur
GGML supports multiple quantization formats.
Some of them offer very good tradeoffs between size reduction and quality loss. For instance, a Q5_K_M model is roughly three times smaller than the base FP16 version, for only about a 2% increase in perplexity.
It might be nice to provide some of these quantization formats out of the box.
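For a rough sense of where the "three times smaller" figure comes from, here is a minimal back-of-the-envelope sketch. The bits-per-weight values are approximations (Q5_K_M averages around 5.5 bits per weight in practice, not exactly 5), so the numbers are illustrative only:

```python
# Rough on-disk size comparison between FP16 and Q5_K_M quantization.
# Bits-per-weight figures are approximate; Q5_K_M mixes 5- and 6-bit
# blocks plus per-block scales, landing near ~5.5 bpw on average.
FP16_BPW = 16.0
Q5_K_M_BPW = 5.5  # approximate average bits per weight

def model_size_gib(n_params: float, bpw: float) -> float:
    """Approximate model size in GiB given parameter count and bits per weight."""
    return n_params * bpw / 8 / 2**30

params = 7e9  # e.g. a 7B-parameter model
size_fp16 = model_size_gib(params, FP16_BPW)    # ~13 GiB
size_q5km = model_size_gib(params, Q5_K_M_BPW)  # ~4.5 GiB
ratio = size_fp16 / size_q5km                   # ~2.9x smaller
```

So a ~13 GiB FP16 7B model shrinks to roughly 4.5 GiB at Q5_K_M, close to the 3x reduction mentioned above.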