More quants?

#5
by MoonRide - opened

Could you please provide more quants, such as Q6_K and Q5_K_M? They offer better quality than Q4 (sample perplexity results in https://github.com/ggerganov/llama.cpp/blob/master/examples/perplexity/README.md).

For anyone interested, quants created from F16 via:

```
quantize Phi-3-mini-4k-instruct-fp16.gguf Phi-3-mini-4k-instruct-Q6_K.gguf Q6_K
```

work fine (tested using llama.cpp b2714).
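If you want several quant levels from the same F16 file, the command above generalizes naturally. A minimal sketch (the source filename follows the pattern above; the quant list and output names are my own choices, and the loop prints each invocation as a dry run — remove the `echo` to actually run `quantize` from your llama.cpp build):

```shell
# Source F16 GGUF to quantize (assumed to be in the current directory).
SRC=Phi-3-mini-4k-instruct-fp16.gguf

# Quant levels to produce; adjust to taste.
for Q in Q4_K_M Q5_K_M Q6_K; do
  # Dry run: print each quantize invocation instead of executing it.
  echo quantize "$SRC" "Phi-3-mini-4k-instruct-${Q}.gguf" "$Q"
done
```

Each output file is independent, so the loop can be re-run with a different quant list without touching the F16 source.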

What quantize command are you using there?

@simonw It's the standard executable included in llama.cpp, which I mentioned earlier (you can check out the binary releases of llama.cpp, published at https://github.com/ggerganov/llama.cpp/releases).

gugarosa changed discussion status to closed
