Q8_0 vs fp16

by AIGUYCONTENT - opened 2 days ago

The Q8_0 quant is about half the size of the fp16. I'm on an M4 MacBook Pro with 48 GB of RAM. I'm pretty sure I can run fp16, but it doesn't leave much room for context. Is there much to lose by downloading the Q8_0 instead of the fp16?

Mradermacher told me a few months ago that the difference between Q8 and Q5_M is negligible, and that on limited-RAM setups there is more to gain from the extra context you get by using the lower quant. At least that's what I recall him saying; I could have misunderstood.

Thanks

bartowski

Owner 2 days ago

Yeah, personally I wouldn't bother with fp16. I include it only for people who are obsessed with max size. In practice, Q8_0 gets you basically indistinguishable performance.
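For anyone who wants to put rough numbers on the size vs. context trade-off discussed here, below is a minimal Python sketch. It relies on the GGUF block layout for Q8_0 (32 int8 weights plus one fp16 scale, so 8.5 bits per weight) against 16 bits per weight for fp16. The parameter count and the reserved-memory figure are placeholders I made up for illustration, not values from this thread, and the estimate ignores file metadata and any tensors stored in a different format.

```python
# Rough estimate of weight memory for fp16 vs Q8_0, and what is left for context.
# Assumption: the whole model is stored in a single format (real GGUF files may mix types).

FP16_BITS = 16.0
# Q8_0 stores blocks of 32 int8 weights plus one fp16 scale: 34 bytes per 32 weights.
Q8_0_BITS = 34 * 8 / 32  # 8.5 bits per weight


def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def headroom_gib(total_ram_gib: float, n_params: float,
                 bits_per_weight: float, reserved_gib: float = 8.0) -> float:
    """Memory left for KV cache and activations after loading the weights.

    reserved_gib is a hypothetical allowance for the OS and other apps.
    """
    return total_ram_gib - reserved_gib - weight_gib(n_params, bits_per_weight)


if __name__ == "__main__":
    n_params = 14e9   # hypothetical parameter count, not from the thread
    total_ram = 48.0  # the 48 GB machine mentioned above, treated as GiB for simplicity

    for name, bits in (("fp16", FP16_BITS), ("Q8_0", Q8_0_BITS)):
        print(f"{name}: weights ~{weight_gib(n_params, bits):.1f} GiB, "
              f"headroom ~{headroom_gib(total_ram, n_params, bits):.1f} GiB")
```

Swap in the real parameter count of the model you are downloading. Whatever headroom remains is what the KV cache has to fit in, so the smaller file translates directly into longer usable context.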
