Q8_0 vs fp16
#3 opened by AIGUYCONTENT
The Q8_0 quant is half the size of the fp16. I'm on an M4 MacBook Pro with 48 GB of RAM. I'm pretty sure I can run the fp16, but it doesn't leave much room for context. Is there much to lose by downloading the Q8_0 instead of the fp16?
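For a rough sense of the trade-off, here is a back-of-the-envelope sketch, not an exact accounting. It assumes GGUF-style storage: fp16 uses 16 bits per weight, and Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale (34 bytes, i.e. 8.5 bits per weight, about 53% of fp16). The model shape (`n_params`, layer count, KV heads, head size) is hypothetical and is not the model in this thread. The sketch also ignores runtime overhead such as compute buffers.

```python
# Back-of-the-envelope memory estimate: model weights plus an fp16 KV cache.
FP16_BITS_PER_WEIGHT = 16.0
# Q8_0 block: 32 int8 weights + one fp16 scale = 34 bytes per 32 weights.
Q8_0_BITS_PER_WEIGHT = 34 * 8 / 32  # 8.5 bits per weight

GIB = 2 ** 30


def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 n_ctx: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size in GiB (factor 2 = one K and one V per layer)."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem / GIB


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    n_params = 14e9
    n_layers, n_kv_heads, head_dim = 40, 8, 128

    for name, bpw in (("fp16", FP16_BITS_PER_WEIGHT), ("Q8_0", Q8_0_BITS_PER_WEIGHT)):
        print(f"{name:5s} weights: {weight_gib(n_params, bpw):6.2f} GiB")

    for n_ctx in (8192, 32768, 131072):
        kv = kv_cache_gib(n_layers, n_kv_heads, head_dim, n_ctx)
        print(f"KV cache @ {n_ctx:>6} tokens: {kv:5.2f} GiB")
```

With these assumed numbers, switching from fp16 to Q8_0 frees almost as much memory as the weights shrink. Whatever memory is left over goes to the KV cache, which grows linearly with context length.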
Mradermacher told me a few months ago that the difference between Q8 and Q5_M is negligible, and that on RAM-limited setups there is more to gain from the extra context the lower quant frees up. At least, that's what I recall him saying; I could have misunderstood.
Thanks
Yeah, personally I wouldn't bother with fp16. I include it only for people who are obsessed with max size; in practice, Q8_0 gets you basically indistinguishable performance.
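One intuition for why Q8_0 is so close to fp16 comes from a simplified round-trip sketch. It loosely follows the per-block scheme of llama.cpp's reference Q8_0 quantizer: scale each block of 32 by its max absolute value over 127, round to int8, and store the scale as fp16. It then measures the reconstruction error on synthetic Gaussian weights. The random weights are an assumption standing in for real tensors, and the error figure says nothing directly about output quality on actual prompts.

```python
import math
import random
import struct

BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32


def to_fp16(x: float) -> float:
    """Round a float to the nearest IEEE half-precision value (how the scale is stored)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_q8_0(block):
    """Quantize one block to (fp16 scale, list of int8 values in [-127, 127])."""
    amax = max(abs(x) for x in block)
    d = amax / 127.0
    inv_d = 1.0 / d if d else 0.0
    q = [int(round(x * inv_d)) for x in block]
    return to_fp16(d), q


def dequantize_q8_0(d, q):
    """Reconstruct approximate weights from a scale and int8 values."""
    return [d * v for v in q]


def relative_rms_error(weights):
    """RMS reconstruction error divided by RMS of the original weights."""
    err_sq = 0.0
    ref_sq = 0.0
    for start in range(0, len(weights), BLOCK_SIZE):
        block = weights[start:start + BLOCK_SIZE]
        d, q = quantize_q8_0(block)
        for x, y in zip(block, dequantize_q8_0(d, q)):
            err_sq += (x - y) ** 2
            ref_sq += x ** 2
    return math.sqrt(err_sq / ref_sq)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic weights; real model tensors have outliers this ignores.
    weights = [rng.gauss(0.0, 0.02) for _ in range(BLOCK_SIZE * 4096)]
    print(f"relative RMS error: {relative_rms_error(weights):.4%}")
```

On this synthetic data the error lands well under 1% of the weights' RMS, because 255 levels per block of 32 leave very little rounding error. Lower-bit quants have far fewer levels per block, which is where differences become more noticeable.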