Perplexity

#22
by gsaivinay - opened

Hello Mr. Bloke,

I assume you must be busy, but I'd like to ask whether you have made any perplexity comparisons for the Llama 2 70B chat model against its GPTQ quants, like you did here. I'm just curious how the bigger quantized models perform compared to fp16.
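In case it's useful for comparison, here's roughly how I'd measure it myself: a minimal sketch of the standard sliding-window perplexity recipe from the transformers docs. The model ID, the wikitext-2 test split, the 4096-token window, and the stride are just my assumptions, and loading a GPTQ repo this way would additionally need optimum/auto-gptq installed.

```python
# Minimal perplexity sketch (sliding-window recipe from the transformers docs).
# Assumptions: model ID, wikitext-2 test set, 4096 context, stride of 512.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-chat-hf"  # or a GPTQ repo such as TheBloke/Llama-2-70B-chat-GPTQ
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Concatenate the evaluation text and score it window by window.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
encodings = tokenizer(text, return_tensors="pt")

max_length = 4096  # Llama 2 context window
stride = 512
seq_len = encodings.input_ids.size(1)

nlls = []
prev_end = 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end  # tokens newly scored in this window
    input_ids = encodings.input_ids[:, begin:end].to(model.device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # mask tokens already scored in earlier windows

    with torch.no_grad():
        loss = model(input_ids, labels=target_ids).loss  # mean NLL over scored tokens
    nlls.append(loss * trg_len)

    prev_end = end
    if end == seq_len:
        break

ppl = torch.exp(torch.stack(nlls).sum() / prev_end)
print(f"{model_id}: perplexity = {ppl.item():.3f}")
```

Running the same script once with the fp16 repo and once with the GPTQ repo (same dataset and window settings) should give numbers that are directly comparable.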

No worries if this is not on your radar.
