exl2 vs GGUF at 8-bit

by Samvanity

hello,

I see that you have both GGUF and exl2 quants available for this model... in your opinion, what differences do you observe between these two formats? exl2 seems quite a bit faster at 8-bit, but in theory the quality of their generations should be similar, right? I tried both and couldn't really tell them apart.

Is it ok to assume that, as long as you can fit the entire 8-bit model plus the context into VRAM, you should choose exl2 over GGUF, but if you can't and have to drop to lower-bit quants, then GGUF with imatrix should work better?

I tried quite a few of your quants and found them to be excellent... thank you for all the hard work and contributions to the open source community!

That's pretty much exactly right, good understanding.

If you can fit the whole model entirely in VRAM, go with exl2.

If you want to push past your VRAM capacity by offloading some of the layers to system RAM, go with GGUF.
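For example, here's a minimal sketch of partial offload with llama-cpp-python (the model filename, layer count, and context size are placeholders, not a recommendation):

```python
# Sketch: partial GPU offload of a GGUF quant with llama-cpp-python
# (pip install llama-cpp-python, built with GPU support).
from llama_cpp import Llama

llm = Llama(
    model_path="model-Q4_K_M.gguf",  # any GGUF quant file (placeholder name)
    n_gpu_layers=24,                 # layers offloaded to VRAM; the rest run from system RAM
    n_ctx=8192,                      # context window
)

out = llm("Write a haiku about quantization.", max_tokens=64)
print(out["choices"][0]["text"])
```

Setting `n_gpu_layers=-1` offloads every layer to the GPU, so you only dial it down as far as your VRAM forces you to.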

Additionally, exl2 has the nice advantage of offering Q4 quantization of the context cache, letting you stretch the VRAM you devote to context much further (not really useful with only 8k context, but worth keeping in mind as a notable difference).
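If you're curious what that looks like in code, here's a minimal sketch using the exllamav2 Python API (the model directory is a placeholder, and the Q4 cache class assumes a reasonably recent exllamav2 release):

```python
# Sketch: loading an exl2 quant entirely into VRAM with a Q4-quantized KV cache
# (pip install exllamav2).
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/model-exl2-8.0bpw"  # placeholder path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_Q4(model, lazy=True)  # Q4 cache: roughly 4x less VRAM for context than FP16
model.load_autosplit(cache)                  # spread the weights across available GPUs, all in VRAM

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
print(generator.generate_simple("Write a haiku about quantization.", settings, num_tokens=64))
```

Swapping `ExLlamaV2Cache_Q4` back to the default `ExLlamaV2Cache` is the only change needed to return to an FP16 cache.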
