GGUF version

#2
by johnnnna - opened

Please 🥺

@senseable we need Smaug-72B-v0.1-q2_k_m.gguf (Q2) (and i love you)

@windkkk It's uploading now. I tested it in 2-bit and it's still quite good — the responses are framed the same way but do seem to lack a bit of depth, FYI.

Is it OK to try the SOTA quantization IQ2_XS?

Will Smaug 2 bit fit onto 16 GB GPU?

Thx @senseable <3

Do you guys think the q2 version is that much "dumber" than q4/q5? I might be able to run Smaug 72B q2 on my machine, so I'll compare it to Smaug 34B Q4_M.

share the results when you're done ;)

Will Smaug 2 bit fit onto 16 GB GPU?

No, you'll need something like 40GB to use it, but if you have enough RAM you might get a few tokens per second with partial offloading.
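A rough back-of-envelope check makes the 16GB answer clear: weight size is parameter count times bits per weight. The bits-per-weight figure below is an assumption — Q2_K mixes quant types and in practice lands around 3 effective bits per weight, not 2:

```python
def gguf_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Rough quantized-weight size: params * bits/weight, in GB.
    Ignores KV cache and runtime overhead, which add several more GB."""
    total_bits = n_params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9

# Assumption: Q2_K averages roughly ~3 effective bits per weight.
print(round(gguf_size_gb(72, 3.0), 1))  # weights alone already exceed 16 GB
```

Even before counting the KV cache and compute buffers, the weights alone are well over 16GB, which is why partial offloading (some layers on GPU, the rest in system RAM) is the only option on that card.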
