Iambe-RP-v3-20b-EXL2-3bpw

#1
by Alastar-Smith - opened

Hello Good Sir!

I'm a fan of your model! It gives me really unique answers and follows instructions pretty well!
My problem is that I have a 3060 12GB and the GGUF version gives me only 3 t/s. Can you please make an EXL2-3bpw version?

Thank you in advance!
Cheers!

Yeah! EXL quants take forever and need a GPU, so it's too expensive to do remotely. I'll have to do it overnight locally, so it won't be until morning.

We've got the same GPU so if it works well for me I might start doing it for all stable releases.

Got it! Thank you! It works pretty well and fast at 3bpw, faster than 13b models in GGUF.
Precision is also pretty good, since it is an RP model. People say that the new EXL2-2 quants are even better.

Ooof, 3bpw took ~14GB VRAM! I uploaded the 3.0bpw, and I've got a 2.6bpw cooking.

D'oh, I didn't know about 8-bit cache. Regardless, there is also a 2.6bpw uploading now.

athirdpath changed discussion status to closed
