Iambe-RP-v3-20b-EXL2-3bpw

by Alastar-Smith - opened Dec 12, 2023

Discussion

Alastar-Smith

Dec 12, 2023

edited Dec 13, 2023

Hello Good Sir!

I'm a fan of your model! It gives me really unique answers and follows instructions pretty well!
My problem is that I have a 3060 12GB and the GGUF version gives me only 3 t/s. Can you please make an EXL2 3bpw version?

Thank you in advance!
Cheers!

athirdpath

Owner Dec 12, 2023

Yeah! EXL quants take forever and need a GPU, so it's too expensive to do remotely. I'll have to do it overnight locally, so it won't be until morning.

We've got the same GPU so if it works well for me I might start doing it for all stable releases.

Alastar-Smith

Dec 13, 2023

edited Dec 13, 2023

Got it! Thank you! It works pretty well and fast at 3bpw, faster than 13B models in GGUF.
Precision is also pretty good, since it is an RP model. People say that the new EXL2-2 quants are even better.

athirdpath

Owner Dec 13, 2023

Ooof, 3bpw took ~14GB VRAM! I uploaded the 3.0bpw, and I've got a 2.6bpw cooking.

athirdpath

Owner Dec 13, 2023

D'oh, I didn't know about 8-bit cache. Regardless, there is also a 2.6bpw uploading now.
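
For anyone wondering why the 8-bit cache matters here, a rough back-of-the-envelope sketch follows. It is only an estimate under stated assumptions: the layer count, hidden size, and context length below are illustrative placeholders and not values confirmed in this thread, the formula assumes plain multi-head attention (no grouped-query attention), and it ignores activation and framework overhead.

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def kv_cache_gb(n_layers: int, hidden_size: int, context_len: int,
                bytes_per_value: int) -> float:
    """Approximate KV cache size in GB.

    Assumes keys and values each store hidden_size values per layer per token
    (i.e. no grouped-query attention).
    """
    return 2 * n_layers * hidden_size * context_len * bytes_per_value / 1e9


if __name__ == "__main__":
    # All numbers below are assumptions for illustration only.
    n_params = 20e9      # "20b" taken from the model name
    n_layers = 62        # hypothetical layer count
    hidden_size = 5120   # hypothetical hidden size
    context_len = 4096   # hypothetical context length

    for bpw in (3.0, 2.6):
        print(f"weights @ {bpw}bpw: {weight_gb(n_params, bpw):.2f} GB")

    for label, nbytes in (("FP16", 2), ("8-bit", 1)):
        size = kv_cache_gb(n_layers, hidden_size, context_len, nbytes)
        print(f"KV cache ({label}): {size:.2f} GB")
```

Under these assumptions the cache is a large share of the total, so halving it with an 8-bit cache can be the difference between fitting in 12GB or not.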

athirdpath changed discussion status to closed Dec 13, 2023
