5.0bpw exl2?

#2
by coffeedean - opened

Hello,

I've been using your 5.0bpw exl2 quant of sophosympatheia's previous 103B model, Rogue-Rose-103b-v0.2-5.0bpw-h6-exl2-2, and it has been performing really well. Now I've been testing Aurora-Nights-103B-v1.0 using the Q5 GGUF from TheBloke, and I'm really impressed with its output, but I've always had huge issues with tk/s performance on GGUF, whereas your exl2 models are always so good, fast, and easy to use.

Do you have any plans for an Aurora-Nights-103B-v1.0-5.0bpw-h6-exl2? I'm not sure if there's enough demand or how many people used Rogue-Rose-103b-v0.2-5.0bpw-h6-exl2-2, but I did, and I love it.

Thanks!

I can add it to the list when doing 103B models. Very few people have 48 GB of VRAM, let alone the more than 48 GB needed to run something like this at 5.0bpw. (I'll be able to test these locally as well shortly, after I shuffle my GPUs around.)
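
For a rough sense of why 48 GB isn't enough, here's a back-of-envelope sketch of the weight footprint at 5.0bpw, assuming roughly 103 billion parameters and ignoring KV cache and loader overhead, so actual usage would be higher:

```python
# Back-of-envelope VRAM estimate for weights only (no KV cache or overhead).
params = 103e9          # approximate parameter count for a 103B model
bits_per_weight = 5.0   # exl2 target bitrate
weight_bytes = params * bits_per_weight / 8
print(f"~{weight_bytes / 1024**3:.0f} GiB for weights alone")  # roughly 60 GiB
```

That already lands around 60 GiB before any context cache, which is why a single 48 GB setup won't cover it.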

Thank you so much, I appreciate it! Yes, the only way I run those big models at higher quants is using something like Runpod. But it does make a noticeable difference in output quality, at least it did for me with Rogue Rose.

Thank you! You're awesome. I'm testing it right now and it's working very well. Thanks again!

coffeedean changed discussion status to closed
