Okay, this is amazing.

#1 · opened by zaq-hack

Where did you learn this trick?

At first glance, this seems to make a huge difference in output quality. I'm doing this for a few of my favorite models so I can compare apples-to-apples at the same bpw against quants made with the standard calibration data.
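
For anyone who wants to try it, the basic recipe I'm following looks roughly like this. The folder name, the "text" column, and the convert.py flags are from memory, so double-check them against the exllamav2 repo before running anything:

```python
import glob
import pandas as pd

# Gather roleplay-style text (a hypothetical folder of chat logs) into a parquet
# file that can be passed to ExLlamaV2's converter as the calibration dataset.
samples = [open(p, encoding="utf-8").read() for p in glob.glob("rp_logs/*.txt")]
pd.DataFrame({"text": samples}).to_parquet("rpcal.parquet")

# Then quantize against that calibration set instead of the built-in default,
# e.g. (flags as I remember them from exllamav2's convert.py; verify first):
#   python convert.py -i ./model-fp16 -o ./work -cf ./model-5.0bpw-rpcal \
#          -b 5.0 -c rpcal.parquet
```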

Has anyone already run benchmarks comparing this against a regular 5bpw quant, or regular exl2 vs. rpcal for other models?
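
If anyone wants to run that comparison themselves, the measurement side is simple once each quant produces next-token logits for the same held-out text. Here's a minimal, backend-agnostic perplexity helper with torch (the function name and tensor shapes are just illustrative):

```python
import torch

def perplexity(logits: torch.Tensor, target_ids: torch.Tensor) -> float:
    """Perplexity from next-token logits (1, seq, vocab) and target ids (1, seq)."""
    logprobs = torch.log_softmax(logits.float(), dim=-1)
    nll = -logprobs.gather(-1, target_ids.unsqueeze(-1).to(logits.device)).squeeze(-1)
    return nll.mean().exp().item()
```

Running it once on roleplay-style text and once on general text would show whether the rpcal quant only pulls ahead on the domain it was calibrated on.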

First experience has been magical, so now I'm wondering why everyone doesn't do this. ha ha ha

Owner

Hey, sorry for the slow response. HF user intervitens (not linking them, as I don't want to tag them in a mostly unrelated repo) started posting these, and I decided to do the same for a few models I wanted quantized.

As for the other questions, I don't really have an answer.
