h8 instead of h6 for 8bpw versions

#1 opened by Thireus

First of all, really big thanks for all the models you've quantized! This is incredibly helpful.

Just wanted to ask if you could produce h8 versions when quantizing at 8bpw, or whether there is a particular reason the head is kept at 6 bits while the rest of the model is 8 bits.

According to Turboderp, h6 was sufficient, so I just stayed with the defaults there. It probably doesn't make much of a difference either way. I can look into changing my scripts to h8; I'll check with Turboderp to confirm.
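For reference, a minimal sketch of what an h8 quantization run could look like with ExLlamaV2's convert.py. The paths are placeholders, and the exact flag names (`-b` for bitrate, `-hb` for head bits) are assumed to match the current convert.py interface:

```shell
# Quantize at 8 bpw with an 8-bit output head instead of the default 6-bit head.
# All paths are placeholders; -hb sets the head layer's bit width.
python convert.py \
    -i /path/to/model-fp16 \
    -o /path/to/workdir \
    -cf /path/to/model-8.0bpw-h8-exl2 \
    -b 8.0 \
    -hb 8
```

The `-hb` value only affects the output (head) layer, so the size difference between h6 and h8 is small relative to the whole model.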
