Contradiction in model description

#2 opened by m9e

"GS: GPTQ group size. Higher numbers use less VRAM, but have lower quantisation accuracy. "None" is the lowest possible value." in the description, but then

"8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. Poor AutoGPTQ CUDA speed."

The general description and the row-wise model-specific description appear to contradict each other about whether a large group size or a small/no group size gives higher inference quality.

It's a bit confusing and I should probably try and clarify that more, yes.

And I definitely need to clear up my table, which uses words like "higher inference quality" without making it clear what it's higher than. Some of that is a vestige from when I only showed a couple of options, so "128 = higher quality" meant higher than None, the only other option at the time. I'll be re-doing that table soon and will make it clearer.

To clarify the progression: "small group size" and "no group size" are at opposite ends of the scale. So:

  • None -- Lowest quality, smallest file size, lowest VRAM usage
  • 1024 (I don't make this one any more)
  • 128
  • 64
  • 32 -- Highest quality, largest file size, largest VRAM usage

Technically you could go even smaller, e.g. group size 16, but I've never bothered with that, or seen anyone else do so. Likewise at the other end you could have 256 or 512 or, I think, 2048. But I've stopped even making 1024, and now just do 128, 64 and 32.

In time I may drop 64 too, as I'm not sure if anyone is actually using it.
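For anyone who wants to see how these options map to actual quantisation settings, here's a minimal sketch using AutoGPTQ's BaseQuantizeConfig. In that config, group_size=-1 corresponds to "None" above and desc_act=True is Act Order; the base model name and calibration text below are placeholders, not anything from this repo.

```python
# Minimal sketch of how the group-size options above map to AutoGPTQ settings.
# Base model and calibration text are placeholders, not tied to this repo.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "facebook/opt-125m"  # placeholder base model

quantize_config = BaseQuantizeConfig(
    bits=8,          # 8-bit, as in the quoted row; 4-bit is the other common choice
    group_size=128,  # 32/64/128 as in the list above; -1 means no grouping ("None")
    desc_act=True,   # Act Order: higher accuracy, but poor AutoGPTQ CUDA speed
)

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
examples = [tokenizer("AutoGPTQ needs a few calibration samples like this one.")]

model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
model.quantize(examples)  # smaller group_size -> higher quality, larger file, more VRAM
model.save_quantized("opt-125m-8bit-128g")
```

The saved folder includes a quantize_config.json recording these same fields, so you can also check which group size an existing download was made with.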
