GPTQ group size affects VRAM usage

by sgjohnson - opened

Unless I'm mistaken, many of the models state that GPTQ group size affects VRAM usage, but you can only select precision here. Is group size covered by "Auto"?

accelerate org

Auto covers no_split modules, which generally should group those together. If you have an example model I can run a test to confirm this :)
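
For reference, here is a rough back-of-the-envelope sketch of why group size can matter for weight memory. It assumes the common GPTQ layout in which each group of weights stores one fp16 scale plus one zero point packed at the quantized bit width. The function names are hypothetical and are not part of accelerate or any GPTQ library, and the estimate ignores unquantized layers (embeddings, output head), act-order index tensors, activations, and the KV cache.

```python
def gptq_bits_per_weight(bits: int, group_size: int, scale_bits: int = 16) -> float:
    """Approximate effective bits per quantized weight.

    Assumption: every group of `group_size` weights carries one scale
    (`scale_bits` wide) and one zero point (`bits` wide).
    """
    if group_size <= 0:
        raise ValueError("this sketch only handles a positive group size")
    return bits + (scale_bits + bits) / group_size


def estimate_weight_gib(n_params: float, bits: int, group_size: int) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    total_bits = n_params * gptq_bits_per_weight(bits, group_size)
    return total_bits / 8 / 1024**3


if __name__ == "__main__":
    # Hypothetical 7B-parameter model quantized to 4 bits.
    for g in (32, 64, 128):
        bpw = gptq_bits_per_weight(4, g)
        gib = estimate_weight_gib(7e9, 4, g)
        print(f"group_size={g}: ~{bpw:.3f} bits/weight, ~{gib:.2f} GiB of weights")
```

Under these assumptions, a smaller group size adds more per-group metadata, so the same 4-bit model takes somewhat more memory at group size 32 than at 128.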
