Version with Groupsize = None

#1
by Kelheor - opened

Is it possible in the future to post a version with groupsize = None? That would make it possible to fit the full context on a consumer-grade GPU, like a 4090 with 24 GB. The 128g version gives an out-of-memory error when the context is almost full.

Example command from another model to visualise what I mean:
python llama.py /workspace/models/ehartford_WizardLM-30B-Uncensored wikitext2 --wbits 4 --true-sequential --act-order --save_safetensors /workspace/eric-30B/gptq/WizardLM-30B-Uncensored-GPTQ-4bit.act-order.safetensors
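Note that in GPTQ-for-LLaMa's llama.py, omitting --groupsize (as in the command above) defaults to -1, which is the "no grouping" / groupsize = None case being requested. As a rough sketch of the same conversion done with AutoGPTQ instead, it might look like the following. The paths are placeholders carried over from the example command, not this repo's actual files, and the exact API may vary between AutoGPTQ versions:

# Hypothetical sketch: 4-bit GPTQ quantization with group_size=-1
# (the AutoGPTQ equivalent of "groupsize = None"). Paths are placeholders.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_path = "/workspace/models/ehartford_WizardLM-30B-Uncensored"  # placeholder

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit quantization, matching --wbits 4 above
    group_size=-1,   # -1 = no grouping, i.e. groupsize "None"
    desc_act=True,   # activation order, matching --act-order above
)

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
model = AutoGPTQForCausalLM.from_pretrained(model_path, quantize_config)

# Calibration samples; a real run would use tokenized wikitext2 examples,
# matching the calibration dataset in the llama.py command above.
examples = [tokenizer("The quick brown fox jumps over the lazy dog.")]
model.quantize(examples)

model.save_quantized("/workspace/eric-30B/gptq-nogroup", use_safetensors=True)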

Yes, I can do that. There should be an update to the model itself soon, so I will run the conversions for that then.

Thank you! That would also allow us to use a 4096 context size with NTK scaling on 24 GB of VRAM, so it'd be quite nice!
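For context, NTK here refers to NTK-aware RoPE scaling, which stretches the rotary embeddings so a model trained at 2048 tokens can run at a longer context. As a rough illustration only (not the commenter's exact setup), recent transformers versions (4.31+) expose a dynamic NTK variant through the rope_scaling parameter; the model path below is a placeholder:

# Hypothetical sketch: loading a LLaMA-family model with dynamic NTK
# RoPE scaling to roughly double the usable context (2048 -> 4096).
# The model path is a placeholder, not this repo's actual files.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/workspace/eric-30B/gptq"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    rope_scaling={"type": "dynamic", "factor": 2.0},  # dynamic NTK scaling
    device_map="auto",
)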
