Version with Groupsize = None

#1
by Kelheor - opened

Is it possible in the future to post a version with groupsize = None? That would make it possible to fit the full context on a consumer-grade GPU, like a 4090 with 24 GB. The 128g version gives an out-of-memory error when the context is almost full.

Example command from another model to visualise what I mean:
python llama.py /workspace/models/ehartford_WizardLM-30B-Uncensored wikitext2 --wbits 4 --true-sequential --act-order --save_safetensors /workspace/eric-30B/gptq/WizardLM-30B-Uncensored-GPTQ-4bit.act-order.safetensors
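Note that in GPTQ-for-LLaMa's llama.py, omitting --groupsize (as in the command above) defaults to -1, which is the "no grouping" / groupsize = None case being requested. As a rough sketch of the same conversion done with AutoGPTQ instead, it might look like the following. The paths are placeholders carried over from the example command, not this repo's actual files, and the exact API may vary between AutoGPTQ versions:

# Hypothetical sketch: 4-bit GPTQ quantization with group_size=-1
# (the AutoGPTQ equivalent of "groupsize = None"). Paths are placeholders.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_path = "/workspace/models/ehartford_WizardLM-30B-Uncensored"  # placeholder

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit quantization, matching --wbits 4 above
    group_size=-1,   # -1 = no grouping, i.e. groupsize "None"
    desc_act=True,   # activation order, matching --act-order above
)

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
model = AutoGPTQForCausalLM.from_pretrained(model_path, quantize_config)

# Calibration samples; a real run would use tokenized wikitext2 examples,
# matching the calibration dataset in the llama.py command above.
examples = [tokenizer("The quick brown fox jumps over the lazy dog.")]
model.quantize(examples)

model.save_quantized("/workspace/eric-30B/gptq-nogroup", use_safetensors=True)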

Yes, I can do that. There should be an update to the model itself soon, so I will run the conversions for that then.

Thank you! That would also allow us to use a 4096 context size with NTK scaling on 24 GB of VRAM, so it'd be quite nice!
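For context, NTK here refers to NTK-aware RoPE scaling, which stretches the rotary embeddings so a model trained at 2048 tokens can run at a longer context. As a rough illustration only (not the commenter's exact setup), recent transformers versions (4.31+) expose a dynamic NTK variant through the rope_scaling parameter; the model path below is a placeholder:

# Hypothetical sketch: loading a LLaMA-family model with dynamic NTK
# RoPE scaling to roughly double the usable context (2048 -> 4096).
# The model path is a placeholder, not this repo's actual files.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/workspace/eric-30B/gptq"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    rope_scaling={"type": "dynamic", "factor": 2.0},  # dynamic NTK scaling
    device_map="auto",
)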
