Best optimized settings for 24GB VRAM video card?

#3
by benfy - opened

As per the title, I am currently running an RTX 3090 with 24GB of VRAM, but this model exhausts my VRAM after 10-15 lines of prompt. Are there any other settings I should change to better balance VRAM consumption and performance? I really enjoy this model, but sadly I can't get the most out of it because of out-of-memory errors.

My current settings in oobabooga are as follows:
-auto-devices -wbits 4 -groupsize 128 -model_type llama ~ this setting exhausts VRAM after around 10 lines of prompt; generation speed is acceptable
-auto-devices -wbits 4 -groupsize 128 -model_type llama -gpu-memory 23 -pre_layer 50 ~ this setting prevents out-of-memory errors for now, but generation speed is very slow
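For reference, here is a rough sketch of what those two launches look like spelled out as full commands. This assumes the usual server.py entry point and the double-dash flag spellings that text-generation-webui's CLI expects; the model name is a placeholder. With -pre_layer 50, roughly the first 50 transformer layers go to the GPU and the rest are offloaded to CPU, which is why it avoids running out of VRAM but generates much more slowly.

# runs entirely on the GPU: fast, but exhausts 24GB after ~10 lines of prompt
python server.py --model your-gptq-model --auto-devices --wbits 4 --groupsize 128 --model_type llama

# offloads the layers beyond the first 50 to CPU: fits in VRAM, but much slower
python server.py --model your-gptq-model --auto-devices --wbits 4 --groupsize 128 --model_type llama --gpu-memory 23 --pre_layer 50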

Any other suggestions?

Sorry, my bad. I have found the solution: switching the model to "gpt4-x-alpasta-30b-4bit.safetensors" solved both the performance and the out-of-VRAM issues (the 128g variant I was using previously caused me a lot of problems), with parameters:
-auto-devices -wbits 4 -model_type llama

Now it works like a charm.
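For completeness, spelled out as a full command it looks roughly like this (same assumptions as above: server.py entry point and double-dash flag spellings; the model directory name is a guess based on the filename):

python server.py --model gpt4-x-alpasta-30b-4bit --auto-devices --wbits 4 --model_type llama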

What does -auto-devices DO?


My understanding is that it does nothing when using GPTQ.
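To make that concrete: if -auto-devices is indeed ignored for GPTQ models, the two launches below should load and place the model identically, since the GPTQ-related flags (-wbits, -model_type, and optionally -pre_layer / -gpu-memory) are what drive device placement. Sketch only, with the same flag-spelling assumptions as above and a placeholder model name.

python server.py --model some-4bit-model --wbits 4 --model_type llama
python server.py --model some-4bit-model --wbits 4 --model_type llama --auto-devices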
