Issue starting the webui

#23
by Depsi - opened

I have installed everything, and it all went well without any errors. Now, when I run the start_windows.bat file, it asks me to press any key to continue. When I do, the cmd window closes but nothing happens. I hope the screenshot below helps.
[Screenshot: 2023-05-20.png]

How much RAM do you have?

32GB

OK, I've finally learned what causes this. Yes, it is RAM-related: the model needs more than 32GB of RAM while it is being loaded onto the GPU.

The fix is to increase your pagefile size; you may need a pagefile as large as 90GB. That will give it enough room to load the model into VRAM, and then it will work fine and run exclusively from the GPU.
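
For anyone wanting to sanity-check the numbers, here is a minimal sketch of the arithmetic. It assumes that Windows can commit roughly physical RAM plus pagefile, and it ignores memory used by other programs. The function name and the 60GB peak figure are hypothetical, chosen only for illustration; the only figures from this thread are "more than 32GB" and "as much as 90GB".

```python
def extra_pagefile_needed_gb(ram_gb: float, pagefile_gb: float, peak_load_gb: float) -> float:
    """Return how many more GB of pagefile would be needed to cover the load peak.

    Assumption: the memory Windows can commit is roughly RAM + pagefile.
    """
    commit_limit_gb = ram_gb + pagefile_gb
    # A negative shortfall means there is already enough room, so clamp at zero.
    return max(0.0, peak_load_gb - commit_limit_gb)


# Hypothetical example: 32GB RAM, 8GB pagefile, 60GB peak while loading.
shortfall = extra_pagefile_needed_gb(ram_gb=32, pagefile_gb=8, peak_load_gb=60)
print(f"Increase the pagefile by at least {shortfall:.0f}GB")
```

The real peak depends on the model and loader, so treat the result as a lower bound and leave some headroom.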

Got it! Much appreciated.
Any GPUs you would recommend maybe from the upper midrange variety? Unfortunately, I'm not a computer geek, so if you give me the specifications, I'll know what to look for when shopping.
Thanks again, and have a good day!

The 4090 is the current king of consumer GPUs. It has 24GB VRAM which is very useful - meaning you can load a 30B 4-bit model completely into VRAM - and it's extremely fast.

But the 4090 is also very expensive. A compromise would be a 3090, perhaps a used one. If buying used you can get a 3090 for less than half the price of a 4090, but it has the same amount of VRAM and is no more than ~20-30% slower for LLM inference.

If a 3090 is too much, then look for the best card you can afford with 12GB VRAM. A 4070 is one option. Or if that's still too much, there are a number of 3060s that have 12GB and are very affordable, like 1/3 the price of a used 3090.
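
As a rough check on the VRAM figures above, the weights of a quantized model take about (parameter count × bits per weight ÷ 8) bytes. The sketch below is only a back-of-the-envelope estimate: the function names and the 2GB overhead allowance (for context cache and other runtime buffers) are assumptions for illustration, and real usage varies with the loader and context length.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate size of the quantized weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_vram(n_params: float, bits_per_weight: int, vram_gb: float,
                 overhead_gb: float = 2.0) -> bool:
    """True if weights plus an assumed runtime overhead fit in the given VRAM."""
    return weight_footprint_gb(n_params, bits_per_weight) + overhead_gb <= vram_gb


# A nominal 30B model at 4 bits per weight is about 15GB of weights.
print(weight_footprint_gb(30e9, 4))
print(fits_in_vram(30e9, 4, vram_gb=24))  # 24GB card (3090 / 4090)
print(fits_in_vram(30e9, 4, vram_gb=12))  # 12GB card (3060 / 4070)
```

By this estimate a 30B 4-bit model fits on a 24GB card but not on a 12GB one, so a 12GB card would mean running smaller models.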

Thank you very much for your time. This was really helpful. The 3090 seems very enticing!
Have a great day!
