Please, help :<

#7 opened by ANGIPO

cfg: Ryzen 5800X, RTX 3090 24GB
[screenshot: 2023-04-22_15-09-47.png]

I think there is an issue with the files provided: it isn't referencing the safetensors files, hence the error. Hopefully this is updated soon.

Not an issue with the files. Try running it with --wbits 4
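
For reference, a full launch command along these lines should work; the model folder name below is just an example of whatever you called the download:

$ python server.py --model gpt4-x-alpaca-30b-128g-4bit --wbits 4 --groupsize 128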

I have the same issue.

I haven't downloaded all three big safetensor files, just the 4bit 128g file. Are all big files needed for the model to work?

Ah, somehow the UI resets wbits and groupsize to None when I switch the model. Now I've managed to get past this issue.

However, now server.py just exits when I try to load the model. No error, nothing.

Update: it appears to be model.load_state_dict(safe_load(checkpoint), strict=False) that crashes Python with no error whatsoever.
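
In case it helps anyone reproduce this outside the web UI, here is a minimal sketch (the checkpoint path is just an example; adjust it to your setup) that performs the same safetensors deserialization step on its own, so you can watch system RAM usage while it runs:

from safetensors.torch import load_file

checkpoint = "models/gpt4-x-alpaca-30b-128g-4bit/gpt4-x-alpaca-30b-128g-4bit.safetensors"  # example path
state_dict = load_file(checkpoint, device="cpu")  # this step alone needs many GB of free system RAM
print(f"Loaded {len(state_dict)} tensors")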

@HendrikW80 Download oobabooga/llama-tokenizer and put it in your models folder. It is the default tokenizer.
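
If you use the web UI's bundled downloader, something like this should fetch it straight into the models folder:

$ python download-model.py oobabooga/llama-tokenizer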

Enjoy,

Try changing the folder name from GPT4-X-Alpaca-30B-128g-4bit to gpt4-x-alpaca-30b-128g-4bit:

mv GPT4-X-Alpaca-30B-Int4 gpt4-x-alpaca-30b-128g-4bit

You think it's a matter of upper/lower case?

@HendrikW80, I had the exact same error as you: the Python server would close with no error given. Putting oobabooga/llama-tokenizer in my models folder resolved the issue.

I'll try that out and report back if it works, thanks.

@DTechNation Okay, tried it. No success. Still exits right after "Loading model..." with no error whatsoever.

I downloaded the latest oobabooga/llama-tokenizer and changed the directory name to lowercase; however, it loads and then gets killed. I'm running on an A10 and have tried both the 128g model and the non-128g model.

$ python server.py --auto-devices --groupsize 128 --extensions api --listen --model gpt4-x-alpaca-30b-128g-4bit --wbits 4

Gradio HTTP request redirected to localhost :)
bin /home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
Loading gpt4-x-alpaca-30b-128g-4bit...
Found the following quantized model: models/gpt4-x-alpaca-30b-128g-4bit/gpt4-x-alpaca-30b-128g-4bit.safetensors
Killed

Solved: It needs about 17GB of free RAM to load (not VRAM).
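
For anyone else seeing a silent "Killed" on Linux, that is usually the kernel's OOM killer. You can check free memory before loading and look for the kill in the kernel log afterwards (standard Linux tools, nothing specific to the web UI):

$ free -h
$ sudo dmesg | grep -i "out of memory"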

I have over 40 GB of system RAM free and it still won't work. I changed the (non-128g) model folder to lowercase and have the same quitting issue.

$ python server.py --auto-devices --model gpt4-x-alpaca-30b-128g-4bit --wbits 4 --model_type LLaMa

I'm using a 3090. It worked, seemingly at random, for a day after I downloaded oobabooga/llama-tokenizer, but then stopped the next day. I removed and reinstalled the latest oobabooga web UI, but I'm still having the quitting issue.

Yeah, same here. It's really hard to debug with nothing but a cryptic error code...

I FIGURED IT OUT

You need about 40 GB of free space on your SSD for it to start up!

What script are you using to run the model?
$ python server.py --auto-devices --model gpt4-x-alpaca-30b-128g-4bit --wbits 4 --model_type LLaMa

The reason for the error is a problem on Windows whereby larger models need a very large Pagefile to load, even though you are loading it to GPU and even if you have plenty of RAM.

For 30B models I recommend users have a Pagefile of 100 GB. This can be achieved either by setting the Pagefile to 100 GB manually, or by leaving it on Auto and ensuring you have 100+ GB free on C: (or whichever drive holds the Pagefile).
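
One way to set a fixed 100 GB Pagefile is from an elevated command prompt; the commands below are a sketch (wmic syntax, sizes in MB, and the assumption that the Pagefile lives at C:\pagefile.sys), so verify against your own system:

> wmic computersystem where name="%computername%" set AutomaticManagedPagefile=False
> wmic pagefileset where name="C:\\pagefile.sys" set InitialSize=102400,MaximumSize=102400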

That's why it worked for DTechNation when they freed up more disk space; the Pagefile was able to grow large enough to let the model load.
