Unable to load/use this model.

#5
by vdruts - opened

I seem to be able to load almost any other 30B model, but this one always results in a "BUS" error, in WSL/Ubuntu.

/WizardLM-Uncensored-SuperCOT-Storytelling-GPTQ-4bit.act.order.safetensors
Bus error

So I'm trying to load this model in AutoGPT (Windows) after successfully loading several of your other 30B models... and still getting a BUS error. Any ideas? For some reason the GPU memory barely moves on this one, as in I don't really see it being loaded into memory unlike other models.

In windows it just crashes.

WARNING:The safetensors archive passed at models\TheBloke_WizardLM-Uncensored-SuperCOT-StoryTelling-30B-GPTQ\WizardLM-Uncensored-SuperCOT-Storytelling-GPTQ-4bit.act.order.safetensors does not contain metadata. Make sure to save your model with the save_pretrained method. Defaulting to 'pt' metadata.
Press any key to continue . . .

I've tried changing the page-file size. Windows 11 just crashes... For some reason the CPU barely spins up, no system memory is really used and only 5GB of VRAM, then it hangs and crashes.

The WARNING is fine, it can be ignored - it's not an error at all. Hopefully it will go away in a future release.

The "press a key to continue" is, as you thought, related to Pagefile. I don't know what to suggest other than making sure Pagefile is at least 90GB. That seems to work for everyone else who has this problem.

It's a known problem on Windows that it seems to need about 3x the size of the model in RAM, and it maps it all to pagefile even if you have plenty of free RAM. Even on a 128GB system, it still fails unless there's plenty of pagefile.

It's odd because I made the page-file 128GB and it still threw the error :/ I guess I can try 200GB but that's insane lol

Any clues as to why this is one of the only models with this particular issue? I feel like it's one of the json config files... your other 30B models load fine.

Yeah that would be insane. I really have no idea I'm afraid. I don't own an NV GPU and I can't easily get a Windows cloud system, so it's hard for me to ever test Windows.

What about WSL2? That's what I recommend to most Windows users. I know you said you had problems with it, but that might just be an install issue. Perhaps make a new conda environment with Miniconda and start again in WSL2, installing CUDA toolkit 11.8 and then torch with:

pip install torch --index-url https://download.pytorch.org/whl/cu118

and then text-generation-webui with

git clone https://github.com/oobabooga/text-generation-webui 
cd text-generation-webui
pip install -r requirements.txt

That will install AutoGPTQ automatically, and then it should work immediately. In theory!

Yeah, I've done that about 10x over recently and still had the issue :/ no clue. I'd be down for you to TeamViewer into my system and run some tests later this week if you like. Win11/WSL2/4090 here.

@vdruts are you sure you have installed and setup WSL2 correctly?

I'm using WSL2 with Ubuntu in Win10 on my 3090 and it can load most 30B models.

Yes. I never had any issues weeks ago but then started running into them. Anyway, most 30B models load with no problem, but this one won't load in WSL or Windows... in WSL I get a "BUS" error. I've tried increasing and creating a WSL swap file as well and no luck. Same error.
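For reference, the WSL2 swap I was adjusting lives in `%UserProfile%\.wslconfig` on the Windows side, roughly like this sketch (the sizes are just what I tried, not recommendations — adjust them to your machine, and run `wsl --shutdown` afterwards so the changes take effect):

```
# %UserProfile%\.wslconfig
[wsl2]
memory=48GB                 # RAM ceiling for the WSL2 VM (example value)
swap=90GB                   # large swap, in line with the ~90GB pagefile advice
swapFile=D:\\wsl-swap.vhdx  # optional: put the swap file on a drive with free space
```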

Just tried again in Windows.

Loading Wizard 30B (the new one): loads in 48s in AutoGPT
Loading the CoT mix 30B: gets stuck at 5.7GB loaded into GPU memory, then throws the 'press any key'

I've got a 64-128GB swap file set. That did not fix it.

That is really weird. I can't understand what could be different about one 30B model vs another. They're all made the same way, and no-one else is reporting issues specific to this file.

Have you confirmed the file is definitely downloaded OK? Checked the sha256sum? Or just delete it and re-download it?
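Checking the hash on the command line would look something like this (a sketch — `expected` is a placeholder you'd replace with the real SHA-256 value from the model page's file listing):

```shell
# Compare the downloaded file's SHA-256 against the value on the model page.
expected="paste-hash-from-model-page-here"
actual=$(sha256sum WizardLM-Uncensored-SuperCOT-Storytelling-GPTQ-4bit.act.order.safetensors | awk '{print $1}')
if [ "$actual" = "$expected" ]; then
    echo "checksum OK"
else
    echo "checksum MISMATCH - delete and re-download"
fi
```

If the hashes differ, the download was corrupted and re-downloading is the fix.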

Also, if you have saved settings for this model in text-gen-ui, check those and maybe delete them. You can see those in models/config-user.yaml. Maybe a different/bad setting got added somewhere that's breaking things.

It is weird... I've tried re-downloading it. I'll try again... and now it's working. So weird, because I had downloaded it several times before. No explanation. Something must have gotten corrupted.
