Webui crashes when loading the model

#3
by aurenigma - opened

When I try to load the model following the instructions in the card, I get the following error:

2023-06-25 07:35:11 WARNING:The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
2023-06-25 07:35:12 WARNING:The safetensors archive passed at models\TheBloke_WizardLM-33B-V1.0-Uncensored-GPTQ\wizardlm-33b-v1.0-uncensored-GPTQ-4bit--1g.act.order.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.

AutoGPTQ fails, but ExLlama seems to work.

It's probably because a larger pagefile is needed to load the model; a common issue on Windows.

But if ExLlama works, just use that.
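For a rough sense of why the pagefile spikes: a 33B model quantized to 4-bit is about 16.5 GB of raw weights, and if the loader stages full tensors in system RAM before moving them to the GPU, commit charge (RAM plus pagefile) can spike well past that. A minimal back-of-the-envelope sketch; the 2x overhead factor is an assumption based on the symptom described here, not a measured value:

```python
def est_load_commit_gb(n_params_b: float, bits: int, overhead: float = 2.0) -> float:
    """Rough estimate of peak commit charge (RAM + pagefile) in GB
    when loading a quantized checkpoint.

    n_params_b : parameter count in billions (e.g. 33 for a 33B model)
    bits       : quantization width (4 for 4-bit GPTQ)
    overhead   : fudge factor for loader staging/buffers (assumed, not measured)
    """
    weight_gb = n_params_b * 1e9 * bits / 8 / 1e9  # raw weight bytes -> GB
    return weight_gb * overhead

# A 33B 4-bit model: ~16.5 GB of weights, so with a 2x staging
# assumption the loader may commit roughly 33 GB on top of what
# the system is already using.
print(round(est_load_commit_gb(33, 4), 1))  # → 33.0
```

If the real peak is anywhere near the 109 GB reported below, the overhead is far higher than this sketch assumes, which would point at the loader duplicating buffers rather than simple weight staging.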

Not sure if there is a problem with this one, fella. When I use ExLlama it runs freaky fast, but it gets into its own time paradox in about 3 responses.
I can run the 30Bs no problem since I set a 150GB pagefile on my M.2 on PCIe 4.0 (and I love them, btw!).
This one hits a 109GB pagefile every time and craps out the whole chatbot process when set to GPTQ-for-LLaMa.
Win 11, 3090, yada yada! :-)

If I can help in some way, just ask; if it's not worth it, don't stress, I'm not. :-)

Thanks for the constant updates, love them. Thanks, Tom.

I'm not really following what the problem is. What do you mean by "time paradox"? In what way does it crap out with GPTQ-for-LLaMa?

I've been using this with ExLlama on text-generation-webui for my Discord bot for the last 24 hours and it's working very well, so I know the model works in general.

Show me specific errors or problems and I'll help if I can.

For me this particular model falls into a loop very quickly, like 3-4 responses in (hence the time paradox reference; apologies if this was unclear).
As for causing the crash: yes, I load the model the way I do all your models with GPTQ, and this one fills the pagefile until, at 109GB, it crashes the chatbot program oobabooga_windows\text-generation-webui with a "(Press any key)" in the CMD window, and then the program force quits.
It runs fine under ExLlama; however, as stated, it falls into a loop very quickly for me personally.
I realise there are millions of variables between users, so I'm just throwing in my experience with this particular model; it's the only one out of the 20+ I have installed that causes a program crash. (The other 19+ are brilliant and working flawlessly, many of which are yours too, Tom.)
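The "time paradox" described above, i.e. the model locking into a repeated phrase, is usually addressed by raising the repetition penalty or no-repeat-n-gram settings in the generation parameters. As an illustration of what those heuristics are guarding against, here is a minimal self-contained sketch of a repeated-n-gram check; `looping` is a hypothetical helper for illustration, not part of text-generation-webui or ExLlama:

```python
def looping(token_ids, ngram=8, window=64):
    """Heuristic loop detector: True if the last `ngram` tokens already
    appeared earlier within the recent `window` tokens.
    (Hypothetical helper, for illustration only.)"""
    if len(token_ids) < ngram * 2:
        return False
    tail = tuple(token_ids[-ngram:])
    recent = token_ids[-window:-ngram]
    # slide an ngram-sized window over the recent history
    for i in range(len(recent) - ngram + 1):
        if tuple(recent[i:i + ngram]) == tail:
            return True
    return False

# A degenerate repeating sequence trips the detector...
print(looping([1, 2, 3, 4] * 8, ngram=4))   # → True
# ...while varied output does not.
print(looping(list(range(64)), ngram=4))    # → False
```

When output like this starts repeating, bumping `repetition_penalty` slightly above 1.0 in the webui's generation parameters is the usual first thing to try.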