Webui crashes when loading the model

#3
by aurenigma - opened

When I try to load the model following the instructions in the card, I get the following error:

2023-06-25 07:35:11 WARNING:The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
2023-06-25 07:35:12 WARNING:The safetensors archive passed at models\TheBloke_WizardLM-33B-V1.0-Uncensored-GPTQ\wizardlm-33b-v1.0-uncensored-GPTQ-4bit--1g.act.order.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.

AutoGPTQ fails, but ExLlama seems to work.

It's probably because a larger pagefile is needed to load the model; a common issue on Windows.

But if ExLlama works, just use that.
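For a rough sense of why the pagefile spikes: a 33B model quantized to 4-bit is about 16.5 GB of raw weights, and if the loader stages full tensors in system RAM before moving them to the GPU, commit charge (RAM plus pagefile) can spike well past that. A minimal back-of-the-envelope sketch; the 2x overhead factor is an assumption based on the symptom described here, not a measured value:

```python
def est_load_commit_gb(n_params_b: float, bits: int, overhead: float = 2.0) -> float:
    """Rough estimate of peak commit charge (RAM + pagefile) in GB
    when loading a quantized checkpoint.

    n_params_b : parameter count in billions (e.g. 33 for a 33B model)
    bits       : quantization width (4 for 4-bit GPTQ)
    overhead   : fudge factor for loader staging/buffers (assumed, not measured)
    """
    weight_gb = n_params_b * 1e9 * bits / 8 / 1e9  # raw weight bytes -> GB
    return weight_gb * overhead

# A 33B 4-bit model: ~16.5 GB of weights, so with a 2x staging
# assumption the loader may commit roughly 33 GB on top of what
# the system is already using.
print(round(est_load_commit_gb(33, 4), 1))  # → 33.0
```

If the real peak is anywhere near the 109 GB reported below, the overhead is far higher than this sketch assumes, which would point at the loader duplicating buffers rather than simple weight staging.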

Not sure if there is a problem with this one, fella. When I use ExLlama it runs freaky fast, but it gets into its own time paradox in about 3 responses.
I can run the 30Bs no problem since I set a 150GB pagefile on my M.2 on PCIe 4.0 (and I love them, btw!).
This one hits a 109GB pagefile every time and craps out the whole chatbot process when set to GPTQ-for-LLaMa.
Win 11, 3090, yada yada! :-)

If I can help in some way, just ask; if it's not worth it, don't stress, I'm not. :-)

Thanks for the constant updates, love them. Thanks, Tom.

I'm not really following what the problem is. What do you mean by "time paradox"? In what way does it crap out with GPTQ-for-LLaMa?

I've been using this with ExLlama on text-generation-webui for my Discord bot for the last 24 hours and it's working very well, so I know the model works in general.

Show me specific errors or problems and I'll help if I can.

For me this particular model falls into a loop very quickly, like 3-4 responses in (hence the time paradox reference; apologies if this was unclear).
As for causing the crash: yes, I load the model the way I do all your models with GPTQ, and this one fills the pagefile until, at 109GB, it crashes the chatbot program oobabooga_windows\text-generation-webui with a "(Press any key)" in the CMD window, and then the program force quits.
It runs fine under ExLlama; however, as stated, it falls into a loop very quickly for me personally.
I realise there are millions of variables between users, so I'm just throwing in my experience with this particular model; it's the only one out of the 20+ I have installed that causes a program crash. (The other 19+ are brilliant and working flawlessly, many of which are yours too, Tom.)
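The "time paradox" described above, i.e. the model locking into a repeated phrase, is usually addressed by raising the repetition penalty or no-repeat-n-gram settings in the generation parameters. As an illustration of what those heuristics are guarding against, here is a minimal self-contained sketch of a repeated-n-gram check; `looping` is a hypothetical helper for illustration, not part of text-generation-webui or ExLlama:

```python
def looping(token_ids, ngram=8, window=64):
    """Heuristic loop detector: True if the last `ngram` tokens already
    appeared earlier within the recent `window` tokens.
    (Hypothetical helper, for illustration only.)"""
    if len(token_ids) < ngram * 2:
        return False
    tail = tuple(token_ids[-ngram:])
    recent = token_ids[-window:-ngram]
    # slide an ngram-sized window over the recent history
    for i in range(len(recent) - ngram + 1):
        if tuple(recent[i:i + ngram]) == tail:
            return True
    return False

# A degenerate repeating sequence trips the detector...
print(looping([1, 2, 3, 4] * 8, ngram=4))   # → True
# ...while varied output does not.
print(looping(list(range(64)), ngram=4))    # → False
```

When output like this starts repeating, bumping `repetition_penalty` slightly above 1.0 in the webui's generation parameters is the usual first thing to try.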