Failed to load the model mpt-7b-storywriter.ggmlv3.q5_1.bin

#7 by StableDiffusion69

When I try to load this model in ooba with my 8GB VRAM card, it fails with the AssertionError shown in the console output below.

Console says:
2023-07-03 08:38:29 INFO:Loading TheBloke_MPT-7B-Storywriter-GGML...
2023-07-03 08:38:29 INFO:llama.cpp weights detected: models\TheBloke_MPT-7B-Storywriter-GGML\mpt-7b-storywriter.ggmlv3.q5_1.bin

2023-07-03 08:38:29 INFO:Cache capacity is 0 bytes
llama.cpp: loading model from models\TheBloke_MPT-7B-Storywriter-GGML\mpt-7b-storywriter.ggmlv3.q5_1.bin
error loading model: llama.cpp: tensor 'rus' should not be 7-dimensional
llama_init_from_file: failed to load model
2023-07-03 08:38:30 ERROR:Failed to load the model.
Traceback (most recent call last):
File "F:\Programme\oobabooga_windows\text-generation-webui\server.py", line 67, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name, loader)
File "F:\Programme\oobabooga_windows\text-generation-webui\modules\models.py", line 74, in load_model
output = load_func_map[loader](model_name)
File "F:\Programme\oobabooga_windows\text-generation-webui\modules\models.py", line 255, in llamacpp_loader
model, tokenizer = LlamaCppModel.from_pretrained(model_file)
File "F:\Programme\oobabooga_windows\text-generation-webui\modules\llamacpp_model.py", line 55, in from_pretrained
result.model = Llama(**params)
File "F:\Programme\oobabooga_windows\installer_files\env\lib\site-packages\llama_cpp\llama.py", line 289, in init
assert self.ctx is not None
AssertionError

Exception ignored in: <function LlamaCppModel.__del__ at 0x00000270ED8DCA60>
Traceback (most recent call last):
File "F:\Programme\oobabooga_windows\text-generation-webui\modules\llamacpp_model.py", line 29, in del
self.model.del()
AttributeError: 'LlamaCppModel' object has no attribute 'model'

What are my chances of getting this to work, please? 🤔

It's not supported in text-generation-webui. text-generation-webui loads GGML files through llama.cpp, which only understands LLaMA-architecture models, so it misreads the MPT file's contents; that's where the nonsense "tensor 'rus' should not be 7-dimensional" error comes from. This is an older MPT model where I didn't list all the UIs that are compatible, but check my more recent MPT-7B-Chat GGML for a list of UIs it does work with: https://huggingface.co/TheBloke/mpt-7b-chat-GGML

I'll update this README soon

Thanks for this info. Unfortunately, the model you linked isn't compatible either.
It says:
These files are not compatible with text-generation-webui, llama.cpp, or llama-cpp-python.
Too bad ... 😌

No, that's right, none of the MPTs are. I just meant look at the README; it shows some UIs that are compatible, like KoboldCpp and LoLLMS-WebUI.

Ah, OK, I misunderstood what you meant.
Any chance we'll get some of these models working with text-generation-webui some time ... ? 🙏

Don't know - that depends on whether text-generation-webui adds support for a backend that can run MPT GGML models. There's no sign of that happening right now, but if someone submitted the necessary code, oobabooga (the creator of text-gen-ui) would most likely include it. So it depends on someone wanting to write that.

There's a backend called ctransformers that supports all GGML-format models, including MPT, so if support for it were added to text-generation-webui, these files would work.
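
If you want to try it from Python directly in the meantime, here's a minimal untested sketch using ctransformers (assuming you've done pip install ctransformers; the file path is the one from your log, and max_new_tokens is just an illustrative value):

# model_type="mpt" selects ctransformers' MPT GGML loader instead of the
# llama.cpp loader that raised the AssertionError above.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "models/TheBloke_MPT-7B-Storywriter-GGML/mpt-7b-storywriter.ggmlv3.q5_1.bin",
    model_type="mpt",
)
print(llm("Once upon a time", max_new_tokens=64))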

In the meantime, KoboldCpp and LoLLMS-WebUI are both good UIs. KoboldCpp even supports GPU acceleration with MPT GGML models.
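
For example, something like this (an illustrative command line, not tested here; the flag names come from KoboldCpp's --help and may differ between versions, and the GPU layer count is just a guess for an 8GB card):

python koboldcpp.py --model models\TheBloke_MPT-7B-Storywriter-GGML\mpt-7b-storywriter.ggmlv3.q5_1.bin --useclblast 0 0 --gpulayers 20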

Alternatively, if you have an NVidia GPU you could try a GPTQ of MPT 7B Storywriter. I haven't made one myself, but there is one available on Hugging Face which should work with text-generation-webui using the GPTQ-for-LLaMa loader (it won't work with ExLlama or AutoGPTQ).
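
As a rough sketch, launching it might look like this (the folder name is a placeholder for wherever you put the GPTQ model; exact flags depend on your text-generation-webui version, and MPT's custom architecture generally needs trust-remote-code):

python server.py --model your-mpt-7b-storywriter-GPTQ-folder --loader gptq-for-llama --trust-remote-code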

Oh, great! Many thanks for this information. I will look into that.
And many thanks for the great work you do on these models. We all really appreciate it. 👍
