Error parsing message when loading model

#2
by MasonJ - opened

I'd love to run this model, but I get the error below in Ooba on an RTX 3090 Ti. I get the error with both the ExLlamav2 and ExLlamav2_HF model loaders. Task Manager shows 22.7/24.0 GB on the GPU when the loading fails, so I don't believe I ran out of VRAM. I can run turboderp_LLama2-70B-2.5bpw-h6-exl2 and turboderp_LLama2-70B-chat-2.55bpw-h6-exl2 just fine. Any insight would be appreciated.

2023-09-16 22:53:25 INFO:Loading airoboros-l2-70b-gpt4-1.4.1_2.5bpw-h6-exl2...
2023-09-16 22:53:38 ERROR:Failed to load the model.
Traceback (most recent call last):
  File "S:\LLMs\Interfaces\oobabooga_windows\text-generation-webui\modules\ui_model_menu.py", line 194, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "S:\LLMs\Interfaces\oobabooga_windows\text-generation-webui\modules\models.py", line 85, in load_model
    tokenizer = load_tokenizer(model_name, model)
  File "S:\LLMs\Interfaces\oobabooga_windows\text-generation-webui\modules\models.py", line 102, in load_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
  File "S:\LLMs\Interfaces\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 736, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "S:\LLMs\Interfaces\oobabooga_windows\installer_files\env\lib\site-packages\transformers\tokenization_utils_base.py", line 1854, in from_pretrained
    return cls._from_pretrained(
  File "S:\LLMs\Interfaces\oobabooga_windows\installer_files\env\lib\site-packages\transformers\tokenization_utils_base.py", line 2017, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "S:\LLMs\Interfaces\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\tokenization_llama.py", line 156, in __init__
    self.sp_model = self.get_spm_processor()
  File "S:\LLMs\Interfaces\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\tokenization_llama.py", line 172, in get_spm_processor
    model = model_pb2.ModelProto.FromString(sp_model)
google.protobuf.message.DecodeError: Error parsing message

Honestly, I have never seen that issue. Which version of transformers/exllamav2 are you using? Even searching for that error, I can't find anything clear about it.
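
One thing you could try, to rule out the webui itself, is loading tokenizer.model directly with sentencepiece, which is essentially what the tokenizer is doing when it fails. A rough sketch (the path is just an example, point it at wherever the model was downloaded):

# Try to parse tokenizer.model outside of the webui.
# If this also fails, the file on disk is bad rather than the loader.
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load("models/airoboros-l2-70b-gpt4-1.4.1_2.5bpw-h6-exl2/tokenizer.model")  # example path
print(sp.get_piece_size())

If that also errors out, the problem is the tokenizer file itself and not exllamav2 or ooba.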

I'm using a fresh install of the latest Ooba Web UI (commit 0668f4e67fe158a385eff87ace9d0676d657df20). I included the output of 'pip list' from the conda env below. If something is amiss in my setup, I have no idea what it might be.


Package Version
------------------------- ------------
absl-py 1.4.0
accelerate 0.22.0
aiofiles 23.1.0
aiohttp 3.8.5
aiosignal 1.3.1
altair 5.1.1
antlr4-python3-runtime 4.9.3
anyio 4.0.0
appdirs 1.4.4
asttokens 2.4.0
async-timeout 4.0.3
attrs 23.1.0
auto-gptq 0.4.2+cu117
backcall 0.2.0
beautifulsoup4 4.12.2
bitsandbytes 0.41.1
blinker 1.6.2
cachetools 5.3.1
certifi 2022.12.7
cffi 1.15.1
charset-normalizer 2.1.1
click 8.1.7
colorama 0.4.6
coloredlogs 15.0.1
contourpy 1.1.0
cramjam 2.7.0
ctransformers 0.2.27+cu117
cycler 0.11.0
datasets 2.14.5
decorator 5.1.1
deep-translator 1.9.2
dill 0.3.7
diskcache 5.6.3
docker-pycreds 0.4.0
docopt 0.6.2
einops 0.6.1
elevenlabs 0.2.24
exceptiongroup 1.1.3
executing 1.2.0
exllama 0.0.17+cu117
exllamav2 0.0.1
fastapi 0.95.2
fastparquet 2023.8.0
ffmpeg 1.4
ffmpeg-python 0.2.0
ffmpy 0.3.1
filelock 3.9.0
Flask 2.3.3
flask-cloudflared 0.0.12
fonttools 4.42.1
frozenlist 1.4.0
fsspec 2023.6.0
future 0.18.3
gitdb 4.0.10
GitPython 3.1.36
google-auth 2.23.0
google-auth-oauthlib 1.0.0
gptq-for-llama 0.1.0+cu117
gradio 3.33.1
gradio_client 0.2.5
grpcio 1.58.0
h11 0.14.0
httpcore 0.18.0
httpx 0.25.0
huggingface-hub 0.17.1
humanfriendly 10.0
idna 3.4
ipython 8.15.0
itsdangerous 2.1.2
jedi 0.19.0
Jinja2 3.1.2
joblib 1.3.2
jsonschema 4.19.0
jsonschema-specifications 2023.7.1
kiwisolver 1.4.5
linkify-it-py 2.0.2
llama-cpp-python 0.1.85
llama-cpp-python-cuda 0.1.85+cu117
llvmlite 0.40.1
Markdown 3.4.4
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.8.0
matplotlib-inline 0.1.6
mdit-py-plugins 0.3.3
mdurl 0.1.2
more-itertools 10.1.0
mpmath 1.2.1
multidict 6.0.4
multiprocess 0.70.15
networkx 3.0
ngrok 0.9.0
ninja 1.11.1
nltk 3.8.1
num2words 0.5.12
numba 0.57.1
numpy 1.24.0
oauthlib 3.2.2
omegaconf 2.3.0
openai-whisper 20230314
optimum 1.13.1
orjson 3.9.7
packaging 23.1
pandas 2.1.0
parso 0.8.3
pathtools 0.1.2
peft 0.5.0
pickleshare 0.7.5
Pillow 10.0.1
pip 23.2.1
prompt-toolkit 3.0.39
protobuf 4.24.3
psutil 5.9.5
pure-eval 0.2.2
py-cpuinfo 9.0.0
pyarrow 13.0.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pycparser 2.21
pydantic 1.10.12
pydub 0.25.1
Pygments 2.16.1
pyparsing 3.1.1
pyreadline3 3.4.1
python-dateutil 2.8.2
python-multipart 0.0.6
pytz 2023.3.post1
PyYAML 6.0.1
referencing 0.30.2
regex 2023.8.8
requests 2.31.0
requests-oauthlib 1.3.1
rouge 1.0.1
rpds-py 0.10.3
rsa 4.9
safetensors 0.3.2
scikit-learn 1.3.0
scipy 1.11.2
semantic-version 2.10.0
sentence-transformers 2.2.2
sentencepiece 0.1.99
sentry-sdk 1.31.0
setproctitle 1.3.2
setuptools 68.0.0
six 1.16.0
smmap 5.0.0
sniffio 1.3.0
soundfile 0.12.1
soupsieve 2.5
SpeechRecognition 3.10.0
stack-data 0.6.2
starlette 0.27.0
sympy 1.11.1
tensorboard 2.14.0
tensorboard-data-server 0.7.1
threadpoolctl 3.2.0
tiktoken 0.3.1
tokenizers 0.13.3
toolz 0.12.0
torch 2.0.1+cu117
torchaudio 2.0.2+cu117
torchvision 0.15.2+cu117
tqdm 4.66.1
traitlets 5.10.0
transformers 4.33.2
typing_extensions 4.7.1
tzdata 2023.3
uc-micro-py 1.0.2
urllib3 1.26.13
uvicorn 0.23.2
wandb 0.15.10
wcwidth 0.2.6
websockets 11.0.2
Werkzeug 2.3.7
wheel 0.38.4
xxhash 3.3.0
yarl 1.9.2

It all seems good, so I'm not exactly sure why that is happening to you.

I will re-do the quant in a few hours with the latest exllamav2, so stay tuned to see if it works for you.

I've uploaded a new version, with sharded files.

Can you test if it loads fine for you?

It is loading fine in my case, using the ooba webui.

[Screenshot: the model loading successfully in the ooba webui]

I updated ooba and downloaded the new sharded version but I still get the exact same error, word for word.

I did notice a warning about flash-attention this time, which, from your screenshot, I can see you don't get:
[WARNING:You are running ExLlamaV2 without flash-attention.]
I don't know if flash-attention is needed or not.

I will probably try running the model with a different interface next to see if that changes anything, unless you have another idea for troubleshooting further.
I appreciate the model and the help.

I finally found the problem! The model loads correctly now and has been working great so far.

The problem was due to git LFS + my own stupidity. Ooba was trying to use a pointer version of tokenizer.model instead of the actual file.

Sorry for any unnecessary work on your part. I truly do appreciate the assistance. Perhaps if someone else makes the same mistake, they can find this thread and realize the issue.
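
For anyone who lands here with the same symptom, a quick way to check is to look at the first bytes of tokenizer.model: the real Llama tokenizer is a binary file of roughly 500 KB, while a git LFS pointer is a tiny text file. A rough sketch (the path is just an example):

# Check whether tokenizer.model is a git LFS pointer instead of the real binary.
from pathlib import Path

path = Path("models/airoboros-l2-70b-gpt4-1.4.1_2.5bpw-h6-exl2/tokenizer.model")  # example path
head = path.read_bytes()[:64]
if head.startswith(b"version https://git-lfs.github.com/spec/v1"):
    print("LFS pointer, not the real file -- run `git lfs pull` or re-download tokenizer.model")
else:
    print(f"Looks like a real binary file ({path.stat().st_size} bytes)")

Running git lfs pull inside the model folder (or just re-downloading tokenizer.model from the repo page) replaces the pointer with the actual file.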
