Error parsing message when loading model

#2
by MasonJ - opened

I'd love to run this model, but I get the error below in Ooba on an RTX 3090 Ti. I get the error with both the ExLlamav2 and ExLlamav2_HF model loaders. Task Manager shows 22.7/24.0 GB on the GPU when the loading fails, so I don't believe I ran out of VRAM. I can run turboderp_LLama2-70B-2.5bpw-h6-exl2 and turboderp_LLama2-70B-chat-2.55bpw-h6-exl2 just fine. Any insight would be appreciated.

2023-09-16 22:53:25 INFO:Loading airoboros-l2-70b-gpt4-1.4.1_2.5bpw-h6-exl2...
2023-09-16 22:53:38 ERROR:Failed to load the model.
Traceback (most recent call last):
  File "S:\LLMs\Interfaces\oobabooga_windows\text-generation-webui\modules\ui_model_menu.py", line 194, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "S:\LLMs\Interfaces\oobabooga_windows\text-generation-webui\modules\models.py", line 85, in load_model
    tokenizer = load_tokenizer(model_name, model)
  File "S:\LLMs\Interfaces\oobabooga_windows\text-generation-webui\modules\models.py", line 102, in load_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
  File "S:\LLMs\Interfaces\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 736, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "S:\LLMs\Interfaces\oobabooga_windows\installer_files\env\lib\site-packages\transformers\tokenization_utils_base.py", line 1854, in from_pretrained
    return cls._from_pretrained(
  File "S:\LLMs\Interfaces\oobabooga_windows\installer_files\env\lib\site-packages\transformers\tokenization_utils_base.py", line 2017, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "S:\LLMs\Interfaces\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\tokenization_llama.py", line 156, in __init__
    self.sp_model = self.get_spm_processor()
  File "S:\LLMs\Interfaces\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\tokenization_llama.py", line 172, in get_spm_processor
    model = model_pb2.ModelProto.FromString(sp_model)
google.protobuf.message.DecodeError: Error parsing message

Honestly, I have never seen that issue. Which version of transformers/exllamav2 are you using? Even searching for that error, I can't find anything clear about it.
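
One thing you could try, to rule out the webui itself, is loading tokenizer.model directly with sentencepiece, which is essentially what the tokenizer is doing when it fails. A rough sketch (the path is just an example, point it at wherever the model was downloaded):

# Try to parse tokenizer.model outside of the webui.
# If this also fails, the file on disk is bad rather than the loader.
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load("models/airoboros-l2-70b-gpt4-1.4.1_2.5bpw-h6-exl2/tokenizer.model")  # example path
print(sp.get_piece_size())

If that also errors out, the problem is the tokenizer file itself and not exllamav2 or ooba.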

I'm using a fresh install of the latest Ooba Web UI (commit 0668f4e67fe158a385eff87ace9d0676d657df20). I included the output of 'pip list' from the conda env below. If something is amiss in my setup, I have no idea what it might be.


Package Version
------------------------- ------------
absl-py 1.4.0
accelerate 0.22.0
aiofiles 23.1.0
aiohttp 3.8.5
aiosignal 1.3.1
altair 5.1.1
antlr4-python3-runtime 4.9.3
anyio 4.0.0
appdirs 1.4.4
asttokens 2.4.0
async-timeout 4.0.3
attrs 23.1.0
auto-gptq 0.4.2+cu117
backcall 0.2.0
beautifulsoup4 4.12.2
bitsandbytes 0.41.1
blinker 1.6.2
cachetools 5.3.1
certifi 2022.12.7
cffi 1.15.1
charset-normalizer 2.1.1
click 8.1.7
colorama 0.4.6
coloredlogs 15.0.1
contourpy 1.1.0
cramjam 2.7.0
ctransformers 0.2.27+cu117
cycler 0.11.0
datasets 2.14.5
decorator 5.1.1
deep-translator 1.9.2
dill 0.3.7
diskcache 5.6.3
docker-pycreds 0.4.0
docopt 0.6.2
einops 0.6.1
elevenlabs 0.2.24
exceptiongroup 1.1.3
executing 1.2.0
exllama 0.0.17+cu117
exllamav2 0.0.1
fastapi 0.95.2
fastparquet 2023.8.0
ffmpeg 1.4
ffmpeg-python 0.2.0
ffmpy 0.3.1
filelock 3.9.0
Flask 2.3.3
flask-cloudflared 0.0.12
fonttools 4.42.1
frozenlist 1.4.0
fsspec 2023.6.0
future 0.18.3
gitdb 4.0.10
GitPython 3.1.36
google-auth 2.23.0
google-auth-oauthlib 1.0.0
gptq-for-llama 0.1.0+cu117
gradio 3.33.1
gradio_client 0.2.5
grpcio 1.58.0
h11 0.14.0
httpcore 0.18.0
httpx 0.25.0
huggingface-hub 0.17.1
humanfriendly 10.0
idna 3.4
ipython 8.15.0
itsdangerous 2.1.2
jedi 0.19.0
Jinja2 3.1.2
joblib 1.3.2
jsonschema 4.19.0
jsonschema-specifications 2023.7.1
kiwisolver 1.4.5
linkify-it-py 2.0.2
llama-cpp-python 0.1.85
llama-cpp-python-cuda 0.1.85+cu117
llvmlite 0.40.1
Markdown 3.4.4
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.8.0
matplotlib-inline 0.1.6
mdit-py-plugins 0.3.3
mdurl 0.1.2
more-itertools 10.1.0
mpmath 1.2.1
multidict 6.0.4
multiprocess 0.70.15
networkx 3.0
ngrok 0.9.0
ninja 1.11.1
nltk 3.8.1
num2words 0.5.12
numba 0.57.1
numpy 1.24.0
oauthlib 3.2.2
omegaconf 2.3.0
openai-whisper 20230314
optimum 1.13.1
orjson 3.9.7
packaging 23.1
pandas 2.1.0
parso 0.8.3
pathtools 0.1.2
peft 0.5.0
pickleshare 0.7.5
Pillow 10.0.1
pip 23.2.1
prompt-toolkit 3.0.39
protobuf 4.24.3
psutil 5.9.5
pure-eval 0.2.2
py-cpuinfo 9.0.0
pyarrow 13.0.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pycparser 2.21
pydantic 1.10.12
pydub 0.25.1
Pygments 2.16.1
pyparsing 3.1.1
pyreadline3 3.4.1
python-dateutil 2.8.2
python-multipart 0.0.6
pytz 2023.3.post1
PyYAML 6.0.1
referencing 0.30.2
regex 2023.8.8
requests 2.31.0
requests-oauthlib 1.3.1
rouge 1.0.1
rpds-py 0.10.3
rsa 4.9
safetensors 0.3.2
scikit-learn 1.3.0
scipy 1.11.2
semantic-version 2.10.0
sentence-transformers 2.2.2
sentencepiece 0.1.99
sentry-sdk 1.31.0
setproctitle 1.3.2
setuptools 68.0.0
six 1.16.0
smmap 5.0.0
sniffio 1.3.0
soundfile 0.12.1
soupsieve 2.5
SpeechRecognition 3.10.0
stack-data 0.6.2
starlette 0.27.0
sympy 1.11.1
tensorboard 2.14.0
tensorboard-data-server 0.7.1
threadpoolctl 3.2.0
tiktoken 0.3.1
tokenizers 0.13.3
toolz 0.12.0
torch 2.0.1+cu117
torchaudio 2.0.2+cu117
torchvision 0.15.2+cu117
tqdm 4.66.1
traitlets 5.10.0
transformers 4.33.2
typing_extensions 4.7.1
tzdata 2023.3
uc-micro-py 1.0.2
urllib3 1.26.13
uvicorn 0.23.2
wandb 0.15.10
wcwidth 0.2.6
websockets 11.0.2
Werkzeug 2.3.7
wheel 0.38.4
xxhash 3.3.0
yarl 1.9.2

It all seems good, so I'm not exactly sure why that is happening to you.

I will re-do the quant in a few hours with the latest exllamav2, so stay tuned to see if it works for you.

I've uploaded a new version, with sharded files.

Can you test if it loads fine for you?

It is loading fine in my case, using the ooba webui.

[Screenshot: the model loading successfully in the ooba webui]

I updated ooba and downloaded the new sharded version but I still get the exact same error, word for word.

I did notice a warning about flash-attention this time, which, from your screenshot, I can see you don't get:
[WARNING:You are running ExLlamaV2 without flash-attention.]
I don't know if flash-attention is needed or not.

I will probably try running the model with a different interface next to see if that changes anything, unless you have another idea for troubleshooting further.
I appreciate the model and the help.

I finally found the problem! The model loads correctly now and has been working great so far.

The problem was due to git LFS + my own stupidity. Ooba was trying to use a pointer version of tokenizer.model instead of the actual file.

Sorry for any unnecessary work on your part. I truly do appreciate the assistance. Perhaps if someone else makes the same mistake, they can find this thread and realize the issue.
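
For anyone who lands here with the same symptom, a quick way to check is to look at the first bytes of tokenizer.model: the real Llama tokenizer is a binary file of roughly 500 KB, while a git LFS pointer is a tiny text file. A rough sketch (the path is just an example):

# Check whether tokenizer.model is a git LFS pointer instead of the real binary.
from pathlib import Path

path = Path("models/airoboros-l2-70b-gpt4-1.4.1_2.5bpw-h6-exl2/tokenizer.model")  # example path
head = path.read_bytes()[:64]
if head.startswith(b"version https://git-lfs.github.com/spec/v1"):
    print("LFS pointer, not the real file -- run `git lfs pull` or re-download tokenizer.model")
else:
    print(f"Looks like a real binary file ({path.stat().st_size} bytes)")

Running git lfs pull inside the model folder (or just re-downloading tokenizer.model from the repo page) replaces the pointer with the actual file.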
