Error when trying to run in OobaBooga

#1 opened by AIGUYCONTENT

I'm getting an error when trying to run in OobaBooga:

17:33:41-171561 INFO Loading "tess-v2.5-qwen2-72B-q4_k_m.gguf"
17:33:41-285599 ERROR Failed to load the model.
Traceback (most recent call last):
  File "/home/me/Desktop/text-generation-webui-main/modules/ui_model_menu.py", line 244, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/me/Desktop/text-generation-webui-main/modules/models.py", line 82, in load_model
    metadata = get_model_metadata(model_name)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/me/Desktop/text-generation-webui-main/modules/models_settings.py", line 67, in get_model_metadata
    bos_token = metadata['tokenizer.ggml.tokens'][metadata['tokenizer.ggml.bos_token_id']]
                                                  ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'tokenizer.ggml.bos_token_id'
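
For reference, the failing line in modules/models_settings.py indexes the token list with a BOS token id that this GGUF's metadata apparently does not contain. Below is a minimal sketch of the kind of local guard one could try while debugging; it assumes metadata behaves like a plain dict and says nothing about the actual upstream fix:

```python
# Hypothetical defensive rewrite of the failing line in modules/models_settings.py.
# If the GGUF metadata omits tokenizer.ggml.bos_token_id, skip the lookup instead
# of raising KeyError, and leave a placeholder for the webui's own fallback logic.
bos_token_id = metadata.get('tokenizer.ggml.bos_token_id')
if bos_token_id is not None and 'tokenizer.ggml.tokens' in metadata:
    bos_token = metadata['tokenizer.ggml.tokens'][bos_token_id]
else:
    bos_token = ''  # placeholder default; the real fallback depends on the webui
```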

How much VRAM do you have, sir?
The q4_k_m uses around 47.4 GB and the q3_k_m around 37.7 GB, so maybe that is the problem?
That said, I haven't used OobaBooga myself; I don't have the hardware for it.
This quant was just made to run on a Space here https://huggingface.co/spaces/poscye/chat-with-tess and it works great there.

You can also test the quants that @bartowski made here https://huggingface.co/bartowski/Tess-v2.5-Qwen2-72B-GGUF#download-a-file-not-the-whole-branch-from-below and pick a file based on the hardware you have. He made a complete set of quants.

Wouldn't surprise me if oobabooga needed a llama.cpp update :')

I have 50 GB of VRAM. It's weird because the second I push "Load" in OobaBooga, I get the error message, so I don't think it's a VRAM issue. ChatGPT says the GGUF is missing tokenizer metadata (the tokenizer.ggml.bos_token_id key).
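
One way to check that claim directly is to read the file's metadata with the gguf Python package that ships with llama.cpp (pip install gguf); the sketch below assumes that package and uses the filename from the log above:

```python
# Minimal sketch: list the tokenizer-related metadata keys stored in the GGUF
# and confirm whether tokenizer.ggml.bos_token_id is actually present.
from gguf import GGUFReader

reader = GGUFReader("tess-v2.5-qwen2-72B-q4_k_m.gguf")

# reader.fields maps metadata key names to their raw field records
for key in reader.fields:
    if key.startswith("tokenizer."):
        print(key)

print("bos_token_id present:", "tokenizer.ggml.bos_token_id" in reader.fields)
```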

Oh hey Bartowski, are you saying I need to do that manually? I run this on Ubuntu and just updated the WebUI a few hours ago.

I will give your quant a test. I have a 4090, a 4080, and a 3080 hooked up to my machine, and I'll be swapping the 3080 out for a 3090 tomorrow, for a total of 64 GB of VRAM. I hope that's enough.

The reason I suspect it's ooba's issue is that running the raw llama.cpp binary (./main, well, ./llama-cli now..) yields perfect output, so if the source works but a downstream branch doesn't, the branch is probably what's messing it up (ooba's webui in this case).
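
For anyone who wants to reproduce that comparison from a script rather than the shell, here is a minimal sketch; the model path and flag values are assumptions, but -m, -p, -n and -ngl are standard llama-cli options (model path, prompt, tokens to generate, GPU layers to offload):

```python
# Hedged sketch: call llama.cpp's llama-cli directly against the same GGUF,
# bypassing ooba's webui, to confirm the file itself loads and generates.
import subprocess

subprocess.run(
    [
        "./llama-cli",
        "-m", "tess-v2.5-qwen2-72B-q4_k_m.gguf",    # quantized model from the log above
        "-p", "Hello, briefly introduce yourself.",  # short test prompt
        "-n", "64",                                  # generate only a few tokens
        "-ngl", "99",                                # offload as many layers as possible to GPU
    ],
    check=True,
)
```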

pabloce changed discussion status to closed
