Issue with tokenizer when using Ooga Booga?

#21
by mm04926412 - opened

I've installed Ooga Booga and I'm trying to run the model from the CUDA weights, but I encounter the following error when starting the web UI:

Traceback (most recent call last):
File "C:\Users\mm049\Desktop\ooga booga\text-generation-webui\server.py", line 302, in
shared.model, shared.tokenizer = load_model(shared.model_name)
File "C:\Users\mm049\Desktop\ooga booga\text-generation-webui\modules\models.py", line 181, in load_model
tokenizer = LlamaTokenizer.from_pretrained(Path(f"{shared.args.model_dir}/{shared.model_name}/"), clean_up_tokenization_spaces=True)
File "C:\Users\mm049\Desktop\ooga booga\installer_files\env\lib\site-packages\transformers\tokenization_utils_base.py", line 1811, in from_pretrained
return cls.from_pretrained(
File "C:\Users\mm049\Desktop\ooga booga\installer_files\env\lib\site-packages\transformers\tokenization_utils_base.py", line 1965, in from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "C:\Users\mm049\Desktop\ooga booga\installer_files\env\lib\site-packages\transformers\models\llama\tokenization_llama.py", line 96, in init
self.sp_model.Load(vocab_file)
File "C:\Users\mm049\Desktop\ooga booga\installer_files\env\lib\site-packages\sentencepiece_init
.py", line 905, in Load
return self.LoadFromFile(model_file)
File "C:\Users\mm049\Desktop\ooga booga\installer_files\env\lib\site-packages\sentencepiece_init
.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: D:\a\sentencepiece\sentencepiece\src\sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

Does anyone have any insight into what might be causing this? I have confirmed that the checksum of my model weights matches the repository.
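One way to narrow this down is to load tokenizer.model directly with sentencepiece, bypassing the web UI entirely; if the file itself is bad you get the same RuntimeError. This is just a sketch, and the path below is a placeholder for your actual model folder:

```python
# Sketch: load tokenizer.model directly with sentencepiece, outside the web UI.
# If the file itself is corrupt, this raises the same LoadFromFile RuntimeError.
# The path is a placeholder; point it at your actual model directory.
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load(r"C:\Users\you\text-generation-webui\models\your-model\tokenizer.model")
print("vocab size:", sp.GetPieceSize())
```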

I have the same problem. Hopefully the creators will figure out the incompatibility with Ooga Booga soon.

I'm also having the same issue when using transformers directly in the Python REPL or in code; this is my issue.

I solved this problem on my machine. For some reason the tokenizer is stored using Git LFS despite being less than a megabyte, so you likely have a ~1 KB pointer file instead of the real tokenizer.
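A quick way to check for this (a minimal sketch; the path is a placeholder): Git LFS pointer files are tiny text files that start with a "version https://git-lfs.github.com/spec/v1" line, so you can inspect the first few bytes and the file size:

```python
# Sketch: detect whether tokenizer.model is a Git LFS pointer rather than the
# real SentencePiece model. The path is a placeholder for your model folder.
from pathlib import Path

p = Path(r"C:\Users\you\text-generation-webui\models\your-model\tokenizer.model")
head = p.read_bytes()[:64]
if head.startswith(b"version https://git-lfs"):
    print(f"LFS pointer, {p.stat().st_size} bytes - the real tokenizer was never downloaded")
else:
    print(f"{p.stat().st_size} bytes - looks like a real binary model file")
```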

@mm04926412 How do you get the real tokenizer?
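One hedged way to get it, assuming the model was cloned from a Hugging Face repository (the repo ID and paths below are placeholders): either run `git lfs pull` inside the cloned model directory to fetch all LFS files, or download just the tokenizer file with huggingface_hub and copy it into place:

```python
# Sketch: download the real tokenizer.model from the Hugging Face Hub and copy
# it into the local model folder. Repo ID and destination path are placeholders.
import shutil
from huggingface_hub import hf_hub_download

cached = hf_hub_download(repo_id="org/model-name", filename="tokenizer.model")
shutil.copy(cached, r"C:\Users\you\text-generation-webui\models\your-model\tokenizer.model")
```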
