Issue with tokenizer when using Ooga Booga?

#21
by mm04926412 - opened

I've installed Ooga Booga and I'm trying to run the model from the CUDA weights, but I encounter the following error when starting the web UI:

Traceback (most recent call last):
File "C:\Users\mm049\Desktop\ooga booga\text-generation-webui\server.py", line 302, in
shared.model, shared.tokenizer = load_model(shared.model_name)
File "C:\Users\mm049\Desktop\ooga booga\text-generation-webui\modules\models.py", line 181, in load_model
tokenizer = LlamaTokenizer.from_pretrained(Path(f"{shared.args.model_dir}/{shared.model_name}/"), clean_up_tokenization_spaces=True)
File "C:\Users\mm049\Desktop\ooga booga\installer_files\env\lib\site-packages\transformers\tokenization_utils_base.py", line 1811, in from_pretrained
return cls.from_pretrained(
File "C:\Users\mm049\Desktop\ooga booga\installer_files\env\lib\site-packages\transformers\tokenization_utils_base.py", line 1965, in from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "C:\Users\mm049\Desktop\ooga booga\installer_files\env\lib\site-packages\transformers\models\llama\tokenization_llama.py", line 96, in init
self.sp_model.Load(vocab_file)
File "C:\Users\mm049\Desktop\ooga booga\installer_files\env\lib\site-packages\sentencepiece_init
.py", line 905, in Load
return self.LoadFromFile(model_file)
File "C:\Users\mm049\Desktop\ooga booga\installer_files\env\lib\site-packages\sentencepiece_init
.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: D:\a\sentencepiece\sentencepiece\src\sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

Does anyone have any insight into what might be causing this? I have confirmed that the checksum of my model weights matches the repository.
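One way to narrow this down is to load tokenizer.model directly with sentencepiece, bypassing the web UI entirely; if the file itself is bad you get the same RuntimeError. This is just a sketch, and the path below is a placeholder for your actual model folder:

```python
# Sketch: load tokenizer.model directly with sentencepiece, outside the web UI.
# If the file itself is corrupt, this raises the same LoadFromFile RuntimeError.
# The path is a placeholder; point it at your actual model directory.
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load(r"C:\Users\you\text-generation-webui\models\your-model\tokenizer.model")
print("vocab size:", sp.GetPieceSize())
```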

I have the same problem. Hopefully the creators will figure out the incompatibility with Ooga Booga soon.

I'm also having the same issue when using transformers directly in the Python REPL or in code; this is my issue.

I solved this problem on my machine. For some reason the tokenizer is stored using Git LFS despite being less than a megabyte, so you likely have a ~1 KB pointer file instead of the real tokenizer.
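A quick way to check for this (a minimal sketch; the path is a placeholder): Git LFS pointer files are tiny text files that start with a "version https://git-lfs.github.com/spec/v1" line, so you can inspect the first few bytes and the file size:

```python
# Sketch: detect whether tokenizer.model is a Git LFS pointer rather than the
# real SentencePiece model. The path is a placeholder for your model folder.
from pathlib import Path

p = Path(r"C:\Users\you\text-generation-webui\models\your-model\tokenizer.model")
head = p.read_bytes()[:64]
if head.startswith(b"version https://git-lfs"):
    print(f"LFS pointer, {p.stat().st_size} bytes - the real tokenizer was never downloaded")
else:
    print(f"{p.stat().st_size} bytes - looks like a real binary model file")
```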

@mm04926412 How do you get the real tokenizer?
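One hedged way to get it, assuming the model was cloned from a Hugging Face repository (the repo ID and paths below are placeholders): either run `git lfs pull` inside the cloned model directory to fetch all LFS files, or download just the tokenizer file with huggingface_hub and copy it into place:

```python
# Sketch: download the real tokenizer.model from the Hugging Face Hub and copy
# it into the local model folder. Repo ID and destination path are placeholders.
import shutil
from huggingface_hub import hf_hub_download

cached = hf_hub_download(repo_id="org/model-name", filename="tokenizer.model")
shutil.copy(cached, r"C:\Users\you\text-generation-webui\models\your-model\tokenizer.model")
```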
