tokenizer.model failing to load

#2
by tylerdev - opened

Hi, I see you just updated this to include a tokenizer model.

It seems to be causing some issues with sentencepiece when trying to load the model. My error is below:

2024-02-13T00:16:16.475613939Z     self.tokenizer = ExLlamaV2Tokenizer(config)
2024-02-13T00:16:16.475615049Z   File "/usr/local/lib/python3.10/dist-packages/exllamav2/tokenizer.py", line 65, in __init__
2024-02-13T00:16:16.475616229Z     if os.path.exists(path_spm) and not force_json: self.tokenizer = ExLlamaV2TokenizerSPM(path_spm)
2024-02-13T00:16:16.475617419Z   File "/usr/local/lib/python3.10/dist-packages/exllamav2/tokenizers/spm.py", line 9, in __init__
2024-02-13T00:16:16.475618739Z     self.spm = SentencePieceProcessor(model_file = tokenizer_model)
2024-02-13T00:16:16.475619839Z   File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 447, in Init
2024-02-13T00:16:16.475620919Z     self.Load(model_file=model_file, model_proto=model_proto)
2024-02-13T00:16:16.475622029Z   File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 905, in Load
2024-02-13T00:16:16.475623039Z     return self.LoadFromFile(model_file)
2024-02-13T00:16:16.475624149Z   File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
2024-02-13T00:16:16.475625379Z     return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
2024-02-13T00:16:16.475626769Z RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())] 

Any ideas on what might be causing this? I confirmed that the tokenizer model was downloaded, so I'm not sure what else to try.

Anyway, great model!! Thanks for publishing this.
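The thread does not say what the wrong file actually was. As a general check that is not part of the original discussion, the sketch below tests whether a downloaded tokenizer.model looks like a serialized SentencePiece model before ExLlamaV2 ever touches it. The `ParseFromArray` error above means SentencePiece could not decode the file as a model protobuf. The specific checks here (empty file, Git LFS pointer, HTML or JSON text saved under the wrong name) are assumptions about common causes, and the function names are hypothetical. The final parse uses `SentencePieceProcessor(model_proto=...)`, which is the same loader shown in the traceback, just fed bytes instead of a path.

```python
import sys
from pathlib import Path

# First line of a Git LFS pointer file (a placeholder, not the real binary).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def diagnose_tokenizer_bytes(data: bytes) -> str:
    """Hypothetical helper: return a short diagnosis for tokenizer.model contents."""
    if not data:
        return "empty: file has no content"
    if data.startswith(LFS_POINTER_PREFIX):
        return "lfs-pointer: got the Git LFS pointer, not the model binary"
    head = data[:64].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html", b"{")):
        # An HTML error page or a JSON file (e.g. tokenizer.json) saved as tokenizer.model.
        return "text: HTML or JSON, not a SentencePiece protobuf"
    try:
        import sentencepiece as spm
    except ImportError:
        return "skipped: sentencepiece is not installed, parse check not run"
    try:
        # Same constructor ExLlamaV2 uses, but given the raw bytes.
        sp = spm.SentencePieceProcessor(model_proto=data)
    except RuntimeError as exc:
        return f"parse-error: {exc}"
    return f"ok: {sp.get_piece_size()} pieces"


def diagnose_tokenizer_file(path) -> str:
    """Read a tokenizer.model from disk and diagnose it."""
    p = Path(path)
    if not p.is_file():
        return "missing: file not found"
    return diagnose_tokenizer_bytes(p.read_bytes())


if __name__ == "__main__":
    # Usage: python check_tokenizer.py /path/to/model_dir/tokenizer.model
    for arg in sys.argv[1:]:
        print(arg, "->", diagnose_tokenizer_file(arg))
```

Anything other than an `ok:` result points at the file itself rather than at ExLlamaV2. Re-downloading after the uploader replaces the file is then the fix, which matches what happened in this thread.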

That is because I failed to upload the right one.

Should be fixed now; let me know if it isn't!

That worked, thanks!

tylerdev changed discussion status to closed
