tokenizer.model can't be loaded by SentencePiece: "RuntimeError: Internal: could not parse ModelProto from tokenizer.model"

#109
by ericx134

I got a crash when trying to load the "tokenizer.model" using SentencePiece. Any idea why?

from sentencepiece import SentencePieceProcessor

tokenizer_model = "tokenizer.model"
sp_processor = SentencePieceProcessor()
sp_processor.load(tokenizer_model)  # raises the RuntimeError below

Error message:
RuntimeError: Internal: could not parse ModelProto from tokenizer.model

Use these in a completely new environment:

https://brev.dev/blog/the-simple-guide-to-fine-tuning-llms
https://github.com/meta-llama/llama-recipes/issues/475

Download requirements.txt from:
raw.githubusercontent.com/huggingface/transformers/main/examples/flax/vision/requirements.txt

It wasn't working previously, but after installing these requirements in a new environment, it worked fine for Llama 3 8B/70B.
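
For reference, a minimal sketch of what loading looked like in the fresh environment, assuming the requirements above are installed and the tokenizer is loaded through the Hugging Face Hub with transformers rather than through SentencePiece directly (the meta-llama/Meta-Llama-3-8B repo ID and the access-token note are assumptions, not from the original post):

# In a fresh virtual environment, install the linked requirements first, e.g.:
#   pip install -r requirements.txt

from transformers import AutoTokenizer

# Assumed repo ID; the Llama 3 checkpoints are gated, so a Hugging Face access token is needed.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
print(tokenizer("Hello, world!")["input_ids"])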
