tokenizer.model can't be loaded by SentencePiece: "RuntimeError: Internal: could not parse ModelProto from tokenizer.model"
#109 opened by ericx134
I get a crash when trying to load "tokenizer.model" with SentencePiece. Any idea why?
from sentencepiece import SentencePieceProcessor
tokenizer_model = "tokenizer.model"
sp_processor = SentencePieceProcessor()
sp_processor.load(tokenizer_model)
Error message:
RuntimeError: Internal: could not parse ModelProto from tokenizer.model
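Before assuming a corrupted download, it can help to check what kind of file tokenizer.model actually is: a genuine SentencePiece model is a binary serialized ModelProto, whereas Llama 3 ships tokenizer.model as a plain-text tiktoken BPE ranks file (one "&lt;base64 token&gt; &lt;rank&gt;" pair per line), which SentencePiece cannot parse. A minimal stdlib sketch of that check (the function name and heuristic are my own, not from the thread):

```python
import base64

def looks_like_tiktoken_ranks(path):
    # Llama 3's tokenizer.model is plain text: one "<base64 token> <rank>"
    # pair per line; a SentencePiece model is a binary protobuf instead.
    with open(path, "rb") as f:
        fields = f.readline().split()
    if len(fields) != 2:
        return False
    try:
        base64.b64decode(fields[0], validate=True)  # token must be valid base64
        int(fields[1])                              # rank must be an integer
        return True
    except ValueError:  # binascii.Error subclasses ValueError
        return False
```

If this returns True, the ModelProto error is expected and no reinstall will make SentencePiece accept the file.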
Use these in a completely new environment:
https://brev.dev/blog/the-simple-guide-to-fine-tuning-llms
https://github.com/meta-llama/llama-recipes/issues/475
Download requirements.txt from:
raw.githubusercontent.com/huggingface/transformers/main/examples/flax/vision/requirements.txt
It wasn't working previously, but after installing into a new environment it worked fine for Llama 3 8B/70B.
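For context on why the original call fails regardless of environment: Llama 3's tokenizer.model is not a SentencePiece ModelProto at all but a tiktoken-style BPE ranks file, which Meta's reference code loads via the tiktoken library (`tiktoken.load.load_tiktoken_bpe`). A rough stdlib sketch of reading that format, just to illustrate the layout (the function name is my own; real code should use tiktoken):

```python
import base64

def load_bpe_ranks(path):
    """Read a tiktoken-style ranks file: one '<base64 token> <rank>'
    pair per line, as shipped in Llama 3's tokenizer.model."""
    ranks = {}
    with open(path, "rb") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            token_b64, rank = line.split()
            ranks[base64.b64decode(token_b64)] = int(rank)
    return ranks
```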