Problem with the tokenizer

#37
by Douedos - opened

Hello Everyone,
I am having some issues using the model. After downloading it to a local directory, I'm trying to load it for text summarization. The model loads without error, but when I load the tokenizer with from_pretrained I get the following (does anybody know what I might be doing wrong here?):

from transformers import LlamaForCausalLM, LlamaTokenizer

model = LlamaForCausalLM.from_pretrained(model_dir)  # model_dir is the local checkpoint folder
tokenizer = LlamaTokenizer.from_pretrained(model_dir)

#tokenizer = LlamaTokenizer.from_pretrained(model_dir, add_eos_token=True, use_fast=True)

The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'PreTrainedTokenizerFast'.
The class this function is called from is 'LlamaTokenizer'.
Traceback (most recent call last):

Cell In[10], line 1
tokenizer = LlamaTokenizer.from_pretrained(model_dir, add_eos_token=True, use_fast=True)

File ~\venvs\envllama\lib\site-packages\transformers\tokenization_utils_base.py:2089 in from_pretrained
return cls._from_pretrained(

File ~\venvs\envllama\lib\site-packages\transformers\tokenization_utils_base.py:2311 in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)

File ~\venvs\envllama\lib\site-packages\transformers\models\llama\tokenization_llama.py:169 in __init__
self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))

File ~\venvs\envllama\lib\site-packages\transformers\models\llama\tokenization_llama.py:196 in get_spm_processor
tokenizer.Load(self.vocab_file)

File ~\venvs\envllama\lib\site-packages\sentencepiece\__init__.py:961 in Load
return self.LoadFromFile(model_file)

File ~\venvs\envllama\lib\site-packages\sentencepiece\__init__.py:316 in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)

TypeError: not a string

Thank you!

Try AutoTokenizer
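
From the warning in your log, the checkpoint seems to ship only a fast tokenizer (tokenizer.json / PreTrainedTokenizerFast) and no SentencePiece tokenizer.model, which is why the slow LlamaTokenizer ends up handing sentencepiece an empty vocab file path and fails with "not a string". A minimal sketch, reusing the model_dir variable from your snippet above:

from transformers import AutoTokenizer

# AutoTokenizer reads tokenizer_config.json and instantiates whichever
# tokenizer class the checkpoint actually provides (here the fast one,
# backed by tokenizer.json), so no SentencePiece model file is needed.
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)

print(tokenizer("Hello world"))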

Same problem here, and AutoTokenizer works fine.