Problem with the tokenizer

by Douedos - opened

Hello Everyone,
I am having some issues using the model. After downloading the model to a local repository, I am trying to load it for text summarization. The model loads without error, but loading the tokenizer with from_pretrained gives the output below. Would somebody know what I might be doing wrong here?

model = LlamaForCausalLM.from_pretrained(model_dir)
tokenizer = LlamaTokenizer.from_pretrained(model_dir)

#tokenizer = LlamaTokenizer.from_pretrained(model_dir, add_eos_token=True, use_fast=True)

The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'PreTrainedTokenizerFast'.
The class this function is called from is 'LlamaTokenizer'.
Traceback (most recent call last):

Cell In[10], line 1
tokenizer = LlamaTokenizer.from_pretrained(model_dir, add_eos_token=True, use_fast=True)

File ~\venvs\envllama\lib\site-packages\transformers\ in from_pretrained
return cls._from_pretrained(

File ~\venvs\envllama\lib\site-packages\transformers\ in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)

File ~\venvs\envllama\lib\site-packages\transformers\models\llama\ in __init__
self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))

File ~\venvs\envllama\lib\site-packages\transformers\models\llama\ in get_spm_processor

File ~\venvs\envllama\lib\site-packages\ in Load
return self.LoadFromFile(model_file)

File ~\venvs\envllama\lib\site-packages\ in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)

TypeError: not a string

Thank you!

Try AutoTokenizer

I had the same problem, and AutoTokenizer works fine.
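For anyone landing here later, a minimal sketch of the AutoTokenizer suggestion. The warning above says the checkpoint stores a 'PreTrainedTokenizerFast', so one plausible reading (an assumption, not confirmed in this thread) is that the directory has a tokenizer.json but no SentencePiece tokenizer.model, which leaves the slow LlamaTokenizer with no file path to hand to sentencepiece. The helper names tokenizer_files and load_tokenizer are made up for illustration; the file names are the ones transformers conventionally uses.

```python
from pathlib import Path


def tokenizer_files(model_dir: str) -> dict:
    """Report which tokenizer files exist in model_dir (hypothetical helper).

    Assumption: tokenizer.model is the SentencePiece file the slow
    LlamaTokenizer reads, and tokenizer.json is the fast tokenizer file.
    """
    root = Path(model_dir)
    return {
        "tokenizer.model": (root / "tokenizer.model").is_file(),
        "tokenizer.json": (root / "tokenizer.json").is_file(),
    }


def load_tokenizer(model_dir: str):
    """Load the tokenizer with AutoTokenizer, as suggested in the reply above."""
    # Imported here so the file check works even without transformers installed.
    from transformers import AutoTokenizer

    # AutoTokenizer picks the tokenizer class recorded in the checkpoint
    # instead of forcing LlamaTokenizer.
    return AutoTokenizer.from_pretrained(model_dir)
```

Usage would look like print(tokenizer_files(model_dir)) followed by tokenizer = load_tokenizer(model_dir), with model_dir being the same local path used for the model. If the first call shows tokenizer.model as missing, that would be consistent with the "TypeError: not a string" above, though other causes are possible.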
