Problem loading the tokenizer.

#1
by nayohan - opened

Hello, I encountered the following problem while loading the tokenizer:

from transformers import AutoModel, LlamaTokenizer

save_path = './models'  # local directory the checkpoints are saved to
save_llama = ['lcw99/zephykor-ko-beta-7b-chang']
for model_name in save_llama:
    save_model_name = model_name.split('/')[-1]
    print('save_model_name:', save_model_name)

    # download and save the tokenizer
    tokenizer = LlamaTokenizer.from_pretrained(model_name)
    tokenizer.save_pretrained(f"{save_path}/{save_model_name}")

    # download and save the model weights
    model = AutoModel.from_pretrained(model_name)
    model.save_pretrained(f"{save_path}/{save_model_name}")

save_model_name: zephykor-ko-beta-7b-chang
Traceback (most recent call last):
  File "/home/closedai/.test/hybrid-ltm/src/download_model.py", line 35, in <module>
    tokenizer = LlamaTokenizer.from_pretrained(model_name, use_fast=True)
  File "/home/closedai/.conda/envs/sent/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2024, in from_pretrained
    return cls._from_pretrained(
  File "/home/closedai/.conda/envs/sent/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2256, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/closedai/.conda/envs/sent/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 178, in __init__
    self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
  File "/home/closedai/.conda/envs/sent/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 203, in get_spm_processor
    tokenizer.Load(self.vocab_file)
  File "/home/closedai/.conda/envs/sent/lib/python3.10/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/home/closedai/.conda/envs/sent/lib/python3.10/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string

I think the tokenizer.model file wasn't uploaded, or is there something I did wrong? The model itself loads fine.
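For anyone hitting the same error, a quick way to check which tokenizer files the repo actually ships is to list them with huggingface_hub (this is just a sketch added for context, not part of the original post; the repo id is the one from the snippet above):

from huggingface_hub import list_repo_files

# The slow LlamaTokenizer needs a sentencepiece "tokenizer.model" file,
# while the fast tokenizer only needs "tokenizer.json". Listing the repo
# files shows which of the two is actually available.
files = list_repo_files("lcw99/zephykor-ko-beta-7b-chang")
print([f for f in files if "token" in f])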

Try AutoTokenizer.
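
For reference, the suggested change looks roughly like this (a minimal sketch assuming the same save layout as the snippet above; AutoTokenizer resolves the tokenizer class from the repo's config, so it can fall back to the fast tokenizer when no sentencepiece tokenizer.model is available):

from transformers import AutoModel, AutoTokenizer

model_name = 'lcw99/zephykor-ko-beta-7b-chang'
save_model_name = model_name.split('/')[-1]
save_path = './models'  # placeholder for the local save directory

# AutoTokenizer picks the appropriate tokenizer class (fast or slow)
# from the files and config in the repo, instead of forcing LlamaTokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.save_pretrained(f"{save_path}/{save_model_name}")

model = AutoModel.from_pretrained(model_name)
model.save_pretrained(f"{save_path}/{save_model_name}")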

Thank you, I'll give it a try :)

Changing to AutoTokenizer solved the problem very easily.

nayohan changed discussion status to closed