Tokenizer class CodeLlamaTokenizer does not exist or is not currently imported.

#10
by cofade - opened

When trying to run the model, I get the error
"Tokenizer class CodeLlamaTokenizer does not exist or is not currently imported."

It is raised in "Lib\site-packages\transformers\models\auto\tokenization_auto.py", line 724, because "CodeLlamaTokenizer" is indeed not contained in the TOKENIZER_MAPPING_NAMES OrderedDict.
The requested tokenizer "CodeLlamaTokenizer" is defined in "models\codellama_CodeLlama-7b-Instruct-hf\tokenizer_config.json".
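For reference, the requested class can be read straight out of that file (a minimal sketch; tokenizer_class is the standard key in transformers tokenizer configs):

import json

# Print the tokenizer class the checkpoint asks AutoTokenizer to load
with open(r"models\codellama_CodeLlama-7b-Instruct-hf\tokenizer_config.json") as f:
    print(json.load(f)["tokenizer_class"])  # prints: CodeLlamaTokenizer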

Can you please help me with this issue?

Code Llama org

Hi @cofade !

You need to install transformers from the main development branch, because the Code Llama changes have not been released on PyPI yet. This is how you'd do it:

pip install git+https://github.com/huggingface/transformers.git@main

Hope that helps!
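To verify the dev install took effect, you can check the version and the import (a minimal sketch; the exact dev version number will vary):

import transformers
print(transformers.__version__)  # dev builds carry a ".dev0" suffix

from transformers import CodeLlamaTokenizer  # should now import without error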

Hi @pcuenq, I did this, but how do I use the downloaded repo? Also, it doesn't even tell me where it downloaded it to.

After running the pip install, use it as you would any other Python package: from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, then tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-Instruct-hf"). See the spelled-out snippet below.
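Spelled out as a copy/paste-able example (a minimal sketch; device_map="auto" assumes accelerate is installed and you have enough memory for the 7B weights):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Simple completion to check that everything loads and runs
inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))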

@saksham-lamini doesn't from_pretrained automatically download the model from the API? What is the point of also installing from the git repo, then?

@Emrys95 you're right that from_pretrained will download the model, or in this case the tokenizer files, from the API, but it would then try to load them into a CodeLlamaTokenizer class, which does not exist if you did a normal pip install.

pip install git+https://github.com/huggingface/transformers.git@main

Thank you @pcuenq, that worked perfectly! I am using this LLM with the oobabooga Web UI, whose installer didn't provide the correct transformers version yet.

@pcuenq I've been trying for days to get one of these models running, always hitting one problem or another, such as Python package conflicts (I'm new at this, yes). Could you please give me some valid code I can just copy/paste and work from to get it running? So far only GPT-2 has worked for me, the very old version, but fine-tuning it resulted in catastrophic forgetting, where it can't answer anything except questions about my own document which I fed to it. If you could guide me in the right direction, I'd appreciate it.

@Emrys95 yeah, I too need some valid, full code, as there have been a lot of dependency issues coming up.

Same issue.

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[9], line 1
----> 1 from transformers import CodeLlamaTokenizer

ImportError: cannot import name 'CodeLlamaTokenizer' from 'transformers' (/Users/hawei/miniconda3/envs/lang/lib/python3.10/site-packages/transformers/__init__.py)

This error is fixed after I re-installed transformers from the main branch.

But I get a new error.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[5], line 1
----> 1 tokenizer = AutoTokenizer.from_pretrained(model)

File ~/miniconda3/envs/lang/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:735, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    731     if tokenizer_class is None:
    732         raise ValueError(
    733             f"Tokenizer class {tokenizer_class_candidate} does not exist or is not currently imported."
    734         )
--> 735     return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
    737 # Otherwise we have to be creative.
    738 # if model is an encoder decoder, the encoder tokenizer class is used by default
    739 if isinstance(config, EncoderDecoderConfig):

File ~/miniconda3/envs/lang/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1854, in PreTrainedTokenizerBase.from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, *init_inputs, **kwargs)
   1851     else:
   1852         logger.info(f"loading file {file_path} from cache at {resolved_vocab_files[file_id]}")
-> 1854 return cls._from_pretrained(
   1855     resolved_vocab_files,
   1856     pretrained_model_name_or_path,
   1857     init_configuration,
   1858     *init_inputs,
   1859     token=token,
   1860     cache_dir=cache_dir,
   1861     local_files_only=local_files_only,
   1862     _commit_hash=commit_hash,
   1863     _is_local=is_local,
   1864     **kwargs,
   1865 )

File ~/miniconda3/envs/lang/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2017, in PreTrainedTokenizerBase._from_pretrained(cls, resolved_vocab_files, pretrained_model_name_or_path, init_configuration, token, cache_dir, local_files_only, _commit_hash, _is_local, *init_inputs, **kwargs)
   2015 # Instantiate tokenizer.
   2016 try:
-> 2017     tokenizer = cls(*init_inputs, **init_kwargs)
   2018 except OSError:
   2019     raise OSError(
   2020         "Unable to load vocabulary from file. "
   2021         "Please check that the provided vocabulary is accessible and not corrupted."
   2022     )

File ~/miniconda3/envs/lang/lib/python3.10/site-packages/transformers/models/code_llama/tokenization_code_llama_fast.py:154, in CodeLlamaTokenizerFast.__init__(self, vocab_file, tokenizer_file, clean_up_tokenization_spaces, unk_token, bos_token, eos_token, prefix_token, middle_token, suffix_token, eot_token, fill_token, add_bos_token, add_eos_token, **kwargs)
    151 self.update_post_processor()
    153 self.vocab_file = vocab_file
--> 154 self.can_save_slow_tokenizer = False if not self.vocab_file else True
    156 self._prefix_token = prefix_token
    157 self._middle_token = middle_token

AttributeError: can't set attribute 'can_save_slow_tokenizer'
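For context, the failing line assigns to can_save_slow_tokenizer, which the base class defines as a read-only property (no setter), so the assignment raises. A minimal sketch of the same Python behavior (a hypothetical reproduction, not the actual transformers code):

class Base:
    @property
    def can_save_slow_tokenizer(self):  # read-only: property with no setter
        return False

class Fast(Base):
    def __init__(self):
        # raises AttributeError: can't set attribute 'can_save_slow_tokenizer'
        self.can_save_slow_tokenizer = True

Fast()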
Code Llama org

This was fixed on main!

Actually, I still face this issue.

Pip installing from the main branch fixes the import issue, but it also introduces a latency bug that slows down inference when using 4-bit.

Edit: fixed by pip installing directly from the branch that added Code Llama support: https://github.com/huggingface/transformers/pull/25740

Installed using: pip install git+https://github.com/ArthurZucker/transformers.git@add-llama-code
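For anyone reproducing the 4-bit setup, a minimal sketch (assumes bitsandbytes and accelerate are installed and a CUDA GPU is available; load_in_4bit was the 4-bit loading flag in transformers at the time):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Quantize the weights to 4-bit on load to cut memory use
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True, device_map="auto")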

Code Llama org

@opencode could you please explain more about the latency bug you mentioned?

This was fixed on main!
