ValueError: Tokenizer class CohereTokenizer does not exist or is not currently imported.

#25
by chandrak - opened

Running this example script gives an error:

# pip install 'git+https://github.com/huggingface/transformers.git' bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model_id = "CohereForAI/c4ai-command-r-plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

# Format message with the command-r-plus chat template
messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
## <BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hello, how are you?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>

gen_tokens = model.generate(
    input_ids, 
    max_new_tokens=100, 
    do_sample=True, 
    temperature=0.3,
    )

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)

Try pip install 'git+https://github.com/huggingface/transformers.git' bitsandbytes accelerate to install the latest transformers. The model uses a new tokenizer called CohereTokenizer, which was added in a newer version of transformers.

Which transformers version exactly? I followed this instruction and ended up with transformers=4.40.0.dev0, but I still can't import CohereTokenizer.
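
One way to see what the installed build actually exposes is a small check like the sketch below. The helper name describe_install is hypothetical, and reading `__version__` from the module is an assumption that holds for transformers but not for every package.

```python
import importlib


def _exposes(module, name: str) -> bool:
    """Return True if `module` has an attribute called `name`."""
    try:
        getattr(module, name)
    except Exception:
        # Lazy modules can raise something other than AttributeError on a failed import.
        return False
    return True


def describe_install(module_name: str, class_names: list) -> dict:
    """Report the module's version (if any) and which of `class_names` it exposes."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return {"version": None, "classes": {name: False for name in class_names}}
    return {
        "version": getattr(module, "__version__", None),
        "classes": {name: _exposes(module, name) for name in class_names},
    }


if __name__ == "__main__":
    # Shows whether the slow and/or fast Cohere tokenizer classes are available.
    print(describe_install("transformers", ["CohereTokenizer", "CohereTokenizerFast"]))
```

If only the second class comes back as available, the install is recent enough and the problem is the class name being requested, not the version.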

There is no CohereTokenizer, but there is CohereTokenizerFast.
Try changing the tokenizer_class from "CohereTokenizer" to "CohereTokenizerFast" in tokenizer_config.json.
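
For a local copy of the model, that edit could be scripted roughly as follows. This is a minimal sketch: the function name set_tokenizer_class and the example directory are hypothetical, and it assumes tokenizer_config.json sits directly inside the model directory.

```python
import json
from pathlib import Path
from typing import Optional


def set_tokenizer_class(model_dir: str, new_class: str = "CohereTokenizerFast") -> Optional[str]:
    """Rewrite tokenizer_class in <model_dir>/tokenizer_config.json and return the old value."""
    config_path = Path(model_dir) / "tokenizer_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    old_class = config.get("tokenizer_class")
    if old_class != new_class:
        config["tokenizer_class"] = new_class
        # Write back with indentation so the file stays readable.
        config_path.write_text(
            json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
        )
    return old_class


# Example (hypothetical path to a local model directory):
# set_tokenizer_class("./models/command-r-plus")
```

This only changes the local file; a copy pulled fresh from the Hub would still carry whatever class name the repository declares.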

Cohere For AI org

Hi @xiangrong, can you try AutoTokenizer? It should work because it is mapped to the correct tokenizer class.

@ahmetustun Thank you, it works.

@ahmetustun It's odd that AutoTokenizer only works when use_fast is True, which is its default. If you manually set it to False, it throws this error:

In [6]: import transformers

In [7]: tokenizer = AutoTokenizer.from_pretrained('.models/ArabicLLM', use_fast=False)

ValueError Traceback (most recent call last)
Cell In[7], line 1
----> 1 tokenizer = AutoTokenizer.from_pretrained('.models/ArabicLLM', use_fast=False)

File ~/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:877, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
875 tokenizer_class = tokenizer_class_from_name(tokenizer_class_candidate)
876 if tokenizer_class is None:
--> 877 raise ValueError(
878 f"Tokenizer class {tokenizer_class_candidate} does not exist or is not currently imported."
879 )
880 return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
882 # Otherwise we have to be creative.
883 # if model is an encoder decoder, the encoder tokenizer class is used by default

ValueError: Tokenizer class CohereTokenizer does not exist or is not currently imported.
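
My reading of the traceback is that the class name from tokenizer_config.json is looked up as-is when use_fast=False, and only gets a "Fast" suffix tried when use_fast=True. The sketch below is a simplified model of that lookup, not the library's code: resolve_tokenizer_class and the KNOWN_TOKENIZER_CLASSES set are hypothetical stand-ins.

```python
from typing import Optional

# Hypothetical stand-in for the class names the library can import.
# Only the fast Cohere tokenizer exists, as noted earlier in this thread.
KNOWN_TOKENIZER_CLASSES = {"CohereTokenizerFast", "LlamaTokenizer", "LlamaTokenizerFast"}


def resolve_tokenizer_class(config_class: str, use_fast: bool = True,
                            known: set = KNOWN_TOKENIZER_CLASSES) -> str:
    """Pick a tokenizer class name the way the traceback suggests (simplified)."""
    candidate: Optional[str] = None
    # With use_fast=True, try the "<Name>Fast" variant first.
    if use_fast and not config_class.endswith("Fast"):
        fast_name = f"{config_class}Fast"
        if fast_name in known:
            candidate = fast_name
    # Otherwise, or if no fast variant exists, fall back to the configured name.
    if candidate is None and config_class in known:
        candidate = config_class
    if candidate is None:
        raise ValueError(
            f"Tokenizer class {config_class} does not exist or is not currently imported."
        )
    return candidate


print(resolve_tokenizer_class("CohereTokenizer", use_fast=True))
try:
    resolve_tokenizer_class("CohereTokenizer", use_fast=False)
except ValueError as err:
    print(err)
```

Under this model, a config that declares "CohereTokenizerFast" directly resolves with either setting, which would explain why editing tokenizer_config.json also works.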
