ValueError: Cannot instantiate this tokenizer from a slow version. If it's based on sentencepiece, make sure you have sentencepiece installed.

#18
by pseudotensor - opened
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model = 'CohereForAI/c4ai-command-r-v01'

tokenizer = AutoTokenizer.from_pretrained(model, trust_remote_code=True)

now fails with:

You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Traceback (most recent call last):
  File "/home/jon/h2ogpt/coheretest1.py", line 5, in <module>
    tokenizer = AutoTokenizer.from_pretrained(model, trust_remote_code=True, add_prefix_space=False)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 810, in from_pretrained
    return tokenizer_class.from_pretrained(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2048, in from_pretrained
    return cls._from_pretrained(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2287, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/jon/.cache/huggingface/modules/transformers_modules/CohereForAI/c4ai-command-r-v01/779ade391d0552f47d38c13745f6e2d33eb3d916/tokenization_cohere_fast.py", line 128, in __init__
    super().__init__(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 102, in __init__
    raise ValueError(
ValueError: Cannot instantiate this tokenizer from a slow version. If it's based on sentencepiece, make sure you have sentencepiece installed.

This worked yesterday.

My sentencepiece is the latest, i.e. 0.2.0, and transformers is the latest, i.e. 4.38.2.
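For reference, sentencepiece alone is not enough for the fast path: a fast tokenizer also needs the Rust-backed `tokenizers` package, and sentencepiece only matters when converting from a slow, spm-based tokenizer. A minimal sketch for checking the relevant package versions (stdlib only, no assumptions beyond the package names):

```python
# Sketch: print the versions of the packages involved in fast-tokenizer
# loading/conversion, or flag them if they are missing from the environment.
import importlib.metadata as md

for pkg in ("transformers", "tokenizers", "sentencepiece"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "NOT INSTALLED")
```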

This does not work either:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model = 'CohereForAI/c4ai-command-r-v01'

tokenizer = AutoTokenizer.from_pretrained(model, trust_remote_code=True, use_fast=False)

gives:

Traceback (most recent call last):
  File "/home/jon/h2ogpt/coheretest1.py", line 5, in <module>
    tokenizer = AutoTokenizer.from_pretrained(model, trust_remote_code=True, use_fast=False)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 806, in from_pretrained
    tokenizer_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 479, in get_class_from_dynamic_module
    if "--" in class_reference:
TypeError: argument of type 'NoneType' is not iterable
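The second traceback is consistent with the repo's `auto_map` advertising only a fast tokenizer class: the entry is a (slow, fast) pair, and with `use_fast=False` the slow slot resolves to `None` before transformers checks it for a repo-qualified `--` reference. A simplified sketch of that failure mode (the `auto_map` contents here are an assumption, not copied from the repo):

```python
# Simplified sketch (assumption) of why `use_fast=False` raises TypeError:
# the auto_map maps AutoTokenizer to a (slow, fast) pair, and the slow slot
# is None when the repo only ships a fast tokenizer class.
auto_map = {"AutoTokenizer": (None, "tokenization_cohere_fast.CohereTokenizerFast")}

slow_ref, fast_ref = auto_map["AutoTokenizer"]
class_reference = slow_ref  # selected when use_fast=False

try:
    "--" in class_reference  # transformers tests for a repo-qualified reference
except TypeError as e:
    print(e)  # argument of type 'NoneType' is not iterable
```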
Cohere For AI org

hey, this should be fixed now. Can you please try again?
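If the fix landed in the model repo rather than in transformers, a stale local copy can keep reproducing the old error. One way to force a re-download is to remove the cached repo (a sketch assuming the default huggingface_hub cache location and its `models--{org}--{name}` directory naming):

```python
# Sketch (assumptions: default HF cache path, standard hub directory naming).
# Removing the cached repo forces a fresh download on the next from_pretrained.
import os
import shutil

model = "CohereForAI/c4ai-command-r-v01"
cache_dir = os.path.expanduser("~/.cache/huggingface/hub")
repo_dir = os.path.join(cache_dir, "models--" + model.replace("/", "--"))

print(repo_dir)
if os.path.isdir(repo_dir):
    shutil.rmtree(repo_dir)  # next from_pretrained re-fetches the fixed files
```

Passing `force_download=True` to `from_pretrained` is an alternative that avoids touching the cache by hand.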

pseudotensor changed discussion status to closed

I just got started with command-r (quantized version) and still have this issue!
