Fail to use tokenizer

#3
by eeyrw - opened

```
Traceback (most recent call last):
  File "F:\diffusers-test\translate.py", line 3, in <module>
    tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-50-many-to-one-mmt",cache_dir='.')
  File "F:\diffusers-test\diffusers_venv\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 619, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "F:\diffusers-test\diffusers_venv\lib\site-packages\transformers\tokenization_utils_base.py", line 1777, in from_pretrained
    return cls._from_pretrained(
  File "F:\diffusers-test\diffusers_venv\lib\site-packages\transformers\tokenization_utils_base.py", line 1932, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "F:\diffusers-test\diffusers_venv\lib\site-packages\transformers\models\mbart50\tokenization_mbart50_fast.py", line 135, in __init__
    super().__init__(
  File "F:\diffusers-test\diffusers_venv\lib\site-packages\transformers\tokenization_utils_fast.py", line 120, in __init__
    raise ValueError(
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a tokenizers library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
```
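For context, the failing call is just the tokenizer load from my translate.py, reconstructed from the traceback above (the import line is an assumption; the from_pretrained call is the one shown in the traceback):

```python
from transformers import AutoTokenizer

# Fails with the ValueError above when sentencepiece is not installed:
# the fast MBart50 tokenizer is built by converting the sentencepiece-based
# slow tokenizer, so the conversion needs sentencepiece available.
tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-50-many-to-one-mmt", cache_dir='.')
```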

Fixing this is quite tricky... First I installed sentencepiece:
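That's just the standard pip install (package name as on PyPI):

```
pip install sentencepiece
```

But then I got a protobuf version incompatibility error: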

```
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
  1. Downgrade the protobuf package to 3.20.x or lower.
```

Then I installed protobuf==3.20.3:
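```
pip install protobuf==3.20.3
```

(3.20.3 is the last release in the 3.20.x line that the error message points at.)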
Finally got it working.
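For reference, here is a minimal end-to-end sketch of what works after both installs. The checkpoint name and cache_dir come from the traceback above; the source sentence, the zh_CN src_lang choice, and the use of AutoModelForSeq2SeqLM are my own assumptions (this many-to-one checkpoint always translates into English):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Same checkpoint and cache dir as in the traceback above.
model_name = "facebook/mbart-large-50-many-to-one-mmt"
tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=".")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, cache_dir=".")

# Assumed example: translate a Chinese sentence into English.
# (The many-to-one checkpoint's target language is always English.)
tokenizer.src_lang = "zh_CN"
inputs = tokenizer("这是一个测试。", return_tensors="pt")
generated = model.generate(**inputs)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```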

eeyrw changed discussion status to closed
