processor = PaliGemmaProcessor.from_pretrained(model_id) issue
Traceback (most recent call last):
File "/Disk/lfc/paligemma2/inference.py", line 13, in
processor = PaliGemmaProcessor.from_pretrained(model_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Disk/Miniforge3/envs/lfc_test/lib/python3.11/site-packages/transformers/processing_utils.py", line 892, in from_pretrained
args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Disk/Miniforge3/envs/lfc_test/lib/python3.11/site-packages/transformers/processing_utils.py", line 938, in _get_arguments_from_pretrained
args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Disk/Miniforge3/envs/lfc_test/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2271, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "/Disk/Miniforge3/envs/lfc_test/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2505, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Disk/Miniforge3/envs/lfc_test/lib/python3.11/site-packages/transformers/models/gemma/tokenization_gemma_fast.py", line 103, in init
super().init(
File "/Disk/Miniforge3/envs/lfc_test/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 115, in init
fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: data did not match any variant of untagged enum ModelWrapper at line 2591977 column 3
Hi @lvfengchun,
I didn't encounter this error; could you please refer to this gist file?
You are getting this error because the PaliGemmaProcessor is unable to load the tokenizer: the tokenizer file (tokenizer.json) is corrupted, incompatible, or incorrectly formatted.
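If you want to confirm that tokenizer.json itself is the problem, here is a minimal sketch that parses it directly with the tokenizers library, which is the same call (TokenizerFast.from_file) that fails in your traceback. The file path is a placeholder, so point it at your local copy; if the file is bad, this should reproduce the same ModelWrapper error.
from tokenizers import Tokenizer

# Placeholder path; replace with the actual location of your cached tokenizer.json.
tok = Tokenizer.from_file("/path/to/tokenizer.json")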
To solve this issue, please make sure you load the PaliGemmaProcessor and the model from the same checkpoint:
model_id = "google/paligemma2-3b-pt-896"
processor = PaliGemmaProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
If you're working on your local system, delete the locally cached tokenizer files for the model and redownload them; a sketch of one way to do that is below.
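Assuming the files came through the Hugging Face Hub cache, a minimal sketch to force a fresh download (this re-fetches every file in the repo, overwriting the cached copies):
from huggingface_hub import snapshot_download

# Re-download the checkpoint, replacing any cached (possibly corrupted) files.
snapshot_download(repo_id="google/paligemma2-3b-pt-896", force_download=True)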
If the issue still persists, please let me know.
Thank you.
This is most likely due to an outdated version of transformers/tokenizers. Upgrading should fix the issue!
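For reference, you can check the installed versions before and after upgrading (e.g. with pip install -U transformers tokenizers); a minimal sketch:
import tokenizers
import transformers

# The ModelWrapper parse failure typically means the installed tokenizers
# library is too old to read the format this checkpoint's tokenizer.json uses.
print("transformers:", transformers.__version__)
print("tokenizers:", tokenizers.__version__)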
Hi @GopiUppari, I loaded the model and tokenizer locally and re-downloaded the tokenizer.json file, but I still get this error.
Happy to help!