AttributeError: can't set attribute 'can_save_slow_tokenizer'

#15
opened by Ashwath-Shetty

@ydshieh
It was working fine until yesterday, but today I got this error:
AttributeError: can't set attribute 'can_save_slow_tokenizer'

library installation:
!pip install transformers
!pip install sentencepiece
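(For reproducibility, the installs can be pinned to the exact versions from the specs below; unpinned installs will pull whatever release is latest at the time.)

!pip install transformers==4.31.0 sentencepiece==0.1.99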

system specs:
OS: AWS SageMaker (Amazon Linux 2, Jupyter Lab 3 (notebook-al2-v2))
Python: 3.10
Transformers: 4.31.0
sentencepiece: 0.1.99
PyTorch: 2.0.1
CUDA (python -c 'import torch; print(torch.version.cuda)'): 11.8

code:
import requests

from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model = AutoModelForVision2Seq.from_pretrained("ydshieh/kosmos-2-patch14-224", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("ydshieh/kosmos-2-patch14-224", trust_remote_code=True)

prompt = "An image of"

url = "https://huggingface.co/ydshieh/kosmos-2-patch14-224/resolve/main/snowman.png"
image = Image.open(requests.get(url, stream=True).raw)

# The original Kosmos-2 demo saves the image first and then reloads it. For some images, this will give slightly different image input and change the generation outputs.
# Uncomment the following 2 lines if you want to match the original demo's outputs.
# (One example is the two_dogs.jpg from the demo)
# image.save("new_image.jpg")
# image = Image.open("new_image.jpg")

inputs = processor(text=prompt, images=image, return_tensors="pt")

generated_ids = model.generate(
    pixel_values=inputs["pixel_values"],
    input_ids=inputs["input_ids"][:, :-1],
    attention_mask=inputs["attention_mask"][:, :-1],
    img_features=None,
    img_attn_mask=inputs["img_attn_mask"][:, :-1],
    use_cache=True,
    max_new_tokens=64,
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Specify cleanup_and_extract=False in order to see the raw model generation.

processed_text = processor.post_process_generation(generated_text, cleanup_and_extract=False)

print(processed_text)

# <grounding> An image of<phrase> a snowman</phrase><object><patch_index_0044><patch_index_0863></object> warming himself by<phrase> a fire</phrase><object><patch_index_0005><patch_index_0911></object>.

# By default, the generated text is cleaned up and the entities are extracted.

processed_text, entities = processor.post_process_generation(generated_text)

print(processed_text)

# An image of a snowman warming himself by a fire.

print(entities)

# [('a snowman', (12, 21), [(0.390625, 0.046875, 0.984375, 0.828125)]), ('a fire', (41, 47), [(0.171875, 0.015625, 0.484375, 0.890625)])]
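Each entity tuple has the form (phrase, (start, end) character span in the processed text, list of (x1, y1, x2, y2) bounding boxes normalized to [0, 1]). A minimal sketch for scaling one box back to pixel coordinates (the helper below is ours, not part of the processor API):

def to_pixel_box(box, width, height):
    # Scale an (x1, y1, x2, y2) box normalized to [0, 1] to pixel coordinates.
    x1, y1, x2, y2 = box
    return (int(x1 * width), int(y1 * height), int(x2 * width), int(y2 * height))

# e.g. for the snowman box on a 224x224 image:
# to_pixel_box((0.390625, 0.046875, 0.984375, 0.828125), 224, 224) -> (87, 10, 220, 185)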

Error:

AttributeError Traceback (most recent call last)
Cell In[14], line 8
4 from transformers import AutoProcessor, AutoModelForVision2Seq
7 model = AutoModelForVision2Seq.from_pretrained("ydshieh/kosmos-2-patch14-224", trust_remote_code=True)
----> 8 processor = AutoProcessor.from_pretrained("ydshieh/kosmos-2-patch14-224", trust_remote_code=True)
10 prompt = "An image of"
12 url = "https://huggingface.co/ydshieh/kosmos-2-patch14-224/resolve/main/snowman.png"

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/transformers/models/auto/processing_auto.py:283, in AutoProcessor.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
281 if os.path.isdir(pretrained_model_name_or_path):
282 processor_class.register_for_auto_class()
--> 283 return processor_class.from_pretrained(
284 pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
285 )
286 elif processor_class is not None:
287 return processor_class.from_pretrained(
288 pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
289 )

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/transformers/processing_utils.py:226, in ProcessorMixin.from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, **kwargs)
223 if token is not None:
224 kwargs["token"] = token
--> 226 args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
227 return cls(*args)

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/transformers/processing_utils.py:270, in ProcessorMixin._get_arguments_from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
267 else:
268 attribute_class = getattr(transformers_module, class_name)
--> 270 args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
271 return args

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:723, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
721 if os.path.isdir(pretrained_model_name_or_path):
722 tokenizer_class.register_for_auto_class()
--> 723 return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
724 elif config_tokenizer_class is not None:
725 tokenizer_class = None

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1854, in PreTrainedTokenizerBase.from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, *init_inputs, **kwargs)
1851 else:
1852 logger.info(f"loading file {file_path} from cache at {resolved_vocab_files[file_id]}")
-> 1854 return cls._from_pretrained(
1855 resolved_vocab_files,
1856 pretrained_model_name_or_path,
1857 init_configuration,
1858 *init_inputs,
1859 token=token,
1860 cache_dir=cache_dir,
1861 local_files_only=local_files_only,
1862 _commit_hash=commit_hash,
1863 _is_local=is_local,
1864 **kwargs,
1865 )

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2017, in PreTrainedTokenizerBase._from_pretrained(cls, resolved_vocab_files, pretrained_model_name_or_path, init_configuration, token, cache_dir, local_files_only, _commit_hash, _is_local, *init_inputs, **kwargs)
2015 # Instantiate tokenizer.
2016 try:
-> 2017 tokenizer = cls(*init_inputs, **init_kwargs)
2018 except OSError:
2019 raise OSError(
2020 "Unable to load vocabulary from file. "
2021 "Please check that the provided vocabulary is accessible and not corrupted."
2022 )

File ~/.cache/huggingface/modules/transformers_modules/ydshieh/kosmos-2-patch14-224/d591f18e3ce08debe6bbdc7117b87a1595450179/tokenization_kosmos2_fast.py:140, in Kosmos2TokenizerFast.__init__(self, vocab_file, tokenizer_file, bos_token, eos_token, sep_token, cls_token, unk_token, pad_token, mask_token, num_patch_index_tokens, add_tag_and_patch_index_tokens, **kwargs)
126 super().__init__(
127 vocab_file,
128 tokenizer_file=tokenizer_file,
(...)
136 **kwargs,
137 )
139 self.vocab_file = vocab_file
--> 140 self.can_save_slow_tokenizer = False if not self.vocab_file else True
142 self.eod_token = "</doc>"
144 self.boi_token = "<image>"

AttributeError: can't set attribute 'can_save_slow_tokenizer'
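For context: newer transformers releases (including 4.31.0) define can_save_slow_tokenizer as a read-only property on PreTrainedTokenizerFast, so the plain assignment at line 140 of the remote tokenization_kosmos2_fast.py targets a property that has no setter. A minimal sketch of the failure mode, with hypothetical class names rather than the actual transformers code:

class Base:
    @property
    def can_save_slow_tokenizer(self) -> bool:
        # Read-only: no setter is defined for this property.
        return False

class Child(Base):
    def __init__(self, vocab_file=None):
        # The assignment targets the inherited property, which has no
        # setter, so Python 3.10 raises:
        # AttributeError: can't set attribute 'can_save_slow_tokenizer'
        self.can_save_slow_tokenizer = vocab_file is not None

Child()  # raises AttributeError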

Hi, yes, I also noticed this yesterday.

I have to discuss this with some team members and update the files on the Hub too. Sorry about that.

Hello again. I have made a change in the meantime; it should work for now.

Thanks for replying @ydshieh, no problem. I'm running some experiments; do you think this will be fixed by the end of this week?

It should work now already. Let me know if you still encounter some issues.

@ydshieh
I restarted, re-ran the whole thing, and got a new error this time.
The code, environment, and everything else are the same.

Error:

TypeError Traceback (most recent call last)
Cell In[7], line 21
13 image = Image.open(requests.get(url, stream=True).raw)
15 # The original Kosmos-2 demo saves the image first then reload it. For some images, this will give slightly different image input and change the generation outputs.
16 # Uncomment the following 2 lines if you want to match the original demo's outputs.
17 # (One example is the two_dogs.jpg from the demo)
18 # image.save("new_image.jpg")
19 # image = Image.open("new_image.jpg")
---> 21 inputs = processor(text=prompt, images=image, return_tensors="pt")
23 generated_ids = model.generate(
24 pixel_values=inputs["pixel_values"],
25 input_ids=inputs["input_ids"][:, :-1],
(...)
30 max_new_tokens=64,
31 )
32 generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

File ~/.cache/huggingface/modules/transformers_modules/ydshieh/kosmos-2-patch14-224/48e3edebaeb02dc9fe105f40e85a43a3b440dc72/processing_kosmos2.py:131, in Kosmos2Processor.__call__(self, images, text, bboxes, num_image_tokens, first_image_token_id, add_special_tokens, padding, truncation, max_length, stride, pad_to_multiple_of, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_token_type_ids, return_length, verbose, return_tensors, **kwargs)
128 encoding.update(text_encoding)
130 if images is not None:
--> 131 image_encoding = self.image_processor(images, return_tensors=return_tensors)
132 encoding.update(image_encoding)
134 # Use the id of the first token after <unk>

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/transformers/image_processing_utils.py:494, in BaseImageProcessor.__call__(self, images, **kwargs)
492 def __call__(self, images, **kwargs) -> BatchFeature:
493 """Preprocess an image or a batch of images."""
--> 494 return self.preprocess(images, **kwargs)

File ~/.cache/huggingface/modules/transformers_modules/ydshieh/kosmos-2-patch14-224/48e3edebaeb02dc9fe105f40e85a43a3b440dc72/image_processing_kosmos2.py:277, in Kosmos2ImageProcessor.preprocess(self, images, do_resize, size, resample, do_center_crop, crop_size, do_rescale, rescale_factor, do_normalize, image_mean, image_std, do_convert_rgb, return_tensors, data_format, input_data_format, **kwargs)
274 input_data_format = infer_channel_dimension_format(images[0])
276 if do_resize:
--> 277 images = [
278 self.resize(image=image, size=size, resample=resample, input_data_format=input_data_format)
279 for image in images
280 ]
282 if do_center_crop:
283 images = [
284 self.center_crop(image=image, size=crop_size, input_data_format=input_data_format) for image in images
285 ]

File ~/.cache/huggingface/modules/transformers_modules/ydshieh/kosmos-2-patch14-224/48e3edebaeb02dc9fe105f40e85a43a3b440dc72/image_processing_kosmos2.py:278, in <listcomp>(.0)
274 input_data_format = infer_channel_dimension_format(images[0])
276 if do_resize:
277 images = [
--> 278 self.resize(image=image, size=size, resample=resample, input_data_format=input_data_format)
279 for image in images
280 ]
282 if do_center_crop:
283 images = [
284 self.center_crop(image=image, size=crop_size, input_data_format=input_data_format) for image in images
285 ]

File ~/.cache/huggingface/modules/transformers_modules/ydshieh/kosmos-2-patch14-224/48e3edebaeb02dc9fe105f40e85a43a3b440dc72/image_processing_kosmos2.py:150, in Kosmos2ImageProcessor.resize(self, image, size, resample, data_format, input_data_format, **kwargs)
148 if "shortest_edge" not in size:
149 raise ValueError(f"The size parameter must contain the key shortest_edge. Got {size.keys()}")
--> 150 output_size = get_resize_output_image_size(
151 image, size=size["shortest_edge"], input_data_format=input_data_format
152 )
153 return resize(
154 image,
155 size=output_size,
(...)
159 **kwargs,
160 )

TypeError: get_resize_output_image_size() got an unexpected keyword argument 'input_data_format'
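For context: the installed transformers 4.31.0 evidently does not accept the input_data_format keyword in get_resize_output_image_size (per the TypeError above), while the updated remote code on the Hub passes it; that version mismatch is the likely cause. A defensive sketch (our own compatibility guard, not the repo's code) that forwards the kwarg only when the installed function accepts it:

import inspect

import numpy as np
from transformers.image_transforms import get_resize_output_image_size

def safe_output_size(image: np.ndarray, shortest_edge: int, input_data_format=None):
    # Forward `input_data_format` only if this transformers version supports it.
    kwargs = {}
    if "input_data_format" in inspect.signature(get_resize_output_image_size).parameters:
        kwargs["input_data_format"] = input_data_format
    return get_resize_output_image_size(image, size=shortest_edge, **kwargs)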

I am not able to reproduce the issue with input_data_format. Are you still using the exact same code snippet I put in the README.md file? Also, what's your transformers version?

Yes, I'm using the exact same code from the tutorial, with transformers==4.31.0.
Everything is the same as in my first comment in this thread.
@ydshieh

In this case, it would be better if you create a Google Colab notebook that reproduces the issue and share a link to it.

Thanks, I have restarted the instance and redid the whole thing; it's working now.
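A note on avoiding this class of breakage: with trust_remote_code=True, the model and processor code is fetched from the Hub, so repo updates can change behavior between runs. Pinning a specific revision freezes both the code and the weights (a sketch; "deadbeef" is a placeholder, substitute a real commit hash from the repo's history):

from transformers import AutoProcessor, AutoModelForVision2Seq

REVISION = "deadbeef"  # placeholder; use a real commit hash from the repo
model = AutoModelForVision2Seq.from_pretrained(
    "ydshieh/kosmos-2-patch14-224", trust_remote_code=True, revision=REVISION
)
processor = AutoProcessor.from_pretrained(
    "ydshieh/kosmos-2-patch14-224", trust_remote_code=True, revision=REVISION
)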

ydshieh changed discussion status to closed
