preprocessor_config.json missing

#1 by premsa - opened

Hello,

Is the code for porting the model available somewhere? I am running into problems when trying to retrieve the model's processor. Any assistance or pointers towards relevant code would be helpful!

CODE:
from transformers import VisionTextDualEncoderProcessor

# model_name = "calpt/CLIP-ViT-H-14-frozen-xlm-roberta-large-laion5B-s13B-b90k"
self.processor = VisionTextDualEncoderProcessor.from_pretrained(model_name)

ERROR:
.venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 463, in cached_file
raise EnvironmentError(
OSError: calpt/CLIP-ViT-H-14-frozen-xlm-roberta-large-laion5B-s13B-b90k does not appear to have a file named preprocessor_config.json. Checkout 'https://huggingface.co/calpt/CLIP-ViT-H-14-frozen-xlm-roberta-large-laion5B-s13B-b90k/main' for available files.
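
For reference, the repo's file listing can be checked programmatically with huggingface_hub (which transformers already depends on); a quick sketch:

from huggingface_hub import list_repo_files

# Confirms that the repo ships only the model weights/config,
# with no preprocessor_config.json or tokenizer config.
files = list_repo_files("calpt/CLIP-ViT-H-14-frozen-xlm-roberta-large-laion5B-s13B-b90k")
print("preprocessor_config.json" in files)  # expected: False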

ALTERNATIVE:
I also tried initializing the processor from:

from transformers import AutoFeatureExtractor, AutoTokenizer, VisionTextDualEncoderProcessor

self.tokenizer = AutoTokenizer.from_pretrained(MODEL.pretrained_dual_text)
self.image_processor = AutoFeatureExtractor.from_pretrained(MODEL.pretrained_dual_image)
self.processor = VisionTextDualEncoderProcessor(self.image_processor, self.tokenizer)

with

pretrained_dual_text = "xlm-roberta-large"
pretrained_dual_image = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"

which resulted in an internal shape error, which I assume comes from the fact that I am not using the correct preprocessors for the task?

.venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [32, 150528]

(For reference, 150528 = 3 × 224 × 224, so the input looks like a batch of 32 flattened 224×224 RGB images rather than [32, 3, 224, 224] tensors.)

Hey,

you can find the code for porting the model from OpenCLIP here: https://gist.github.com/calpt/8e3555bd11f1916b5169c8125117e5ee

This repo only contains the model checkpoints, without tokenizer or preprocessor config. The correct tokenizer/preprocessor to use would be the following (a minimal usage sketch follows the list):

  • tokenizer: xlm-roberta-large
  • preprocessor: laion/CLIP-ViT-H-14-laion2B-s32B-b79K
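
Putting it together, a minimal sketch of loading the checkpoint with the correct tokenizer and preprocessor (this assumes the checkpoint loads via VisionTextDualEncoderModel; the image path and prompt are just placeholders):

from PIL import Image
from transformers import (
    AutoFeatureExtractor,
    AutoTokenizer,
    VisionTextDualEncoderModel,
    VisionTextDualEncoderProcessor,
)

# Weights come from this repo; tokenizer/preprocessor come from the
# repos listed above, since this repo ships neither config.
model = VisionTextDualEncoderModel.from_pretrained(
    "calpt/CLIP-ViT-H-14-frozen-xlm-roberta-large-laion5B-s13B-b90k"
)
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
image_processor = AutoFeatureExtractor.from_pretrained("laion/CLIP-ViT-H-14-laion2B-s32B-b79K")
processor = VisionTextDualEncoderProcessor(image_processor, tokenizer)

# The preprocessor resizes/normalizes the image to [1, 3, 224, 224],
# which also avoids the flattened-input conv2d error above.
image = Image.open("example.jpg")  # placeholder path
inputs = processor(text=["a photo"], images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
print(outputs.logits_per_image)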
