Cannot instantiate using `from_pretrained`

#1
by afmck - opened

Trying to load the model currently fails:
```
In [6]: model = CLIPModel.from_pretrained("LanguageBind/LanguageBind_Audio")
You are using a model of type LanguageBindAudio to instantiate a model of type clip. This is not supported for all configurations of models and can yield errors.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[6], line 1
----> 1 model = CLIPModel.from_pretrained("LanguageBind/LanguageBind_Audio")

File /..../miniconda3/envs/LanguageBind/lib/python3.9/site-packages/transformers/modeling_utils.py:2881, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
   2871 if dtype_orig is not None:
   2872     torch.set_default_dtype(dtype_orig)
   2874 (
   2875     model,
   2876     missing_keys,
   2877     unexpected_keys,
   2878     mismatched_keys,
   2879     offload_index,
   2880     error_msgs,
-> 2881 ) = cls._load_pretrained_model(
   2882     model,
   2883     state_dict,
   2884     loaded_state_dict_keys,  # XXX: rename?
   2885     resolved_archive_file,
   2886     pretrained_model_name_or_path,
   2887     ignore_mismatched_sizes=ignore_mismatched_sizes,
   2888     sharded_metadata=sharded_metadata,
   2889     _fast_init=_fast_init,
   2890     low_cpu_mem_usage=low_cpu_mem_usage,
   2891     device_map=device_map,
   2892     offload_folder=offload_folder,
   2893     offload_state_dict=offload_state_dict,
   2894     dtype=torch_dtype,
   2895     is_quantized=(load_in_8bit or load_in_4bit),
   2896     keep_in_fp32_modules=keep_in_fp32_modules,
   2897 )
   2899 model.is_loaded_in_4bit = load_in_4bit
   2900 model.is_loaded_in_8bit = load_in_8bit

File /..../miniconda3/envs/LanguageBind/lib/python3.9/site-packages/transformers/modeling_utils.py:3278, in PreTrainedModel._load_pretrained_model(cls, model, state_dict, loaded_keys, resolved_archive_file, pretrained_model_name_or_path, ignore_mismatched_sizes, sharded_metadata, _fast_init, low_cpu_mem_usage, device_map, offload_folder, offload_state_dict, dtype, is_quantized, keep_in_fp32_modules)
   3274 if "size mismatch" in error_msg:
   3275     error_msg += (
   3276         "\n\tYou may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method."
   3277     )
-> 3278 raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
   3280 if is_quantized:
   3281     unexpected_keys = [elem for elem in unexpected_keys if "SCB" not in elem]

RuntimeError: Error(s) in loading state_dict for CLIPModel:
    size mismatch for vision_model.embeddings.position_ids: copying a param with shape torch.Size([1, 577]) from checkpoint, the shape in current model is torch.Size([1, 257]).
    size mismatch for vision_model.embeddings.position_embedding.weight: copying a param with shape torch.Size([577, 1024]) from checkpoint, the shape in current model is torch.Size([257, 1024]).
    You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
```
Running with `ignore_mismatched_sizes=True` succeeds, but many weights are reported as unused or freshly initialised, so the resulting model is likely not correct.
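For reference, a minimal sketch of that workaround, based on the traceback above (the two mismatched `vision_model.embeddings` tensors are re-initialised rather than loaded from the checkpoint, which is why the result is suspect):

```python
from transformers import CLIPModel

# Loads without raising, but transformers reports the mismatched tensors
# (e.g. vision_model.embeddings.position_embedding.weight, [577, 1024] in
# the checkpoint vs. [257, 1024] in the model) as newly initialised
# instead of copied from the checkpoint.
model = CLIPModel.from_pretrained(
    "LanguageBind/LanguageBind_Audio",
    ignore_mismatched_sizes=True,
)
```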

It would be nice if we could instantiate the model using `from_pretrained` without error.

Thanks~

You can try following this.
Feel free to tell me whether it runs or not.
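(For readers landing here: the repo-specific loading path suggested above is presumably the one from the LanguageBind GitHub code rather than plain `CLIPModel`. A minimal sketch, assuming the class names from that project's README; the exact import path may differ between versions of that codebase:)

```python
# Sketch assuming the classes from https://github.com/PKU-YuanGroup/LanguageBind;
# names are taken from that project's README and may differ between versions.
from languagebind import (
    LanguageBindAudio,
    LanguageBindAudioProcessor,
    LanguageBindAudioTokenizer,
)

ckpt = "LanguageBind/LanguageBind_Audio"
model = LanguageBindAudio.from_pretrained(ckpt)
tokenizer = LanguageBindAudioTokenizer.from_pretrained(ckpt)
processor = LanguageBindAudioProcessor(model.config, tokenizer)
```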

LanguageBind changed discussion status to closed

Yes, that works, but the config in this repo indicates that the weights and config should be usable by `CLIPModel`. Is this not the case?

afmck changed discussion status to open

> Yes, that works, but the config in this repo indicates that the weights and config should be usable by `CLIPModel`. Is this not the case?

That's a typo. We will release a stronger audio model and fix it.
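For context on where the size mismatch comes from: (336 / 14)² + 1 = 577 while (224 / 14)² + 1 = 257, so the checkpoint's vision tower appears to have been trained at a 336×336 input resolution while the shipped config declares 224. A hedged sketch of overriding the config to match the checkpoint; this is an inference from the shapes in the traceback, not a fix confirmed in this thread, and other keys may still fail to match:

```python
from transformers import CLIPConfig, CLIPModel

# Assumption: the config's image_size is the typo. The checkpoint's 577
# positions imply (336 / 14)**2 = 576 patches plus a CLS token, whereas
# the shipped image_size of 224 yields only (224 / 14)**2 + 1 = 257.
config = CLIPConfig.from_pretrained("LanguageBind/LanguageBind_Audio")
config.vision_config.image_size = 336  # hypothetical correction
model = CLIPModel.from_pretrained("LanguageBind/LanguageBind_Audio", config=config)
```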

Nice, thanks! Could you share more details about this stronger model?

> Nice, thanks! Could you share more details about this stronger model?

We have released the stronger model; the results can be found here.
The checkpoint has also been updated!
