RuntimeError: Error(s) in loading state_dict for HuggingFaceWhisper:

#1
by Mr1gh - opened
size mismatch for _mel_filters: copying a param with shape torch.Size([80, 201]) from checkpoint, the shape in current model is torch.Size([201, 80]).

I used the provided code unchanged and got this error. When I transpose the mel_filters, the model loads, but transcribing then fails with a matrix multiplication error:

File /opt/conda/lib/python3.10/site-packages/speechbrain/lobes/models/huggingface_whisper.py:247, in HuggingFaceWhisper._log_mel_spectrogram(self, audio)
244 magnitudes = stft[..., :-1].abs() ** 2
246 filters = self._mel_filters
--> 247 mel_spec = filters @ magnitudes
249 log_spec = torch.clamp(mel_spec, min=1e-10).log10()
250 log_spec = torch.maximum(
251 log_spec,
252 (log_spec.flatten(start_dim=1).max(dim=-1)[0] - 8.0)[:, None, None],
253 )

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3000x201 and 80x201)
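
The shape logic behind the failing line 247 can be checked in isolation. This is a minimal sketch with random stand-in tensors, assuming a batch of one and the sizes reported above (80 mel bins, 201 frequency bins, 3000 frames). The names `filters_ok` and `filters_bad` are illustrative and do not come from SpeechBrain.

```python
import torch

n_mels, n_freqs, n_frames = 80, 201, 3000

# Stand-in for the power spectrogram: (batch, n_freqs, n_frames).
magnitudes = torch.rand(1, n_freqs, n_frames)

# Orientation that `filters @ magnitudes` needs: (n_mels, n_freqs).
filters_ok = torch.rand(n_mels, n_freqs)
mel_spec = filters_ok @ magnitudes
print(mel_spec.shape)  # (batch, n_mels, n_frames)

# Transposed orientation, (n_freqs, n_mels): the inner dimensions
# (80 vs. 201) no longer agree, so the matmul raises a RuntimeError.
filters_bad = filters_ok.T
try:
    filters_bad @ magnitudes
    failed = False
except RuntimeError as err:
    failed = True
    print(err)
```

So the product only works when the filter bank has 80 rows and 201 columns, which matches the shape stored in the checkpoint.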

SpeechBrain org

Hello,

which version of PyTorch are you using?

Hello, torch.__version__ is '2.0.0+cu117'.

It works when I reshape the feature_extractor.filter inside SpeechBrain, in /speechbrain/lobes/models/huggingface_whisper.py,

but I don't know if this is a good way to fix it.
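
One caution on the workaround: if "reshape" means a literal `reshape` or `view`, the shape will match but the filter weights get scrambled across frequency bins. A transpose keeps each mel filter aligned with its bins. Below is a minimal sketch of a shape guard, assuming the filter bank is a 2-D tensor and the matmul expects (n_mels, n_freqs). `orient_mel_filters` is a hypothetical helper, not part of SpeechBrain or transformers.

```python
import torch


def orient_mel_filters(filters: torch.Tensor, n_mels: int = 80) -> torch.Tensor:
    """Return the filter bank as (n_mels, n_freqs), transposing if needed.

    Hypothetical helper for illustration only.
    """
    if filters.ndim != 2:
        raise ValueError(f"expected a 2-D filter bank, got shape {tuple(filters.shape)}")
    if filters.shape[0] == n_mels:
        return filters  # already (n_mels, n_freqs)
    if filters.shape[1] == n_mels:
        return filters.T.contiguous()  # (n_freqs, n_mels) -> (n_mels, n_freqs)
    raise ValueError(f"no axis of size {n_mels} in shape {tuple(filters.shape)}")


# Stand-in filter bank in the (n_freqs, n_mels) orientation.
fb = torch.arange(201 * 80, dtype=torch.float32).reshape(201, 80)

fixed = orient_mel_filters(fb)
scrambled = fb.reshape(80, 201)  # same shape as `fixed`, different contents

print(fixed.shape, scrambled.shape)
print(torch.equal(fixed, scrambled))  # False: reshape is not a transpose
```

Whether patching the installed library is the right long-term fix is a separate question. The guard only shows which operation preserves the filter values.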
