RuntimeError: Error(s) in loading state_dict for HuggingFaceWhisper: #1

size mismatch for _mel_filters: copying a param with shape torch.Size([80, 201]) from checkpoint, the shape in current model is torch.Size([201, 80]).

I just used the same code provided and got this error. When I transpose the mel_filters, the model loads, but transcribing then fails with a matrix multiplication error:

File /opt/conda/lib/python3.10/site-packages/speechbrain/lobes/models/, in HuggingFaceWhisper._log_mel_spectrogram(self, audio)
244 magnitudes = stft[..., :-1].abs() ** 2
246 filters = self._mel_filters
--> 247 mel_spec = filters @ magnitudes
249 log_spec = torch.clamp(mel_spec, min=1e-10).log10()
250 log_spec = torch.maximum(
251 log_spec,
252 (log_spec.flatten(start_dim=1).max(dim=-1)[0] - 8.0)[:, None, None],
253 )

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3000x201 and 80x201)
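For reference, a minimal sketch of the shape arithmetic behind the two errors (the 80/201/3000 dimensions are taken from the messages above; the tensors here are random stand-ins, not the real filter bank):

```python
import torch

# Shapes seen in the errors above: 80 mel bins, 201 FFT frequency bins,
# and 3000 STFT frames. The checkpoint stores _mel_filters as [80, 201].
n_mels, n_freq, n_frames = 80, 201, 3000

filters = torch.randn(n_mels, n_freq)       # expected shape: [80, 201]
magnitudes = torch.randn(n_freq, n_frames)  # power spectrogram: [201, 3000]

# The projection in _log_mel_spectrogram: filters @ magnitudes only works
# when the inner dimensions agree (201 == 201).
mel_spec = filters @ magnitudes
assert mel_spec.shape == (n_mels, n_frames)  # [80, 3000]

# If the buffer ends up transposed ([201, 80]), the matmul fails with a
# shape error; transposing it back restores the expected result.
filters_t = filters.T                        # [201, 80] -- mismatched shape
mel_spec2 = filters_t.T @ magnitudes         # transpose before multiplying
assert torch.allclose(mel_spec, mel_spec2)
```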

SpeechBrain org


which version of PyTorch are you using?


Hello, torch.__version__ is '2.0.0+cu117'.

It works when I reshape feature_extractor.filter in SpeechBrain, in /speechbrain/lobes/models/,

but I don't know whether this is a good way to fix it.
