RuntimeError: Error(s) in loading state_dict for HuggingFaceWhisper: #1

size mismatch for _mel_filters: copying a param with shape torch.Size([80, 201]) from checkpoint, the shape in current model is torch.Size([201, 80]).

I just used the same code provided and got this error. When I transpose the mel_filters, the model loads, but transcribing then fails with a matrix multiplication error:

File /opt/conda/lib/python3.10/site-packages/speechbrain/lobes/models/, in HuggingFaceWhisper._log_mel_spectrogram(self, audio)
244 magnitudes = stft[..., :-1].abs() ** 2
246 filters = self._mel_filters
--> 247 mel_spec = filters @ magnitudes
249 log_spec = torch.clamp(mel_spec, min=1e-10).log10()
250 log_spec = torch.maximum(
251 log_spec,
252 (log_spec.flatten(start_dim=1).max(dim=-1)[0] - 8.0)[:, None, None],
253 )

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3000x201 and 80x201)
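For reference, a minimal sketch of the shape arithmetic behind the two errors (the 80/201/3000 dimensions are taken from the messages above; the tensors here are random stand-ins, not the real filter bank):

```python
import torch

# Shapes seen in the errors above: 80 mel bins, 201 FFT frequency bins,
# and 3000 STFT frames. The checkpoint stores _mel_filters as [80, 201].
n_mels, n_freq, n_frames = 80, 201, 3000

filters = torch.randn(n_mels, n_freq)       # expected shape: [80, 201]
magnitudes = torch.randn(n_freq, n_frames)  # power spectrogram: [201, 3000]

# The projection in _log_mel_spectrogram: filters @ magnitudes only works
# when the inner dimensions agree (201 == 201).
mel_spec = filters @ magnitudes
assert mel_spec.shape == (n_mels, n_frames)  # [80, 3000]

# If the buffer ends up transposed ([201, 80]), the matmul fails with a
# shape error; transposing it back restores the expected result.
filters_t = filters.T                        # [201, 80] -- mismatched shape
mel_spec2 = filters_t.T @ magnitudes         # transpose before multiplying
assert torch.allclose(mel_spec, mel_spec2)
```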

SpeechBrain org


which version of PyTorch are you using?


Hello, torch.__version__ is '2.0.0+cu117'.

It works when I reshape feature_extractor.filter in SpeechBrain, in /speechbrain/lobes/models/,

but I don't know whether this is a good way to fix it.
