error

#6
by vibber - opened

When transcribing an MP4 video today, I ran into problems. I had processed the same video yesterday without any issue.

By the way, thank you for making this. Much appreciated!

The error:

RuntimeError: The size of tensor a (147) must match the size of tensor b (3) at non-singleton dimension 3
Traceback:
  File "/home/user/.local/lib/python3.10/site-packages/streamlit/scriptrunner/script_runner.py", line 554, in _run_script
    exec(code, module.__dict__)
  File "/home/user/app/pages/02_📼_Upload_Video_File.py", line 229, in <module>
    main()
  File "/home/user/app/pages/02_📼_Upload_Video_File.py", line 120, in main
    results = inferecence(loaded_model, input_file, task)
  File "/home/user/.local/lib/python3.10/site-packages/streamlit/legacy_caching/caching.py", line 573, in wrapped_func
    return get_or_create_cached_value()
  File "/home/user/.local/lib/python3.10/site-packages/streamlit/legacy_caching/caching.py", line 557, in get_or_create_cached_value
    return_value = func(*args, **kwargs)
  File "/home/user/app/pages/02_📼_Upload_Video_File.py", line 68, in inferecence
    results = loaded_model.transcribe(f"{save_dir}/output.wav", **options)
  File "/home/user/.local/lib/python3.10/site-packages/whisper/transcribe.py", line 234, in transcribe
    result: DecodingResult = decode_with_fallback(mel_segment)
  File "/home/user/.local/lib/python3.10/site-packages/whisper/transcribe.py", line 164, in decode_with_fallback
    decode_result = model.decode(segment, options)
  File "/home/user/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/whisper/decoding.py", line 819, in decode
    result = DecodingTask(model, options).run(mel)
  File "/home/user/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/whisper/decoding.py", line 732, in run
    tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
  File "/home/user/.local/lib/python3.10/site-packages/whisper/decoding.py", line 682, in _main_loop
    logits = self.inference.logits(tokens, audio_features)
  File "/home/user/.local/lib/python3.10/site-packages/whisper/decoding.py", line 161, in logits
    return self.model.decoder(tokens, audio_features, kv_cache=self.kv_cache)
  File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/whisper/model.py", line 211, in forward
    x = block(x, xa, mask=self.mask, kv_cache=kv_cache)
  File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/whisper/model.py", line 136, in forward
    x = x + s
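For context on the message itself: the failing line is a tensor addition, and PyTorch only allows adding two tensors whose shapes broadcast. Shapes are aligned from the right, and at each dimension the two sizes must be equal or one of them must be 1. Here, at dimension 3, one tensor has size 147 and the other has size 3, so the addition fails. The sketch below is a plain-Python illustration of that rule, not PyTorch's actual code; the function name `broadcast_shapes` and the example shapes are hypothetical, and only the sizes 147 and 3 and the dimension index come from the error above.

```python
def broadcast_shapes(shape_a, shape_b):
    """Illustrative (hypothetical) re-implementation of the broadcasting
    rule PyTorch applies when adding two tensors. Returns the broadcast
    shape, or raises RuntimeError with a PyTorch-style message."""
    ndim = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s so both shapes have ndim entries
    # (this is what "aligning from the right" means).
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = [0] * ndim
    # Walk from the rightmost dimension to the leftmost, so the first
    # mismatch reported is the rightmost one.
    for dim in reversed(range(ndim)):
        size_a, size_b = a[dim], b[dim]
        if size_a == size_b or size_b == 1:
            result[dim] = size_a
        elif size_a == 1:
            result[dim] = size_b
        else:
            # Neither size is 1 and they differ: the shapes cannot broadcast.
            raise RuntimeError(
                f"The size of tensor a ({size_a}) must match the size of "
                f"tensor b ({size_b}) at non-singleton dimension {dim}"
            )
    return tuple(result)


if __name__ == "__main__":
    print(broadcast_shapes((2, 1, 4), (3, 1)))  # compatible shapes
    try:
        # Made-up 4-D shapes whose last dimensions are 147 and 3.
        broadcast_shapes((1, 6, 147, 147), (1, 6, 147, 3))
    except RuntimeError as err:
        print(err)
```

This only explains what the error message means; it does not say why the decoder ended up with mismatched shapes on this particular run.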
