Whisper crashes frequently

#82
by KamiMadara - opened

Hey guys,
I stumbled across the following problem: I recently installed Whisper on my local machine to pull the text out of some interviews for work.
I already have Whisper running on my office laptop, but the specs are low, no dedicated GPU... you name it.
So converting one hour of interview took two full 8-hour shifts.
I followed the installation guide by TroubleChute like last time, only now with my GPU in mind (RTX 2070).
To the problem:
It doesn't matter which sound file I use (all are .m4a files): the conversion/pulling crashes after 10-19 minutes with the error at the end of my description.
First of all, I'm not perfect, so of course there's a chance I messed up the installation to some extent.
I don't believe it's the sound files, though I did suspect them at first, since the interview is held in a rather heavy German dialect.
But I ran the first file I used at work here again:
at work it pulled the text from the full hour of the interview; on my home PC it crashed after 11 minutes.
While writing this, I tried a suggestion I'd read on another post regarding this issue: setting the temperature fallback to "None" (when I first tried it, I hadn't capitalized the N).
And it works! Can somebody explain to me what the temperature fallback actually is? Judging by all the gibberish I'm seeing... is it really Whisper not trying to understand what's being said? And how can I fine-tune it so it still tries to understand, but without crashing?
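
For anyone else hitting this, the command that now works for me looks roughly like this (file name and model size are just what I happen to use):

whisper interview.m4a --model medium --language de --temperature_increment_on_fallback None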
Why did it work when decoding with the CPU alone, but not with the GPU, on that one file - just luck?
Is there something else I could try to improve this situation?

And before anyone asks, I'm unfortunately not able/allowed to share the audio. The interviews are for a master's thesis by someone at my company, so sharing them would only cause problems for me ^^

Thanks in advance for any advice you can provide.

File "C:\Users*\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users*\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "C:\Users*\AppData\Local\Programs\Python\Python310\Scripts\whisper.exe_main
.py", line 7, in
File "C:\Users*\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\transcribe.py", line 437, in cli
result = transcribe(model, audio_path, temperature=temperature, **args)
File "C:\Users*\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\transcribe.py", line 229, in transcribe
result: DecodingResult = decode_with_fallback(mel_segment)
File "C:\Users*\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\transcribe.py", line 164, in decode_with_fallback
decode_result = model.decode(segment, options)
File "C:\Users*\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users*\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\decoding.py", line 811, in decode
result = DecodingTask(model, options).run(mel)
File "C:\Users*\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users*\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\decoding.py", line 724, in run
tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
File "C:\Users*\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\decoding.py", line 689, in _main_loop
tokens, completed = self.decoder.update(tokens, logits, sum_logprobs)
File "C:\Users*\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\decoding.py", line 276, in update
next_tokens = Categorical(logits=logits / self.temperature).sample()
File "C:\Users*\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\distributions\categorical.py", line 66, in init
super().init(batch_shape, validate_args=validate_args)
File "C:\Users*\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\distributions\distribution.py", line 62, in init
raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (5, 51865)) of distribution Categorical(logits: torch.Size([5, 51865])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[-inf, -inf, -inf, ..., -inf, -inf, -inf],
[-inf, -inf, -inf, ..., -inf, -inf, -inf],
[-inf, -inf, -inf, ..., -inf, -inf, -inf]], device='cuda:0')

If a model works on the CPU but not on the GPU, the most likely cause is VRAM. A 2070 probably doesn't have enough VRAM to run inference with the large (or even medium) model; you may want to upgrade to a GPU with at least 11 GB of VRAM to run the large model, which takes around 10 GB.

As for fine-tuning the model, try the Colab below:
https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/fine_tune_whisper.ipynb
If you are aiming to fine-tune a larger model, try PEFT fine-tuning.

I'm not sure about the temperature fallback you are talking about; it could be related to beam search and greedy decoding in the decoding process.
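
If you want to check whether VRAM is actually the bottleneck before upgrading anything, a minimal sketch along these lines should do it (the size thresholds just restate the rough figures above and are assumptions, not measured values):

import torch
import whisper

# How much GPU memory is free right now?
free_bytes, total_bytes = torch.cuda.mem_get_info()
print(f"Free VRAM: {free_bytes / 1e9:.1f} GB of {total_bytes / 1e9:.1f} GB")

# Pick the biggest model that plausibly fits (thresholds are rough assumptions).
if free_bytes > 10e9:
    size = "large"
elif free_bytes > 5e9:
    size = "medium"
else:
    size = "small"

model = whisper.load_model(size, device="cuda")
print(f"Loaded whisper {size}")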
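
Regarding the temperature fallback: judging from the traceback above, transcribe() starts each segment at temperature 0.0 (greedy decoding) and, when the result fails its quality checks (compression ratio / average log-probability), retries at higher temperatures, by default stepping through 0.2, 0.4, ... up to 1.0. The crash happens in Categorical(...).sample(), which only runs at temperature > 0, so disabling the fallback sidesteps it. If you would rather keep some fallback than turn it off entirely, here is a sketch with a capped ladder (the 0.4 cutoff is an illustrative guess, not a known fix):

import whisper

model = whisper.load_model("medium")  # model size is just an example

# Default ladder is (0.0, 0.2, 0.4, 0.6, 0.8, 1.0); stopping at 0.4 keeps some
# retrying without the high-temperature sampling where the NaNs appeared.
result = model.transcribe(
    "interview.m4a",        # placeholder file name
    language="de",
    temperature=(0.0, 0.2, 0.4),
)
print(result["text"])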
