The size of tensor a (449) must match the size of tensor b (448) at non-singleton dimension 1

#18
by Mr1gh - opened

For WAV files longer than about 6 s, this error occurs. When I searched, I found this explanation: "Whisper decoder uses a learned position embedding which has the max length of 448 tokens. Therefore it cannot decode any transcription of more than 448 label ids." Does that mean Whisper can only be trained with a fixed maximum number of tokens, and that this limit can't be changed?
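For reference, the numbers in the error line up with the quoted explanation: a label sequence of 449 token ids is being combined with a position embedding that has only 448 positions. One common workaround (not necessarily the only one) is to drop or truncate training examples whose label ids exceed that limit before they reach the model. Below is a minimal sketch in plain Python. The helper names `is_within_decoder_limit` and `truncate_labels`, and the toy `examples` list, are hypothetical; the value 448 is taken from the quote above, and in the `transformers` library it is usually exposed as `model.config.max_target_positions`, which is worth checking for your checkpoint.

```python
# Assumption: 448 comes from the quoted explanation; verify it against
# your own model's config (e.g. model.config.max_target_positions).
MAX_LABEL_LENGTH = 448


def is_within_decoder_limit(label_ids, max_length=MAX_LABEL_LENGTH):
    """Return True if the label sequence fits the decoder's position table."""
    return len(label_ids) <= max_length


def truncate_labels(label_ids, max_length=MAX_LABEL_LENGTH):
    """Cut a label sequence down to at most max_length ids.

    Note: truncation discards the tail of the transcript (including any
    end-of-text token), so filtering is often the safer choice.
    """
    return list(label_ids[:max_length])


# Toy data standing in for tokenized transcripts (hypothetical).
examples = [
    {"id": "short", "labels": list(range(120))},
    {"id": "too_long", "labels": list(range(449))},  # same length as in the error
]

# Keep only the examples the decoder can actually handle.
kept = [ex for ex in examples if is_within_decoder_limit(ex["labels"])]
print([ex["id"] for ex in kept])  # prints ['short']
```

With a Hugging Face `datasets.Dataset`, the same predicate could be passed to `dataset.filter(...)` on the label column. If the error shows up on short clips such as 6 s of audio, it may also be worth checking how the labels are produced, since 449 tokens is a lot of text for that duration.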
