forced_decoder_ids not applied properly during generation

#10
by minseong-ringle - opened
# (processor and model are loaded beforehand)
input_features = processor(input, return_tensors="pt").input_features
forced_decoder_ids = processor.get_decoder_prompt_ids(language="en", task="transcribe", no_timestamps=False)

predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
transcription = processor.batch_decode(predicted_ids)

# This results in
# tensor([[50258, 50259, 50359, 50363
# -> "<|startoftranscript|><|en|><|transcribe|><|notimestamps|>
# for the transcription

model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="en", task="transcribe", no_timestamps=False)
# Setting this on the model config also produces the same result.

These are the code snippets I've tried so far.
Even with no_timestamps=False, I cannot remove the "<|notimestamps|>" token from the decoder inputs.
Any suggestions?

Thank you for your help in advance.

Hey! This might be because the "<|notimestamps|>" token is not in the list of suppressed tokens, which means the model is free to predict it on its own.
We should probably add it to the suppress_tokens list.
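As a minimal sketch of that fix (assuming a recent transformers install; the id 50363 for "<|notimestamps|>" is taken from the tensor output above): suppressing a token sets its logit to -inf at every decoding step, so the model can never predict it even when it is not forced.

```python
import torch
from transformers.generation.logits_process import SuppressTokensLogitsProcessor

NOTIMESTAMPS_ID = 50363  # id of <|notimestamps|>, matching the output above

# The same processor that model.generate builds internally from suppress_tokens
suppressor = SuppressTokensLogitsProcessor([NOTIMESTAMPS_ID])

scores = torch.zeros(1, 51865)        # dummy logits over the Whisper vocabulary
input_ids = torch.tensor([[50258]])   # dummy decoder input (<|startoftranscript|>)
scores = suppressor(input_ids, scores)

print(scores[0, NOTIMESTAMPS_ID])     # -inf: the token can no longer be predicted
```

In practice you would not build the processor by hand: adding the id to model.generation_config.suppress_tokens (or passing suppress_tokens=[...] to model.generate) has the same effect.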
