Add "<|startoftranscript|>" to forced decoder ids
Replacing <|translate|><|notimestamps|>
with <|startoftranscript|><|en|><|transcribe|><|notimestamps|>
That's a pretty big change, and you are also adding more tokens.
I think the reason we only have the 2 tokens by default is for testing purposes. I agree that, depending on the usage, we should rather hard-code them in the tests.
Also, the reason we don't have <|startoftranscript|> in the forced_decoder_ids is that it is already set in decoder_start_token_id.
We should still set the language in the forced decoder ids though, no? As we do for, say, the medium checkpoint:
https://huggingface.co/openai/whisper-medium/blob/main/config.json#L26-L39
For the large checkpoint, we're currently setting <|translate|><|notimestamps|>.
For all the other multilingual checkpoints, we're setting <|en|><|transcribe|><|notimestamps|>.
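To make the difference concrete, here is a rough sketch of how the two settings would look as [position, token_id] pairs. The token ids are assumptions for the multilingual vocabulary and should be checked against the tokenizer, and `build_forced_decoder_ids` is a hypothetical helper, not a transformers function. It also assumes the forced positions start at 1, since position 0 is taken by decoder_start_token_id.

```python
# Assumed token ids for the multilingual Whisper vocabulary.
# Verify against the actual tokenizer before relying on them.
ASSUMED_TOKEN_IDS = {
    "<|startoftranscript|>": 50258,
    "<|en|>": 50259,
    "<|translate|>": 50358,
    "<|transcribe|>": 50359,
    "<|notimestamps|>": 50363,
}


def build_forced_decoder_ids(tokens, token_ids=ASSUMED_TOKEN_IDS):
    """Hypothetical helper: return [position, token_id] pairs from position 1.

    Position 0 is skipped because it is filled by decoder_start_token_id.
    """
    return [[pos, token_ids[tok]] for pos, tok in enumerate(tokens, start=1)]


# <|startoftranscript|> lives here rather than in forced_decoder_ids.
decoder_start_token_id = ASSUMED_TOKEN_IDS["<|startoftranscript|>"]

# What the large checkpoint is described as setting: no language token.
large_current = build_forced_decoder_ids(["<|translate|>", "<|notimestamps|>"])

# What the other multilingual checkpoints are described as setting.
other_multilingual = build_forced_decoder_ids(
    ["<|en|>", "<|transcribe|>", "<|notimestamps|>"]
)

print("decoder_start_token_id:", decoder_start_token_id)
print("large (current):", large_current)
print("other multilingual:", other_multilingual)
```

With the large setting, position 1 holds the task token and the language is never forced, which is the inconsistency with the other checkpoints.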