Is Prompting possible?

#14
by MelanieKoe - opened

OpenAI's Whisper models offer the possibility of prompting (via prompt_ids passed to the model or processor) - is something similar possible for this model?

Whisper Distillation org

Yes, this is currently supported for batch size 1:

from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import load_dataset

dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
input_speech = dataset[3]["audio"]["array"]

processor = WhisperProcessor.from_pretrained("distil-whisper/distil-large-v2")
model = WhisperForConditionalGeneration.from_pretrained("distil-whisper/distil-large-v2")
input_features = processor(input_speech, sampling_rate=dataset[3]["audio"]["sampling_rate"], return_tensors="pt").input_features

# --- Without prompt ---
output_without_prompt = model.generate(input_features)
print(processor.decode(output_without_prompt[0]))
# <|startoftranscript|><|en|><|transcribe|><|notimestamps|> He has grave doubts whether Sir Frederick Leighton's work is really Greek after all, and can discover in it but little of Rocky Ithaca.<|endoftext|>

# --- With prompt ---
# Let's change the spelling of "Leighton" -> "Layton" by passing it as a prompt
prompt_ids = processor.get_prompt_ids("Layton")
output_with_prompt = model.generate(input_features, prompt_ids=prompt_ids)
print(processor.decode(output_with_prompt[0]))
# <|startofprev|> Layton<|startoftranscript|><|en|><|transcribe|><|notimestamps|> He has grave doubts whether Sir Frederick Layton's work is really Greek after all, and can discover in it but little of Rocky Ithaca.<|endoftext|>
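As the decoded output above shows, prompting works by prepending the prompt token ids, preceded by the <|startofprev|> token, to the decoder input so the model conditions its generation on them as prior context. A minimal sketch of that token layout (the ids below are illustrative placeholders; real ids come from the Whisper tokenizer):

```python
# Illustrative token ids (the actual values come from the Whisper tokenizer).
SOP = 50361  # <|startofprev|>
SOT = 50258  # <|startoftranscript|>

def build_decoder_input(prompt_ids, forced_decoder_ids):
    # The prompt tokens sit before the usual start-of-transcript sequence,
    # so the decoder attends to them as context without transcribing them.
    return [SOP, *prompt_ids, SOT, *forced_decoder_ids]

print(build_decoder_input([1234, 5678], [50259, 50359, 50363]))
# [50361, 1234, 5678, 50258, 50259, 50359, 50363]
```

To get clean transcription text without the prompt and special tokens, you can decode with skip_special_tokens=True, e.g. processor.decode(output_with_prompt[0], skip_special_tokens=True).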
