Update generation_config.json
#7 opened by rpinto
Corrected the alignment heads after analyzing the cross-attention weights with DTW, averaging over 20 samples from "librispeech". Tests showed better timestamp alignments than "whisper-small.en", especially on shorter samples (8-10 seconds).
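For readers unfamiliar with the technique, here is a minimal sketch of how cross-attention heads could be scored with DTW and averaged over several samples. It is not the script used for this PR. The helper names (`dtw_path`, `head_score`, `rank_heads`), the scoring criterion (mean attention mass along the DTW path), and the input layout `(layers, heads, tokens, frames)` are all assumptions made for illustration. The returned `[layer, head]` pairs are assumed to match the shape of the `alignment_heads` entry in `generation_config.json`.

```python
import numpy as np


def dtw_path(cost):
    """Return the minimum-cost monotonic path through a (tokens x frames) cost matrix."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost, padded with a border
    acc[0, 0] = 0.0
    step = np.zeros((n + 1, m + 1), dtype=np.int8)  # 0 = diagonal, 1 = up, 2 = left
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            choices = (acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            k = int(np.argmin(choices))
            acc[i, j] = cost[i - 1, j - 1] + choices[k]
            step[i, j] = k
    # Backtrack from the bottom-right corner to recover the path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        k = step[i, j]
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path, float(acc[n, m])


def head_score(attn):
    """Assumed criterion: mean attention along the DTW path (higher = sharper alignment)."""
    path, total = dtw_path(-attn)  # negate so that high attention means low cost
    return -total / len(path)


def rank_heads(samples, top_k):
    """Average per-head scores over samples and return the top_k [layer, head] pairs.

    Each sample is an array shaped (layers, heads, tokens, frames); the token and
    frame counts may differ between samples.
    """
    scores = np.mean(
        [[[head_score(h) for h in layer] for layer in sample] for sample in samples],
        axis=0,
    )  # shape (layers, heads)
    order = np.argsort(scores, axis=None)[::-1][:top_k]
    layers, heads = np.unravel_index(order, scores.shape)
    return [[int(l), int(h)] for l, h in zip(layers, heads)]
```

With real data, `samples` would hold the decoder cross-attention weights collected for each of the audio clips. A head whose attention follows a clean monotonic diagonal scores close to 1, while a diffuse head scores much lower.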
Super cool, thanks for the PR @rpinto! Do you have a script that reproduces these results, so we can inspect the cross-attention weights with DTW and verify that the alignment improves? I'd love to run it on the other distil-whisper models too, to confirm we have the optimal alignment!
Sure thing, I will upload something later today when I get back from work.