Update generation_config.json

by rpinto - opened Jan 3

base: refs/heads/main

←

from: refs/pr/7


rpinto

Jan 3

Correct the alignment heads after analyzing the cross-attention weights with DTW, averaging over 20 samples from "librispeech". Tests showed better timestamp alignments than "whisper-small.en", especially on shorter samples (8-10 seconds).
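For readers unfamiliar with the technique, here is a minimal sketch of dynamic time warping (DTW) over a single cross-attention matrix. It is not the script used for this PR. The toy matrix, the `1 - attention` cost, and the helper names `dtw_path` and `token_start_frames` are assumptions made for illustration. In a real analysis the matrix would come from one decoder head's cross-attention weights (tokens by audio frames).

```python
import numpy as np

def dtw_path(cost):
    """Return the minimum-cost monotonic path through a (tokens x frames) cost matrix."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost, padded by one row/column
    acc[0, 0] = 0.0
    step = np.zeros((n + 1, m + 1), dtype=np.int8)  # 0 = diagonal, 1 = previous token, 2 = previous frame
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            choices = (acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            k = int(np.argmin(choices))
            acc[i, j] = cost[i - 1, j - 1] + choices[k]
            step[i, j] = k
    # Backtrack from the bottom-right corner to recover the path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        k = step[i, j]
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def token_start_frames(path, num_tokens):
    """First frame index that the path assigns to each token."""
    starts = [None] * num_tokens
    for token, frame in path:
        if starts[token] is None:
            starts[token] = frame
    return starts

# Toy attention matrix (hypothetical values): 3 tokens, 6 frames,
# each token attending mostly to two consecutive frames.
attn = np.array([
    [0.9, 0.9, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.9, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.9, 0.9],
])
cost = 1.0 - attn  # assumption: strong attention means low alignment cost
path = dtw_path(cost)
starts = token_start_frames(path, attn.shape[0])
print(path)
print(starts)
```

A head whose attention follows a clean monotonic path like this one is a plausible alignment head. Diffuse or non-monotonic heads tend to give noisy token start frames, which is presumably what comparing candidate heads across several samples is meant to reveal.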

Update generation_config.json (commit b76dc295)

sanchit-gandhi

Whisper Distillation org Jan 3

Super cool, thanks for the PR @rpinto ! Do you have a script to reproduce these results to inspect the cross-attention weights from DTW to verify the alignment improves? I'd love to run it for the other distil-whisper models too to confirm we have the optimal alignment!

Xenova

Whisper Distillation org Jan 3

Thanks for the PR @rpinto !

Do you have a script to reproduce these results to inspect the cross-attention weights from DTW to verify the alignment improves?

I'm also quite interested in this.

rpinto

Jan 4

Sure thing, I will upload something later today when I get back from work.

rpinto

Jan 4

https://github.com/jrrpm/whisper-analysis


