How to fine-tune with timestamps

#122
by deepdml - opened

I have my own labeled dataset and I want to fine-tune Whisper on it so that the timestamps are accurate as well. How can I do that with the Transformers library? @sanchit-gandhi
For fine-tuning I'm following https://huggingface.co/blog/fine-tune-whisper but I didn't find anything there about timestamps.

For distil-whisper I've read that it's possible to use timestamps when pseudo-labelling: https://github.com/huggingface/distil-whisper/tree/main/training#1-pseudo-labelling. How can we adapt this to fine-tuning? Thanks

Timestamps are just tokens. All you need to do is figure out how to inject the correct timestamp token at the correct position in the text.

Take a look at the vocab.json to understand how timestamps are tokenized.
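
To make that concrete, here is a rough sketch of how the label text could be built. It assumes Whisper's usual timestamp format, `<|t.tt|>` in 0.02 s steps from `<|0.00|>` to `<|30.00|>`, relative to the start of the 30 s audio window. The helper names and the `(start, end, text)` segment layout are made up for illustration, so adapt them to however your dataset stores its segment boundaries.

```python
# Sketch only: helper names and the (start, end, text) layout are hypothetical.
# Assumes Whisper-style timestamp tokens: <|0.00|> ... <|30.00|> in 0.02 s steps.

def timestamp_token(seconds: float) -> str:
    """Round a time offset to the nearest 0.02 s and format it as a timestamp token."""
    if not 0.0 <= seconds <= 30.0:
        raise ValueError(f"offset {seconds} is outside the 0-30 s window")
    steps = round(seconds * 50)  # 50 steps per second = 0.02 s resolution
    return f"<|{steps / 50:.2f}|>"


def build_target(segments: list[tuple[float, float, str]]) -> str:
    """Wrap each segment's text in its start and end timestamp tokens."""
    parts = []
    for start, end, text in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {start} > {end}")
        parts.append(f"{timestamp_token(start)} {text.strip()}{timestamp_token(end)}")
    return "".join(parts)


if __name__ == "__main__":
    # Offsets are in seconds from the start of the audio window.
    example = [(0.0, 1.5, "Hello world."), (1.5, 3.26, "How are you?")]
    print(build_target(example))
```

You would then tokenize this string as the labels in place of the plain transcript. Before training, check that your tokenizer version encodes each `<|t.tt|>` string as a single token ID rather than splitting it into pieces; if it splits, you would need to map the timestamps to their token IDs yourself.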
