Speech to Speech Translation

#3
by Owos - opened

Can I use this model for speech to speech translation?
If so, how can I adapt the model for it?

I'm currently trying a two-step pipeline: speech-to-text with Whisper (which can translate while transcribing the audio), then text-to-speech with this model. The problem is that Whisper's timestamps aren't used in the second step, so the generated speech isn't in sync with the original.

Have you found a satisfactory way to do speech-to-speech translation that keeps the generated speech in sync with the original?
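
For what it's worth, here is a rough sketch of one way the timestamps could be used. It assumes each transcribed segment is a dict with `start`, `end` (in seconds), and `text`, which is the shape of the `segments` list in Whisper's `transcribe` result. `synthesize` is a hypothetical stand-in for whatever text-to-speech call this model exposes, and `align_segments` is just an illustrative name, not part of any library. The idea is to drop each synthesized clip onto a timeline at its segment's start time and fill the gaps with silence.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def align_segments(
    segments: Sequence[Dict],
    synthesize: Callable[[str], Sequence[float]],  # hypothetical TTS call: text -> mono samples
    sample_rate: int = 16000,
) -> Tuple[List[float], List[int]]:
    """Place each synthesized clip at (or after) its source segment's start time.

    Returns the assembled samples and the sample offset where each clip begins.
    """
    timeline: List[float] = []
    offsets: List[int] = []
    for seg in segments:
        start = int(round(seg["start"] * sample_rate))
        clip = list(synthesize(seg["text"]))
        # Pad with silence up to the segment's start time.
        if len(timeline) < start:
            timeline.extend([0.0] * (start - len(timeline)))
        # If the previous clip overran its slot, this clip simply starts late
        # instead of overlapping it.
        offsets.append(len(timeline))
        timeline.extend(clip)
    return timeline, offsets
```

Note that this only anchors start times. A clip longer than its slot pushes the following clips later until a pause absorbs the drift, so stricter sync would also need the clips shortened or time-stretched to fit their segments.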
