Does anyone know how to tag the speaker with Whisper?

#35
by Shaunnnnn - opened

I tried the model on an interview recording, and it worked pretty well. The problem is that the output is one big chunk of text, and I have no idea how to tag the different speakers. I assume Whisper can distinguish different voices. Is there an easy way to do that?

You can also have a look at WhisperX: https://github.com/m-bain/whisperX

But no, "speaker diarization" (distinguishing between speakers) is NOT a feature of the Whisper model itself, as it was not trained for this task.

BTW, I managed to tag the speakers in my primary research interview recordings using the code here: https://colab.research.google.com/drive/1V-Bt5Hm2kjaDb4P1RyMSswsDKyrzc2-3?usp=sharing#scrollTo=ACobbJnIR_ni

Speaker diarization is not possible with this model (or any Whisper model). You are using pyannote, which is a different thing. Also, you need to agree to their terms (or complete a form) before you can use it.
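For anyone wondering how the two pieces fit together: since the transcript and the speaker turns come from separate tools, they have to be merged by timestamp. Below is a rough sketch of one way to do that, not the method used in the Colab above. It assumes you already have transcript segments as dicts with `start`, `end`, and `text` keys (the shape of the `segments` list Whisper's `transcribe` returns) and speaker turns as `(start, end, label)` tuples from some diarization tool. The function names, the `Turn` alias, and the `SPEAKER_00` style labels are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical shape for a diarization turn: (start_sec, end_sec, speaker_label)
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def tag_speakers(segments: List[Dict], turns: List[Turn]) -> List[Dict]:
    """Give each transcript segment the speaker whose turn overlaps it the most.

    A segment that overlaps no turn at all gets speaker None.
    """
    tagged = []
    for seg in segments:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for t_start, t_end, speaker in turns:
            ov = overlap(seg["start"], seg["end"], t_start, t_end)
            if ov > best_overlap:
                best_overlap = ov
                best_speaker = speaker
        # Copy the segment so the input list is left untouched
        tagged.append({**seg, "speaker": best_speaker})
    return tagged


def format_transcript(tagged: List[Dict]) -> str:
    """Render tagged segments as one '[SPEAKER] text' line per segment."""
    lines = []
    for seg in tagged:
        speaker = seg["speaker"] or "UNKNOWN"
        lines.append(f"[{speaker}] {seg['text'].strip()}")
    return "\n".join(lines)


# Made-up example data, only to show the expected input shapes
segments = [
    {"start": 0.0, "end": 4.0, "text": " Thanks for joining us today."},
    {"start": 4.0, "end": 9.5, "text": " Happy to be here."},
]
turns = [(0.0, 4.2, "SPEAKER_00"), (4.2, 10.0, "SPEAKER_01")]

print(format_transcript(tag_speakers(segments, turns)))
```

Picking the turn with the largest overlap is a simple heuristic. It will mislabel a segment where two people speak within the same Whisper segment, so word-level timestamps would likely give cleaner results.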