Does anyone know how to tag the speaker with Whisper?

#35

by Shaunnnnn - opened Nov 18, 2023

Nov 18, 2023

I tried the model on an interview recording, and it worked pretty well. The problem is that the output is one big chunk of text, and I have no idea how to tag the different speakers. I assume Whisper can distinguish different voices. Is there an easy way to do that?

artyomboyko

Nov 21, 2023

Hello. As far as I can tell, this is what you need: https://huggingface.co/learn/audio-course/chapter7/transcribe-meeting

Ilianos

Dec 5, 2023

You can also have a look at WhisperX: https://github.com/m-bain/whisperX

But no, "speaker diarization" (distinguishing speakers) is NOT a feature of the Whisper model, as it was not trained for this task.

Shaunnnnn

Dec 5, 2023

BTW, I managed to tag the speakers in my primary research interview recordings using the code here: https://colab.research.google.com/drive/1V-Bt5Hm2kjaDb4P1RyMSswsDKyrzc2-3?usp=sharing#scrollTo=ACobbJnIR_ni

supercharge19

Dec 15, 2023

> BTW, I managed to tag the speakers in my primary research interview recordings using the code here: https://colab.research.google.com/drive/1V-Bt5Hm2kjaDb4P1RyMSswsDKyrzc2-3?usp=sharing#scrollTo=ACobbJnIR_ni

Speaker diarization is not possible with this model (or any Whisper model). You are using pyannote, which is a different thing. Also, you need to agree to their terms (or complete a form) before you can use it.
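To make the split between the two tools concrete, here is a minimal sketch of the merging step only. It assumes you already have transcript segments (Whisper's `transcribe()` result has a `"segments"` list with `start`, `end`, and `text`) and speaker turns from a separate diarization tool such as pyannote. The sample data and the helper names `overlap` and `tag_speakers` are hypothetical, and this is not necessarily how the linked Colab or WhisperX does it.

```python
from typing import Dict, List, Tuple

Turn = Tuple[float, float, str]  # (start_seconds, end_seconds, speaker_label)


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Return the length in seconds shared by two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def tag_speakers(segments: List[Dict], turns: List[Turn]) -> List[Dict]:
    """Give each transcript segment the speaker whose turn overlaps it the most."""
    tagged = []
    for seg in segments:
        best_speaker = "UNKNOWN"  # used when no diarization turn overlaps the segment
        best_overlap = 0.0
        for start, end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], start, end)
            if shared > best_overlap:
                best_overlap = shared
                best_speaker = speaker
        tagged.append({**seg, "speaker": best_speaker})
    return tagged


# Hypothetical transcript segments, shaped like Whisper's result["segments"].
segments = [
    {"start": 0.0, "end": 4.0, "text": "Thanks for joining us today."},
    {"start": 4.0, "end": 9.5, "text": "Happy to be here."},
    {"start": 9.5, "end": 12.0, "text": "Let's get started."},
]

# Hypothetical speaker turns, as a diarization tool might report them.
turns = [
    (0.0, 4.2, "SPEAKER_00"),
    (4.2, 9.4, "SPEAKER_01"),
    (9.4, 12.0, "SPEAKER_00"),
]

for seg in tag_speakers(segments, turns):
    print(f'[{seg["speaker"]}] {seg["text"]}')
```

Matching on the largest time overlap is just one simple choice. Segments that span a speaker change will still get a single label, so word-level timestamps may work better for fast back-and-forth interviews.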
