Spaces: dwarkesh/whisper-speaker-recognition

Running locally

by alexgo84 - opened Feb 2, 2023

Feb 2, 2023

This solution works really well! I've tried to combine Whisper with Pyannote Audio speaker recognition but had poor results, probably because of some oversights on my part.

I would like to run your code locally, should it be possible? I don't see where the Pyannote auth token is provided (sorry for the noob question).

alexgo84

Feb 2, 2023

So I found how to add the authentication token and it seems to work. For some reason, though, I get very different results from the ones I get when using the 'App'. I am using a smaller Whisper model (medium), but the main problem lies in the diarization part: the separation into speaker 1 / speaker 2 is completely wrong. Do you think that using the smaller Whisper model can account for that?

alaffere

Feb 9, 2023

@alexgo84 I just ran this in a Colab notebook set to GPU: Premium using the large-v2 Whisper model. It took about 12 minutes for a 60-minute audio file and the transcription came out great.

alexgo84

Feb 10, 2023

@alaffere So I can confirm that using a different Whisper model size changes the final diarization output.
I'll mention that the videos I'm transcribing are mainly in Russian and Ukrainian; maybe that's why the diarization is far from perfect.

ilai11

Feb 26, 2023

@alaffere Hugging Face is too slow. It would be great if you could share your Colab notebook. Thank you so much!

marinohardin

Mar 30, 2023

@alexgo84 Any chance you can tell me how you got this working locally? How did you add the token, and how exactly do you run it (python app.py .... how do I pass in the audio file)?

alexgo84

Mar 31, 2023

To find where the Hugging Face token goes, search the code for use_auth_token="-".
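A minimal sketch of one way to avoid hard-coding the token at that spot: read it from an environment variable and refuse to run with the placeholder value. The helper name `load_hf_token` and the variable name `HF_TOKEN` are assumptions for illustration, not something taken from the Space's code.

```python
import os

# The dummy value to search for in the code, per the post above.
PLACEHOLDER = "-"


def load_hf_token(env_var="HF_TOKEN"):
    """Return the Hugging Face token from the environment.

    `env_var` is a hypothetical choice of variable name; use whatever
    fits your setup. Raises if the token is missing or still the placeholder.
    """
    token = os.environ.get(env_var, "").strip()
    if not token or token == PLACEHOLDER:
        raise RuntimeError(
            f"Set {env_var} to a Hugging Face access token before running."
        )
    return token
```

With that in place, the line containing use_auth_token="-" could pass `use_auth_token=load_hf_token()` instead, after running something like `export HF_TOKEN=...` in the shell.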

I actually had some problems with CUDA compatibility and had to use the CPU. After installing the dependencies I wrote a small wrapper around the code for personal use.
You can find the code and instructions in a small GitHub repo, wrapped with a small backend. Prerequisites should be similar:

https://github.com/alexgo84/video-transcribe
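For anyone hitting the same CUDA problem, here is a rough sketch of a CPU fallback. It is not taken from the repo above; `pick_device` is a hypothetical helper, and it assumes PyTorch is what provides CUDA support in your environment.

```python
def pick_device():
    """Return "cuda" if PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # No PyTorch installed at all, so there is nothing to run on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
# The result can then be passed on when loading models, for example
# whisper.load_model("medium", device=device) with the openai-whisper package.
```

Expect CPU runs to be much slower than the GPU timings mentioned earlier in the thread.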

I've also had great results with whisperX (with CUDA), which integrates pyannote and also aligns words to produce high-quality subtitles. In this repo the mini project is wrapped with a simple Flask server (no auth or anything fancy):

https://github.com/alexgo84/whisperx-server

Best24

Sep 10, 2023 (edited Sep 10, 2023)

Hello Alex!
Could you please share the Colab notebook?
