Running locally
This solution works really well! I've tried to combine Whisper with Pyannote Audio speaker recognition but had poor results, probably because of some oversights on my part.
I would like to run your code locally. Is that possible? I don't see where the Pyannote auth token is provided (sorry for the noob question).
So I found how to add the authentication token and it seems to work. For some reason, though, I get very different results from the ones I get when using the 'App'. I am using a smaller Whisper model (medium), but the main problem lies in the diarization part. The separation into speaker 1 / speaker 2 is completely wrong. Do you think that using the smaller Whisper model can account for that?
Any chance you can tell me how you got this working locally? How did you add the token and how exactly do you run it (python app.py .... how do I pass in the audio file?)?
To find where the Hugging Face token goes, search the code for use_auth_token="-".
I actually had some problems with CUDA compatibility and had to use the CPU. After installing the dependencies I wrote a small wrapper around the code for personal use.
You can find the code and instructions in a small GitHub repo, wrapped with a small backend. Pre-requisites should be similar:
https://github.com/alexgo84/video-transcribe
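For anyone following along, here is a minimal sketch of the two points above: supplying the token instead of the "-" placeholder, and falling back to the CPU when CUDA is not usable. It is not the code from the Space or from the repo. The HF_TOKEN environment variable, the helper names, and the "pyannote/speaker-diarization" model id are assumptions made for illustration, and the use_auth_token / pipeline.to(...) calls match pyannote.audio 2.x and 3.x as far as I know, so check them against the version you have installed.

```python
import os


def read_hf_token(env_var="HF_TOKEN"):
    """Return the Hugging Face token from the environment (env var name is an assumption)."""
    token = os.environ.get(env_var, "").strip()
    # "-" is the placeholder value found in the code, so treat it as "not set".
    if not token or token == "-":
        raise RuntimeError(f"Set {env_var} to a Hugging Face access token")
    return token


def pick_device(prefer_cuda=True):
    """Return "cuda" only if it was requested and torch reports it as available."""
    if not prefer_cuda:
        return "cpu"
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_diarization_pipeline(model_name="pyannote/speaker-diarization", prefer_cuda=True):
    """Load a pyannote pipeline with the token and move it to the chosen device.

    Hypothetical helper: the model id is an assumption, not necessarily the one the app uses.
    """
    # Imported lazily so the helpers above work without the heavy dependencies.
    import torch
    from pyannote.audio import Pipeline

    pipeline = Pipeline.from_pretrained(model_name, use_auth_token=read_hf_token())
    pipeline.to(torch.device(pick_device(prefer_cuda)))
    return pipeline
```

Calling load_diarization_pipeline(prefer_cuda=False) would force the CPU path, which is roughly what I ended up doing. Reading the token from the environment also keeps it out of the source file.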
I've also had great results with whisperX (with CUDA), which integrates pyannote and also aligns the transcript at the word level to produce high quality subtitles. In this repo the mini project is wrapped with a simple Flask server (no auth or anything fancy):
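For anyone curious how a transcript and a diarization can be combined at all, below is a rough, simplified sketch of one common approach: give each transcribed segment the speaker whose diarization turns overlap it the most. This is an illustration only, not whisperX's actual implementation and not necessarily what the Space does. The function names are hypothetical, and the input shapes are assumptions: segments are dicts with "start", "end" and "text" (as in Whisper's result["segments"]), and turns are (start, end, speaker) tuples that you would build yourself from the diarization output.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each segment with the speaker that overlaps it for the longest time.

    segments: list of dicts with "start", "end", "text" (assumed shape)
    turns:    list of (start, end, speaker) tuples (assumed shape)
    """
    labelled = []
    for seg in segments:
        totals = {}
        for turn_start, turn_end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], turn_start, turn_end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        # A segment that overlaps no turn is left without a speaker.
        best = max(totals, key=totals.get) if totals else None
        labelled.append({**seg, "speaker": best})
    return labelled


segments = [
    {"start": 0.0, "end": 2.0, "text": "Hi"},
    {"start": 2.0, "end": 5.0, "text": "Hello"},
]
turns = [(0.0, 2.5, "SPEAKER_00"), (2.5, 5.0, "SPEAKER_01")]

for seg in assign_speakers(segments, turns):
    print(seg["speaker"], seg["text"])
```

With segment-level labels like these, one long segment that spans two speakers gets a single label. Word-level alignment lets the same overlap rule be applied per word instead.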
Hello Alex!
Could you please share a Colab notebook?