JournalistsonHF/README · New Whisper implementation optimized for speaker diarization

This looks interesting, via @philschmid:

“If you are using Whisper for transcription, listen ⁉️👂 We created an optimized Whisper with Speaker Diarization for @huggingface Inference Endpoints 🤗 We created a reference implementation that optimizes Whisper with Flash Attention and Speculative Decoding and combines it with Diarization for speaker separation! 🤯

TL;DR:
🏎️ Ultra faster inference due to flash attention & speculative decoding
✅ Leverages the Custom Handler feature of Hugging Face Inference Endpoints
⚡️ Takes 4.15s to transcribe 60s audio for Whisper Large on 1x A10G GPU
🔬 Combines Whisper with Pyannote's diarization model
🌐 Fully customizable and adjustable to specific use cases
🔓 Open-source for easy deployment”

Blog post: https://huggingface.co/blog/asr-diarization

Python code: https://huggingface.co/sergeipetrov/asrdiarization-handler/blob/main/handler.py
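
For anyone wondering what "combining" transcription with diarization means in practice, here is a minimal sketch of the merging step. It is not the code from the linked handler. It assumes the ASR model returns timestamped text chunks and the diarization model returns timestamped speaker turns, and it labels each chunk with the speaker whose turn overlaps it the most. The function names (`overlap`, `assign_speakers`), the tuple layout, and the `UNKNOWN` fallback label are all hypothetical, chosen only for illustration.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(chunks, turns):
    """Label each transcript chunk with the speaker whose turn overlaps it most.

    chunks: list of (start, end, text) tuples, as a timestamped ASR output might look.
    turns:  list of (start, end, speaker) tuples, as a diarization output might look.
    Both layouts are assumptions for this sketch, not a library format.
    """
    labeled = []
    for start, end, text in chunks:
        best_speaker = "UNKNOWN"  # hypothetical fallback when no turn overlaps
        best_overlap = 0.0
        for t_start, t_end, speaker in turns:
            shared = overlap(start, end, t_start, t_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labeled.append(
            {"speaker": best_speaker, "text": text, "start": start, "end": end}
        )
    return labeled


# Made-up example data: two speaker turns and three transcript chunks.
turns = [(0.0, 4.0, "SPEAKER_00"), (4.0, 9.0, "SPEAKER_01")]
chunks = [
    (0.0, 3.5, "Hello there."),
    (3.5, 8.0, "Hi, how are you?"),
    (20.0, 21.0, "Trailing audio."),
]

for row in assign_speakers(chunks, turns):
    print(f'{row["speaker"]}: {row["text"]}')
```

Maximum overlap is only one possible alignment rule. The blog post describes the approach the reference implementation actually takes, so check there before relying on this.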