pyannote.audio

non-profit

https://github.com/pyannote/pyannote-audio

pyannote

AI & ML interests

speaker diarization // speaker recognition // speaker segmentation // voice activity detection // overlapped speech detection // speaker change detection

Organization Card

pyannote.audio is an open-source toolkit for speaker diarization.

Pretrained pipelines reach state-of-the-art performance on most academic benchmarks.
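As a rough illustration of how one of the pretrained pipelines listed on this page might be applied to a file, here is a minimal sketch. It assumes pyannote.audio 3.x is installed, that a Hugging Face access token is available, and that the user conditions of the model have been accepted on the Hub. The `use_auth_token` keyword is the 3.x spelling and may differ in other releases. `format_turn` and `diarize` are hypothetical helper names introduced only for this example.

```python
def format_turn(start: float, end: float, speaker: str) -> str:
    """Render one speaker turn as a short human-readable line."""
    return f"{start:.1f}s - {end:.1f}s: {speaker}"


def diarize(audio_path: str, hf_token: str) -> list[str]:
    """Run a pretrained diarization pipeline on one file (sketch only)."""
    # Imported lazily so format_turn works without pyannote.audio installed.
    from pyannote.audio import Pipeline

    # Assumption: pyannote.audio 3.x keyword name; other releases may differ.
    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token=hf_token,
    )

    # The pipeline returns an annotation whose tracks are (segment, track, label).
    diarization = pipeline(audio_path)
    return [
        format_turn(turn.start, turn.end, speaker)
        for turn, _, speaker in diarization.itertracks(yield_label=True)
    ]

# Hypothetical call, with a placeholder path and token:
# lines = diarize("audio.wav", "HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")
```

Each returned line gives the start time, end time, and speaker label of one turn.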

Using it in production?
Consider switching to pyannoteAI for better and faster options.

Benchmark	v2.1	v3.1	pyannoteAI
AISHELL-4	14.1	12.2	11.2
AliMeeting (channel 1)	27.4	24.4	19.3
AMI (IHM)	18.9	18.8	15.8
AMI (SDM)	27.1	22.4	19.3
AVA-AVD	66.3	50.0	44.8
CALLHOME (part 2)	31.6	28.4	19.8
DIHARD 3 (full)	26.9	21.7	16.8
Earnings21	17.0	9.4	9.1
Ego4D (dev.)	61.5	51.2	44.0
MSDWild	32.8	25.3	19.8
RAMC	22.5	22.2	11.1
REPERE (phase2)	8.2	7.8	7.6
VoxConverse (v0.3)	11.2	11.3	9.8
Diarization error rate (in %, lower is better)
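The page does not define the metric. The sketch below assumes the commonly used definition of diarization error rate: the sum of false alarm, missed detection, and speaker confusion durations, divided by the total duration of reference speech. The function names and the example durations are illustrative only. The `relative_reduction` helper is a convenience for comparing two columns of the table.

```python
def diarization_error_rate(false_alarm: float, missed_detection: float,
                           confusion: float, total_speech: float) -> float:
    """Return DER in percent, with all durations in the same unit (e.g. seconds)."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return 100.0 * (false_alarm + missed_detection + confusion) / total_speech


def relative_reduction(old_der: float, new_der: float) -> float:
    """Return the relative DER reduction from old_der to new_der, in percent."""
    if old_der <= 0:
        raise ValueError("old_der must be positive")
    return 100.0 * (old_der - new_der) / old_der


# Made-up durations: 10 s false alarm, 20 s missed, 15 s confusion, 300 s of speech.
example_der = diarization_error_rate(10.0, 20.0, 15.0, 300.0)

# DIHARD 3 (full) row of the table: v2.1 at 26.9 versus v3.1 at 21.7.
dihard_gain = relative_reduction(26.9, 21.7)

print(f"example DER: {example_der:.1f}%")
print(f"DIHARD 3 relative reduction, v2.1 to v3.1: {dihard_gain:.1f}%")
```

Because false alarms count in the numerator but not in the denominator, this definition can exceed 100% on very hard recordings.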

Using high-end NVIDIA hardware:

v2.1 takes around 1m30s to process 1h of audio
v3.1 takes around 1m20s to process 1h of audio
On-premise pyannoteAI takes less than 30s to process 1h of audio
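The timings above can be restated as real-time factors (processing time divided by audio duration) and speed-ups over real time. The sketch below does that arithmetic. The 30 s figure is the upper bound quoted for on-premise pyannoteAI, so its real-time factor is at most the value computed and its speed-up is at least the value computed. Function and variable names are illustrative.

```python
AUDIO_SECONDS = 3600  # 1h of audio

# Approximate processing times quoted above, in seconds.
PROCESSING_SECONDS = {
    "v2.1": 90,                  # around 1m30s
    "v3.1": 80,                  # around 1m20s
    "pyannoteAI (on-premise)": 30,  # less than 30s, used here as an upper bound
}


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time per second of audio (lower is faster)."""
    return processing_s / audio_s


def speedup(processing_s: float, audio_s: float) -> float:
    """How many times faster than real time the processing runs."""
    return audio_s / processing_s


for name, seconds in PROCESSING_SECONDS.items():
    rtf = real_time_factor(seconds, AUDIO_SECONDS)
    factor = speedup(seconds, AUDIO_SECONDS)
    print(f"{name}: RTF = {rtf:.4f}, about {factor:.0f}x real time")
```

Under these figures, v2.1 runs at about 40x real time, v3.1 at about 45x, and on-premise pyannoteAI at more than 120x.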

spaces 1

Pretrained pipelines

models 13

pyannote/speaker-diarization-3.1

Automatic Speech Recognition • Updated 1 day ago • 6.23M • 236

pyannote/overlapped-speech-detection

Automatic Speech Recognition • Updated 1 day ago • 13.9k • 22

pyannote/speaker-segmentation

Automatic Speech Recognition • Updated 1 day ago • 23 • 24

pyannote/voice-activity-detection

Automatic Speech Recognition • Updated 1 day ago • 533k • 135

pyannote/segmentation

Voice Activity Detection • Updated 1 day ago • 4.35M • 419

pyannote/speaker-diarization

Automatic Speech Recognition • Updated 1 day ago • 3.47M • 666

pyannote/speaker-diarization-3.0

Automatic Speech Recognition • Updated 1 day ago • 566k • 144

pyannote/embedding

Updated 1 day ago • 259k • 76

pyannote/wespeaker-voxceleb-resnet34-LM

Updated 1 day ago • 8.12M • 25

pyannote/segmentation-3.0

Voice Activity Detection • Updated 1 day ago • 8.5M • 149
