- pyannote/speaker-diarization-community-1 (Automatic Speech Recognition)
- pyannote/speaker-diarization-community-1-cloud (Voice Activity Detection)
- pyannote/speaker-diarization-precision-2 (Voice Activity Detection)
- pyannote/wespeaker-voxceleb-resnet34-LM
AI & ML interests
Speaker Intelligence Platform for developers
Simply detect, segment, label, and separate speakers in any language
🎤 What is speaker diarization?
Speaker diarization is the process of automatically partitioning the audio recording of a conversation into segments and labeling them by speaker, answering the question "who spoke when?". As the foundational layer of conversational AI, speaker diarization provides high-level insights for human-human and human-machine conversations, and unlocks a wide range of downstream applications: meeting transcription, call center analytics, voice agents, video dubbing.
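To make "who spoke when?" concrete, a diarization result can be thought of as a list of time-stamped, speaker-labelled turns. The sketch below uses plain `(start, end, speaker)` tuples, an assumption for illustration rather than pyannote's own output type, to total each speaker's speaking time and to look up who is active at a given instant. Because turns may overlap, more than one speaker can be active at once.

```python
from collections import defaultdict

# Hypothetical diarization result: (start_s, end_s, speaker_label) tuples.
Turn = tuple[float, float, str]

def speaking_time(turns: list[Turn]) -> dict[str, float]:
    """Total speaking time per speaker, in seconds."""
    totals: dict[str, float] = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)

def who_speaks_at(turns: list[Turn], t: float) -> list[str]:
    """Speakers active at time t (several of them during overlapped speech)."""
    return sorted({spk for start, end, spk in turns if start <= t < end})

turns = [
    (0.0, 4.5, "SPEAKER_00"),
    (4.25, 9.0, "SPEAKER_01"),  # overlaps the end of SPEAKER_00's first turn
    (9.0, 12.5, "SPEAKER_00"),
]
print(speaking_time(turns))        # {'SPEAKER_00': 8.0, 'SPEAKER_01': 4.75}
print(who_speaks_at(turns, 4.3))   # ['SPEAKER_00', 'SPEAKER_01']
```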
▶️ Getting started
Install the latest release of pyannote.audio with either uv (recommended) or pip:
```shell
$ uv add pyannote.audio
$ pip install pyannote.audio
```
Enjoy state-of-the-art speaker diarization:
```python
from pyannote.audio import Pipeline

# download pretrained pipeline from Hugging Face
# (HUGGINGFACE_TOKEN is a placeholder for your Hugging Face access token)
pipeline = Pipeline.from_pretrained('pyannote/speaker-diarization-community-1', token="HUGGINGFACE_TOKEN")

# perform speaker diarization locally
output = pipeline('/path/to/audio.wav')

# enjoy state-of-the-art speaker diarization
for turn, speaker in output.speaker_diarization:
    print(f"{speaker} speaks between t={turn.start}s and t={turn.end}s")
```
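Diarization output is often saved in the RTTM text format, with one `SPEAKER` line per turn. pyannote.core annotations already provide a `write_rttm` method for this; the sketch below only illustrates the line layout, assuming the loop above yields `(turn, speaker)` pairs where `turn` exposes `.start` and `.end` in seconds. The `Turn` dataclass is a hypothetical stand-in for pyannote's segment type so the example runs without downloading a model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Turn:
    """Hypothetical stand-in for a pyannote segment: only .start/.end (seconds) are used."""
    start: float
    end: float

def to_rttm(uri: str, diarization) -> list[str]:
    """Format (turn, speaker) pairs as 10-field RTTM 'SPEAKER' lines."""
    lines = []
    for turn, speaker in diarization:
        duration = turn.end - turn.start
        # Type, file id, channel, onset, duration, <NA> x2, speaker name, <NA> x2
        lines.append(
            f"SPEAKER {uri} 1 {turn.start:.3f} {duration:.3f} <NA> <NA> {speaker} <NA> <NA>"
        )
    return lines

diarization = [(Turn(0.0, 4.5), "SPEAKER_00"), (Turn(4.25, 9.0), "SPEAKER_01")]
print("\n".join(to_rttm("audio", diarization)))
# SPEAKER audio 1 0.000 4.500 <NA> <NA> SPEAKER_00 <NA> <NA>
# SPEAKER audio 1 4.250 4.750 <NA> <NA> SPEAKER_01 <NA> <NA>
```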
Read the community-1 model card to make the most of it.
State-of-the-art models
The pyannoteAI research team trains cutting-edge speaker diarization models, thanks to the Jean Zay 🇫🇷 supercomputer managed by GENCI. They come in two flavors:
- pyannote.audio open models, available on Hugging Face and used by 140k+ developers around the world;
- premium models, available on pyannoteAI cloud (and on-premise for enterprise customers), that provide state-of-the-art speaker diarization as well as additional enterprise features.
| Benchmark (last updated in 2025-09) | legacy (3.1) | community-1 | precision-2 |
|---|---|---|---|
| AISHELL-4 | 12.2 | 11.7 | **11.4** |
| AliMeeting (channel 1) | 24.5 | 20.3 | **15.2** |
| AMI (IHM) | 18.8 | 17.0 | **12.9** |
| AMI (SDM) | 22.7 | 19.9 | **15.6** |
| AVA-AVD | 49.7 | 44.6 | **37.1** |
| CALLHOME (part 2) | 28.5 | 26.7 | **16.6** |
| DIHARD 3 (full) | 21.4 | 20.2 | **14.7** |
| Ego4D (dev.) | 51.2 | 46.8 | **39.0** |
| MSDWild | 25.4 | 22.8 | **17.3** |
| RAMC | 22.2 | 20.8 | **10.5** |
| REPERE (phase2) | 7.9 | 8.9 | **7.4** |
| VoxConverse (v0.3) | 11.2 | 11.2 | **8.5** |
Diarization error rate (in %; lower is better)
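Diarization error rate (DER) is commonly defined as (false alarm + missed detection + speaker confusion) divided by the total duration of reference speech, after mapping hypothesis speaker labels onto reference labels as favorably as possible. The sketch below computes it at frame level under simplifying assumptions (at most one speaker per frame, `None` for non-speech, no forgiveness collar, brute-force label mapping); it is not the exact scoring setup behind the benchmark above. For real evaluations, pyannote.metrics provides a `DiarizationErrorRate` metric.

```python
from itertools import permutations
from typing import Optional

Label = Optional[str]  # None marks a non-speech frame

def frame_der(reference: list[Label], hypothesis: list[Label]) -> float:
    """Frame-level DER, assuming at most one speaker per frame and no collar."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")
    pairs = list(zip(reference, hypothesis))
    total = sum(r is not None for r in reference)  # reference speech frames
    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)

    # Speaker labels are arbitrary: find the one-to-one hypothesis -> reference
    # mapping that maximises agreement (brute force, fine for a few speakers).
    ref_labels = sorted({r for r in reference if r is not None})
    hyp_labels = sorted({h for h in hypothesis if h is not None})
    both = [(r, h) for r, h in pairs if r is not None and h is not None]
    candidates = ref_labels + [None] * len(hyp_labels)  # None = left unmapped
    best = 0
    for targets in permutations(candidates, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, targets))
        best = max(best, sum(mapping[h] == r for r, h in both))
    confusion = len(both) - best

    return (missed + false_alarm + confusion) / total

reference = ["A", "A", "A", "B", "B", None]
hypothesis = ["s1", "s1", "s2", "s2", "s2", "s2"]
print(frame_der(reference, hypothesis))  # 0.4 (1 false alarm + 1 confused frame, over 5)
```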
↩️ Going further, better, and faster
The precision-2 premium model further improves accuracy and processing speed, and brings additional features.
| Features | community-1 | precision-2 |
|---|---|---|
| Set exact/min/max number of speakers | ✅ | ✅ |
| Exclusive speaker diarization (for transcription) | ✅ | ✅ |
| Segmentation confidence scores | ❌ | ✅ |
| Speaker confidence scores | ❌ | ✅ |
| Voiceprinting | ❌ | ✅ |
| Speaker identification | ❌ | ✅ |
| Time to process 1h of audio (on H100) | 37s | 14s |
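The "for transcription" use case typically means attaching a speaker to each word returned by a speech-to-text system. A minimal sketch, assuming hypothetical `Word` objects with start/end times and non-overlapping `(start, end, speaker)` turns: each word is assigned to the turn that contains its midpoint. Because exclusive turns never overlap, this lookup has at most one answer, whereas with overlapping turns the same word could match two speakers. The midpoint rule is one simple choice for illustration, not necessarily what pyannote or pyannoteAI do.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Word:
    """Hypothetical ASR output: a word with start/end times in seconds."""
    text: str
    start: float
    end: float

def assign_speakers(
    words: list[Word], turns: list[tuple[float, float, str]]
) -> list[tuple[str, Optional[str]]]:
    """Label each word with the speaker whose (exclusive) turn contains its midpoint.

    Words whose midpoint falls outside every turn get None.
    """
    labelled = []
    for w in words:
        mid = (w.start + w.end) / 2
        speaker = next((spk for s, e, spk in turns if s <= mid < e), None)
        labelled.append((w.text, speaker))
    return labelled

turns = [(0.0, 2.0, "SPEAKER_00"), (2.0, 3.5, "SPEAKER_01")]
words = [Word("hello", 0.1, 0.5), Word("there", 0.6, 1.0), Word("hi", 2.2, 2.5), Word("uh", 4.0, 4.2)]
print(assign_speakers(words, turns))
# [('hello', 'SPEAKER_00'), ('there', 'SPEAKER_00'), ('hi', 'SPEAKER_01'), ('uh', None)]
```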
Create a pyannoteAI account, change one line of code, and enjoy free cloud credits to try precision-2 premium diarization:
```python
from pyannote.audio import Pipeline

# perform premium speaker diarization on pyannoteAI cloud
# (PYANNOTEAI_API_KEY is a placeholder for your pyannoteAI API key)
pipeline = Pipeline.from_pretrained('pyannote/speaker-diarization-precision-2', token="PYANNOTEAI_API_KEY")
better_output = pipeline('/path/to/audio.wav')
```
Models (17)
- pyannote/speaker-diarization-community-1
- pyannote/speaker-diarization-precision-2
- pyannote/speaker-diarization-community-1-cloud
- pyannote/ci-segmentation
- pyannote/speech-separation-ami-1.0
- pyannote/separation-ami-1.0
- pyannote/speaker-diarization-3.1
- pyannote/overlapped-speech-detection
- pyannote/speaker-segmentation
