Hervé Bredin committed
Commit • 89a2e1b
1 Parent(s): aaad0e6
feat: initial import

- README.md +57 -0
- config.yaml +18 -0
README.md
ADDED
@@ -0,0 +1,57 @@
---
tags:
- pyannote
- audio
- voice
- speech
- speaker
- speaker-diarization
- speaker-change-detection
- voice-activity-detection
- overlapped-speech-detection
datasets:
- ami
- dihard
- voxconverse
- voxceleb
license: mit
inference: false
---

+
# [pyannote.audio](https://github.com/pyannote/pyannote-audio) // speaker diarization
|
22 |
+
|
23 |
+
```python
|
24 |
+
from pyannote.audio import Pipeline
|
25 |
+
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")
|
26 |
+
output = pipeline("audio.wav")
|
27 |
+
|
28 |
+
for speech_turn, _, speaker in output.itertracks():
|
29 |
+
print(f"Speaker '{speaker}' speaks between t={speech_turn.start}s and t={speech_turn.end}s.")
|
30 |
+
```
|
31 |
+
|
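Diarization results are commonly exchanged in the RTTM format, one `SPEAKER` line per speech turn. As a minimal sketch of that format (the `to_rttm` helper and the `turns` list below are illustrative stand-ins, not part of the pyannote.audio API):

```python
# Minimal sketch: format (start, end, speaker) speech turns as RTTM lines.
# `turns` stands in for the speech turns produced by the pipeline above;
# "audio" is a hypothetical recording identifier.

def to_rttm(turns, file_id="audio"):
    """Return one RTTM 'SPEAKER' line per speech turn."""
    lines = []
    for start, end, speaker in turns:
        duration = end - start
        lines.append(
            f"SPEAKER {file_id} 1 {start:.3f} {duration:.3f} "
            f"<NA> <NA> {speaker} <NA> <NA>"
        )
    return "\n".join(lines)

turns = [(0.0, 1.5, "SPEAKER_00"), (1.5, 3.2, "SPEAKER_01")]
print(to_rttm(turns))
```

Each line records the file id, channel, turn start and duration, and the speaker label; the `<NA>` fields are unused placeholders defined by the format.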
## Benchmark

| Dataset | [Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) |
| ------- | ----------------------------------------------------------------------------------------------- |
| [AMI `only_words` evaluation set](https://github.com/BUTSpeechFIT/AMI-diarization-setup) | 21.3% |
| [DIHARD 3 evaluation set](https://arxiv.org/abs/2012.01477) | 22.2% |
| [VoxConverse 0.0.2 evaluation set](https://github.com/joonson/voxconverse) | 13.0% |

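The diarization error rate linked above is the sum of false alarm, missed detection, and speaker confusion durations divided by the total duration of reference speech. A minimal sketch of that arithmetic (the durations below are made up for illustration, not taken from the benchmark):

```python
# Diarization error rate = (false alarm + missed detection + confusion) / total speech.
# All durations are in seconds; the values below are illustrative only.

def diarization_error_rate(false_alarm, missed_detection, confusion, total_speech):
    return (false_alarm + missed_detection + confusion) / total_speech

der = diarization_error_rate(12.0, 30.0, 21.6, 300.0)
print(f"{der:.1%}")  # → 21.2%
```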
## Support

For commercial enquiries and scientific consulting, please contact [me](mailto:herve@niderb.fr).
For [technical questions](https://github.com/pyannote/pyannote-audio/discussions) and [bug reports](https://github.com/pyannote/pyannote-audio/issues), please check the [pyannote.audio](https://github.com/pyannote/pyannote-audio) GitHub repository.

## Citation

```bibtex
@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Address = {Barcelona, Spain},
  Month = {May},
  Year = {2020},
}
```
config.yaml
ADDED
@@ -0,0 +1,18 @@
pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    segmentation: pyannote/segmentation
    embedding: speechbrain/spkrec-ecapa-voxceleb
    clustering: AgglomerativeClustering

params:
  clustering:
    method: average
    threshold: 0.582398766878762
  min_activity: 6.073193238899291
  min_duration_off: 0.09791355693027545
  min_duration_on: 0.05537587440407595
  offset: 0.4806866463041527
  onset: 0.8104268538848918
  stitch_threshold: 0.04033955907446252
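The `onset` and `offset` values above are hysteresis thresholds applied to the frame-level segmentation scores: a region becomes active when the score rises above `onset` and stays active until it drops below `offset`. A minimal sketch of that logic (the `binarize` helper and the frame scores are illustrative; the real pipeline also applies the `min_duration_*` cleanup afterwards):

```python
# Hysteresis thresholding sketch: activate when score > onset,
# deactivate when score < offset. `scores` is a per-frame sequence;
# returns (start_index, end_index) pairs of active regions.

def binarize(scores, onset=0.8104268538848918, offset=0.4806866463041527):
    regions, start, active = [], None, False
    for i, score in enumerate(scores):
        if not active and score > onset:
            active, start = True, i
        elif active and score < offset:
            regions.append((start, i))
            active = False
    if active:
        regions.append((start, len(scores)))
    return regions

scores = [0.1, 0.9, 0.7, 0.6, 0.3, 0.2, 0.85, 0.5, 0.4]
print(binarize(scores))  # → [(1, 4), (6, 8)]
```

Using two thresholds instead of one avoids rapid on/off flickering when scores hover near a single cutoff.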