Hervé BREDIN committed on
Commit
fa95f91
1 Parent(s): 89e7168

doc: update README

Files changed (1)
  1. README.md +16 -12
README.md CHANGED
@@ -20,19 +20,23 @@ datasets:
  - repere
  - voxceleb
  license: mit
  ---
 
  # 🎹 Speaker diarization
 
- Relies on pyannote.audio 2.0: see [installation instructions](https://github.com/pyannote/pyannote-audio/tree/develop#installation).
-
 
  ## TL;DR
 
  ```python
  # load the pipeline from Hugging Face Hub
  from pyannote.audio import Pipeline
- pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2022.07")
 
  # apply the pipeline to an audio file
  diarization = pipeline("audio.wav")
@@ -89,15 +93,15 @@ Processing is fully automatic:
  * evaluation of overlapped speech
 
 
- | Benchmark | [DER%](. "Diarization error rate") | [FA%](. "False alarm rate") | [Miss%](. "Missed detection rate") | [Conf%](. "Speaker confusion rate") | Expected output | File-level evaluation |
- | ---------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- | --------------------------- | ---------------------------------- | ----------------------------------- | ------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------ |
- | [AISHELL-4](http://www.openslr.org/111/) | 14.61 | 3.31 | 4.35 | 6.95 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AISHELL.SpeakerDiarization.Full.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AISHELL.SpeakerDiarization.Full.test.eval) |
- | [AMI *Mix-Headset*](https://groups.inf.ed.ac.uk/ami/corpus/) [*only_words*](https://github.com/BUTSpeechFIT/AMI-diarization-setup) | 18.21 | 3.28 | 11.07 | 3.87 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI.SpeakerDiarization.only_words.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI.SpeakerDiarization.only_words.test.eval) |
- | [AMI *Array1-01*](https://groups.inf.ed.ac.uk/ami/corpus/) [*only_words*](https://github.com/BUTSpeechFIT/AMI-diarization-setup) | 29.00 | 2.71 | 21.61 | 4.68 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI-SDM.SpeakerDiarization.only_words.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI-SDM.SpeakerDiarization.only_words.test.eval) |
- | [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) [*Part2*](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1) | 30.24 | 3.71 | 16.86 | 9.66 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/CALLHOME.SpeakerDiarization.CALLHOME.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/CALLHOME.SpeakerDiarization.CALLHOME.test.eval) |
- | [DIHARD 3 *Full*](https://arxiv.org/abs/2012.01477) | 20.99 | 4.25 | 10.74 | 6.00 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/DIHARD.SpeakerDiarization.Full.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/DIHARD.SpeakerDiarization.Full.test.eval) |
- | [REPERE *Phase 2*](https://islrn.org/resources/360-758-359-485-0/) | 12.62 | 1.55 | 3.30 | 7.76 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/REPERE.SpeakerDiarization.Full.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/REPERE.SpeakerDiarization.Full.test.eval) |
- | [VoxConverse *v0.3*](https://github.com/joonson/voxconverse) | 12.61 | 3.45 | 3.85 | 5.31 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/main/reproducible_research/2022.07/VoxConverse.SpeakerDiarization.VoxConverse.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/main/reproducible_research/2022.07/VoxConverse.SpeakerDiarization.VoxConverse.test.eval) |
 
  ## Support
 
 
  - repere
  - voxceleb
  license: mit
+ extra_gated_prompt: "The collected information will help acquire a better knowledge of pyannote.audio userbase and help its maintainers apply for grants to improve it further. If you are an academic researcher, please cite the relevant papers in your own publications using the model. If you work for a company, please consider contributing back to pyannote.audio development (e.g. through unrestricted gifts). We also provide scientific consulting services around speaker diarization and machine listening."
+ extra_gated_fields:
+   Company/university: text
+   Website: text
+   I plan to use this model for (task, type of audio data, etc): text
  ---
 
  # 🎹 Speaker diarization
 
+ Relies on pyannote.audio 2.0.1: see [installation instructions](https://github.com/pyannote/pyannote-audio#installation).
 
  ## TL;DR
 
  ```python
  # load the pipeline from Hugging Face Hub
  from pyannote.audio import Pipeline
+ pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2.0.1")
 
  # apply the pipeline to an audio file
  diarization = pipeline("audio.wav")
 
  * evaluation of overlapped speech
 
 
+ | Benchmark (2.0.1) | [DER%](. "Diarization error rate") | [FA%](. "False alarm rate") | [Miss%](. "Missed detection rate") | [Conf%](. "Speaker confusion rate") | Expected output | File-level evaluation |
+ | ---------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- | --------------------------- | ---------------------------------- | ----------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
+ | [AISHELL-4](http://www.openslr.org/111/) | 14.61 | 3.31 | 4.35 | 6.95 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AISHELL.SpeakerDiarization.Full.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AISHELL.SpeakerDiarization.Full.test.eval) |
+ | [AMI *Mix-Headset*](https://groups.inf.ed.ac.uk/ami/corpus/) [*only_words*](https://github.com/BUTSpeechFIT/AMI-diarization-setup) | 18.21 | 3.28 | 11.07 | 3.87 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI.SpeakerDiarization.only_words.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI.SpeakerDiarization.only_words.test.eval) |
+ | [AMI *Array1-01*](https://groups.inf.ed.ac.uk/ami/corpus/) [*only_words*](https://github.com/BUTSpeechFIT/AMI-diarization-setup) | 29.00 | 2.71 | 21.61 | 4.68 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI-SDM.SpeakerDiarization.only_words.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI-SDM.SpeakerDiarization.only_words.test.eval) |
+ | [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) [*Part2*](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1) | 30.24 | 3.71 | 16.86 | 9.66 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/CALLHOME.SpeakerDiarization.CALLHOME.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/CALLHOME.SpeakerDiarization.CALLHOME.test.eval) |
+ | [DIHARD 3 *Full*](https://arxiv.org/abs/2012.01477) | 20.99 | 4.25 | 10.74 | 6.00 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/DIHARD.SpeakerDiarization.Full.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/DIHARD.SpeakerDiarization.Full.test.eval) |
+ | [REPERE *Phase 2*](https://islrn.org/resources/360-758-359-485-0/) | 12.62 | 1.55 | 3.30 | 7.76 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/REPERE.SpeakerDiarization.Full.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/REPERE.SpeakerDiarization.Full.test.eval) |
+ | [VoxConverse *v0.3*](https://github.com/joonson/voxconverse) | 12.61 | 3.45 | 3.85 | 5.31 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/main/reproducible_research/2022.07/VoxConverse.SpeakerDiarization.VoxConverse.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/main/reproducible_research/2022.07/VoxConverse.SpeakerDiarization.VoxConverse.test.eval) |
 
  ## Support
 
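
The DER column in the benchmark table is, by definition, the sum of the false alarm, missed detection, and speaker confusion rates. The sketch below is not part of the commit: the `rows` dictionary, the `der_gap` helper, and the tolerance are assumptions introduced here for illustration. It checks the reported figures against that identity, allowing for the fact that each number is rounded to two decimals independently.

```python
# Values copied from the benchmark table: (DER, FA, Miss, Conf), all in percent.
rows = {
    "AISHELL-4": (14.61, 3.31, 4.35, 6.95),
    "AMI Mix-Headset": (18.21, 3.28, 11.07, 3.87),
    "AMI Array1-01": (29.00, 2.71, 21.61, 4.68),
    "CALLHOME Part2": (30.24, 3.71, 16.86, 9.66),
    "DIHARD 3 Full": (20.99, 4.25, 10.74, 6.00),
    "REPERE Phase 2": (12.62, 1.55, 3.30, 7.76),
    "VoxConverse v0.3": (12.61, 3.45, 3.85, 5.31),
}

# Assumption: all four figures are rounded independently to two decimals,
# so the rounded DER may differ from the sum of the rounded components by
# up to 4 * 0.005 = 0.02. The extra epsilon absorbs floating-point noise.
TOLERANCE = 0.02 + 1e-9


def der_gap(der, fa, miss, conf):
    """Return |DER - (FA + Miss + Conf)| in percentage points."""
    return abs(der - (fa + miss + conf))


gaps = {name: der_gap(*values) for name, values in rows.items()}

for name, gap in gaps.items():
    status = "ok" if gap <= TOLERANCE else "mismatch"
    print(f"{name:<18} gap={gap:.2f} {status}")
```

A gap above the tolerance would point to a transcription error in the table rather than to rounding.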