---
license: mit
language:
- ru
library_name: pyannote-audio
tags:
- code
---

# Segmentation model

This model was trained on AMI-MixHeadset and my own synthetic dataset of Russian speech.

Training time: 5 hours on an RTX 3060.

This model can be used as the segmentation model in the diarization pipeline from [pyannote/speaker-diarization](https://huggingface.co/pyannote/speaker-diarization).

| Benchmark | DER% |
| --------- |------|
| [AMI (*headset mix,*](https://groups.inf.ed.ac.uk/ami/corpus/) [*only_words*)](https://github.com/BUTSpeechFIT/AMI-diarization-setup) | 38.8 |
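
For readers unfamiliar with the metric, DER (diarization error rate) is conventionally the sum of false alarm, missed detection, and speaker confusion durations divided by the total reference speech duration. The sketch below only illustrates that arithmetic; the function name and the durations in the example are hypothetical and are not taken from this model's evaluation.

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total):
    """Return DER in percent from error durations given in seconds.

    `total` is the total duration of reference speech.
    """
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return 100.0 * (false_alarm + missed_detection + confusion) / total


# Hypothetical durations, in seconds, chosen only to show the calculation.
der = diarization_error_rate(
    false_alarm=2.0,
    missed_detection=5.0,
    confusion=3.0,
    total=50.0,
)
print(f"DER = {der:.1f}%")
```

Lower is better, and because false alarms count as errors the value can exceed 100%.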

## Usage example

```python
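import yaml

import torch
from pyannote.audio import Model
from pyannote.audio.pipelines import SpeakerDiarization

# Load the pickled segmentation model onto the CPU.
# torch.load unpickles arbitrary objects, so only load files you trust.
segm_model = torch.load('model/segm_model.pth', map_location=torch.device('cpu'))

# Speaker embedding model from the Hugging Face Hub (requires an access token).
embed_model = Model.from_pretrained("pyannote/embedding", use_auth_token='ACCESS_TOKEN_GOES_HERE')

# Assemble the diarization pipeline around the two models.
diar_pipeline = SpeakerDiarization(
    segmentation=segm_model,
    segmentation_batch_size=16,
    clustering="AgglomerativeClustering",
    embedding=embed_model
)

# Apply the pipeline hyperparameters stored alongside the model.
with open('model/config.yaml', 'r') as f:
    diar_config = yaml.safe_load(f)
diar_pipeline.instantiate(diar_config)

# Run diarization; the result is a pyannote.core.Annotation.
annotation = diar_pipeline('audio.wav')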
```
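
The resulting annotation is a set of speaker turns. As a minimal sketch of how those turns could be exported, the helper below formats `(start, end, speaker)` tuples as RTTM lines. The helper name `to_rttm_lines`, the `uri` value, and the sample turns are hypothetical. With pyannote, the tuples could be collected from `annotation.itertracks(yield_label=True)`, as noted in the comment.

```python
def to_rttm_lines(turns, uri):
    """Format (start, end, speaker) tuples, in seconds, as RTTM lines."""
    lines = []
    for start, end, speaker in turns:
        duration = end - start
        lines.append(
            f"SPEAKER {uri} 1 {start:.3f} {duration:.3f} <NA> <NA> {speaker} <NA> <NA>"
        )
    return lines


# Hypothetical turns. With a real pipeline result they could be built as:
#   turns = [(seg.start, seg.end, label)
#            for seg, _, label in annotation.itertracks(yield_label=True)]
turns = [(0.0, 1.5, "SPEAKER_00"), (1.5, 4.0, "SPEAKER_01")]

for line in to_rttm_lines(turns, uri="audio"):
    print(line)
```

RTTM is the format commonly expected by diarization scoring tools, so writing these lines to a file makes the output comparable against a reference.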