---
license: mit
language:
- ru
library_name: pyannote-audio
tags:
- code
---

# Segmentation model

This model was trained on AMI-MixHeadset and my own synthetic dataset of Russian speech.

Training time: 5 hours on an RTX 3060.

This model can be used as the segmentation model in the [pyannote/speaker-diarization](https://huggingface.co/pyannote/speaker-diarization) pipeline.

| Benchmark | DER% |
| --------- | ---- |
| [AMI (*headset mix,*](https://groups.inf.ed.ac.uk/ami/corpus/) [*only_words*)](https://github.com/BUTSpeechFIT/AMI-diarization-setup) | 38.8 |
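DER (diarization error rate) is the fraction of reference speech time that is missed, falsely detected, or attributed to the wrong speaker. As a rough illustration only (the benchmark above was scored with the standard tooling, and a real scorer also finds the optimal reference-to-hypothesis speaker mapping, which this sketch skips), a frame-level approximation might look like:

```python
def der(reference, hypothesis, step=0.01):
    """Frame-level DER approximation over (start, end, speaker) segments.

    Simplified sketch: no optimal speaker mapping, no forgiveness collar.
    """
    end = max(seg[1] for seg in reference + hypothesis)
    n_frames = int(end / step)
    total = missed = false_alarm = confusion = 0
    for i in range(n_frames):
        t = (i + 0.5) * step  # midpoint of the frame
        ref = {spk for s, e, spk in reference if s <= t < e}
        hyp = {spk for s, e, spk in hypothesis if s <= t < e}
        total += len(ref)
        missed += max(len(ref) - len(hyp), 0)
        false_alarm += max(len(hyp) - len(ref), 0)
        confusion += min(len(ref), len(hyp)) - len(ref & hyp)
    return (missed + false_alarm + confusion) / total

# Hypothesis misses the last 2 s of a 10 s reference turn -> DER = 0.2
print(der([(0.0, 10.0, 'A')], [(0.0, 8.0, 'A')]))  # → 0.2
```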

## Usage example

```python
import yaml
from yaml.loader import SafeLoader

import torch
from pyannote.audio import Model
from pyannote.audio.pipelines import SpeakerDiarization

# Load the fine-tuned segmentation model and the pretrained embedding model
segm_model = torch.load('model/segm_model.pth', map_location=torch.device('cpu'))
embed_model = Model.from_pretrained("pyannote/embedding", use_auth_token='ACCESS_TOKEN_GOES_HERE')

# Build a diarization pipeline from the two models
diar_pipeline = SpeakerDiarization(
    segmentation=segm_model,
    segmentation_batch_size=16,
    clustering="AgglomerativeClustering",
    embedding=embed_model,
)

# Instantiate the pipeline with the hyperparameters from config.yaml
with open('model/config.yaml', 'r') as f:
    diar_config = yaml.load(f, Loader=SafeLoader)
diar_pipeline.instantiate(diar_config)

# Run diarization; the result is a pyannote.core.Annotation
annotation = diar_pipeline('audio.wav')

# Print the detected speaker turns
for turn, _, speaker in annotation.itertracks(yield_label=True):
    print(f'{turn.start:.1f}s - {turn.end:.1f}s: {speaker}')
```
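The `model/config.yaml` loaded above holds the pipeline hyperparameters passed to `instantiate`. A hypothetical example of its shape for the `SpeakerDiarization` pipeline (the keys follow pyannote's pipeline parameters; the values here are placeholders, the real ones ship with the model):

```yaml
segmentation:
  threshold: 0.58          # speech activity threshold (placeholder value)
  min_duration_off: 0.1    # fill non-speech gaps shorter than this, in seconds
clustering:
  method: centroid
  min_cluster_size: 15
  threshold: 0.72          # agglomerative clustering distance threshold
```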