Segmentation model
This model was trained on AMI-MixHeadset and my own synthetic dataset of Russian speech.
Training time: 5 hours on an RTX 3060.
This model can be used as the segmentation model in the pyannote/speaker-diarization pipeline.
| Benchmark | DER% |
|---|---|
| AMI (headset mix, only_words) | 38.8 |
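
For readers unfamiliar with the metric, DER (diarization error rate) is conventionally the sum of false alarm, missed detection, and speaker confusion durations divided by the total duration of reference speech. The sketch below only illustrates that arithmetic; the function name and the durations are hypothetical and are not the numbers behind the AMI result above.

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total):
    """Return DER in percent, given error durations and total reference speech (seconds)."""
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return 100.0 * (false_alarm + missed_detection + confusion) / total


# Hypothetical durations in seconds, chosen only to show the calculation.
der = diarization_error_rate(
    false_alarm=12.0,
    missed_detection=20.0,
    confusion=8.0,
    total=200.0,
)
print(f"DER = {der:.1f}%")
```

Because the numerator can exceed the reference speech duration, DER is not capped at 100%.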
Usage example
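import yaml
from yaml.loader import SafeLoader
import torch
from pyannote.audio import Model
from pyannote.audio.pipelines import SpeakerDiarization

# Load the pickled segmentation model onto the CPU.
# Note: on PyTorch 2.6 or newer, torch.load may need weights_only=False to unpickle a full model.
segm_model = torch.load('model/segm_model.pth', map_location=torch.device('cpu'))

# The speaker embedding model is downloaded from the Hugging Face Hub and needs an access token.
embed_model = Model.from_pretrained("pyannote/embedding", use_auth_token='ACCESS_TOKEN_GOES_HERE')

# Assemble the diarization pipeline around the two models.
diar_pipeline = SpeakerDiarization(
    segmentation=segm_model,
    segmentation_batch_size=16,
    clustering="AgglomerativeClustering",
    embedding=embed_model
)

# Read the pipeline hyperparameters and apply them.
with open('model/config.yaml', 'r') as f:
    diar_config = yaml.load(f, Loader=SafeLoader)
diar_pipeline.instantiate(diar_config)

# Run diarization; the result is a pyannote.core Annotation.
annotation = diar_pipeline('audio.wav')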
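
To inspect the result, one option is to turn the speaker turns into readable lines. The sketch below is a minimal example under stated assumptions: `format_turns` and the sample turns are hypothetical, and the helper works on plain `(start, end, speaker)` tuples so it runs without pyannote. With a real result, the tuples could be collected from `annotation.itertracks(yield_label=True)`, as the comment shows.

```python
def format_turns(turns):
    """Format (start, end, speaker) tuples, in seconds, as one line per speaker turn."""
    lines = []
    for start, end, speaker in sorted(turns):
        lines.append(f"{start:7.2f}s - {end:7.2f}s  {speaker}")
    return lines


# With a pyannote Annotation, the turns could be gathered like this:
# turns = [(segment.start, segment.end, label)
#          for segment, _, label in annotation.itertracks(yield_label=True)]

# Hypothetical turns used here so the example is self-contained.
example_turns = [(3.2, 5.9, "SPEAKER_01"), (0.0, 3.1, "SPEAKER_00")]
for line in format_turns(example_turns):
    print(line)
```

Sorting the tuples first lists the turns in chronological order regardless of how they were collected.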