pyannote-segmentation-3.0-RTVE-primary

Model Details

This system is a collection of three fine-tuned models whose outputs are fused with DOVER-Lap. Each model is fine-tuned while monitoring a different component of the Diarization Error Rate (False Alarm, Missed Detection, or Speaker Confusion). More information about the fusion of these models can be found in this paper.

Each model is a fine-tuned version of pyannote/segmentation-3.0, trained on the RTVE database used for the Albayzin Evaluations of IberSPEECH 2024.

On the RTVE2024 test set, the system achieves the following results (rounded to two decimals), making it the best-performing system in the Albayzin Evaluations 2024:

Diarization Error Rate (DER): 14.98%
False Alarm: 2.64%
Missed Detection: 4.54%
Speaker Confusion: 7.80%
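The three components add up to the reported DER, which can be checked with a few lines of Python (values copied from the list above):

```python
# Reported component error rates on the RTVE2024 test set (percent).
false_alarm = 2.64
missed_detection = 4.54
speaker_confusion = 7.80

# The components share the same denominator, so they add up to the DER.
der = false_alarm + missed_detection + speaker_confusion
print(f"DER = {der:.2f}%")  # DER = 14.98%
```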

Uses

This system is intended for speaker diarization of TV shows.

Usage

Instructions for obtaining the RTTM output of each model can be found here, using this configuration file.
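Each line of an RTTM file describes one speaker turn. As a hedged illustration of that format, the sketch below formats (start, end, speaker) tuples as RTTM SPEAKER lines on channel 1; the function name `to_rttm_lines`, the URI, and the turns are hypothetical. In practice, the diarization output of pyannote.audio is a pyannote.core `Annotation`, which can write RTTM directly with its `write_rttm` method.

```python
def to_rttm_lines(uri, turns):
    """Format (start, end, speaker) turns, in seconds, as RTTM SPEAKER lines.

    Hypothetical helper for illustration; fields are
    type, file id, channel, onset, duration, <NA>, <NA>, speaker, <NA>, <NA>.
    """
    lines = []
    for start, end, speaker in sorted(turns):
        duration = end - start
        lines.append(
            f"SPEAKER {uri} 1 {start:.3f} {duration:.3f} <NA> <NA> {speaker} <NA> <NA>"
        )
    return lines


# Hypothetical (overlapping) turns for a hypothetical recording URI.
turns = [(0.0, 3.5, "spk0"), (3.2, 7.0, "spk1")]
for line in to_rttm_lines("show_001", turns):
    print(line)
```

Overlapping turns are simply written as separate lines; RTTM has no special notation for overlap.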

Once the three RTTM outputs are obtained, this script can be adapted to fuse them.
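DOVER-Lap first maps speaker labels across the input hypotheses and then combines them through weighted, overlap-aware voting. The sketch below illustrates only the voting idea, under strong simplifying assumptions that DOVER-Lap itself does not make: the three outputs are already sampled on a common frame grid, their speaker labels are already mapped to a shared set, and each frame holds at most one label (`None` for non-speech). The function `fuse_frames`, the weights, and the frame labels are hypothetical; this is not DOVER-Lap and does not replace the fusion script referenced above.

```python
from collections import Counter


def fuse_frames(systems, weights=None):
    """Per-frame weighted majority vote over pre-aligned label sequences.

    Simplified illustration, not DOVER-Lap: no label mapping, no overlap
    handling, one label (or None for non-speech) per frame.
    """
    if weights is None:
        weights = [1.0] * len(systems)
    if len(weights) != len(systems):
        raise ValueError("need one weight per system")
    num_frames = len(systems[0])
    if any(len(labels) != num_frames for labels in systems):
        raise ValueError("all systems must have the same number of frames")

    fused = []
    for t in range(num_frames):
        votes = Counter()
        for labels, weight in zip(systems, weights):
            votes[labels[t]] += weight
        # On ties, most_common(1) returns the label first seen in this frame,
        # i.e., the one from the earliest system in the list.
        fused.append(votes.most_common(1)[0][0])
    return fused


# Hypothetical outputs of the three fine-tuned models on five frames.
fa_model = ["A", "A", None, "B", "B"]
miss_model = ["A", "B", "B", "B", None]
conf_model = ["A", "A", "B", None, "B"]
print(fuse_frames([fa_model, miss_model, conf_model]))  # ['A', 'A', 'B', 'B', 'B']
```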

Training Details

Training Data

The train.lst file includes the URIs of the training data.

Training Hyperparameters

Model:

duration: 10.0
max_speakers_per_chunk: 3
max_speakers_per_frame: 2
train_batch_size: 32
powerset_max_classes: 2
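pyannote/segmentation-3.0 uses a powerset formulation: each frame is assigned to one subset of the chunk's local speakers containing at most `max_speakers_per_frame` members. With `max_speakers_per_chunk: 3` and `max_speakers_per_frame: 2`, this gives 1 + 3 + 3 = 7 classes (non-speech, three single speakers, three speaker pairs). The short Python sketch below enumerates them; it only illustrates the counting and is not pyannote's own implementation.

```python
from itertools import combinations
from math import comb

max_speakers_per_chunk = 3
max_speakers_per_frame = 2

# Every subset of local speakers with at most max_speakers_per_frame members;
# the empty tuple stands for non-speech.
speakers = range(1, max_speakers_per_chunk + 1)
classes = [
    subset
    for size in range(max_speakers_per_frame + 1)
    for subset in combinations(speakers, size)
]
print(len(classes))  # 7
print(classes)  # [(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3)]

# Same count in closed form: sum of C(3, k) for k = 0..2.
assert len(classes) == sum(
    comb(max_speakers_per_chunk, k) for k in range(max_speakers_per_frame + 1)
)
```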

Adam Optimizer:

lr: 0.0001

Early Stopping:

direction: 'min'
max_epochs: 20
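Because each model monitors one DER component, `direction: 'min'` means that lower values of the monitored metric are better. The sketch below shows the checkpoint-selection logic this implies, assuming a list of per-epoch validation values for the monitored component. The `patience` argument is hypothetical (no patience value is given above), as are the example values; this is a pure-Python illustration, not the training code used for these models.

```python
import operator


def best_epoch(values, direction="min", max_epochs=20, patience=None):
    """Return (epoch_index, value) of the best monitored value.

    Only the first `max_epochs` values are considered. If `patience` is set
    (hypothetical parameter), stop once that many consecutive epochs pass
    without improvement.
    """
    if direction not in ("min", "max"):
        raise ValueError("direction must be 'min' or 'max'")
    better = operator.lt if direction == "min" else operator.gt

    best_idx, best_val, since_best = None, None, 0
    for epoch, value in enumerate(values[:max_epochs]):
        if best_val is None or better(value, best_val):
            best_idx, best_val, since_best = epoch, value, 0
        else:
            since_best += 1
            if patience is not None and since_best >= patience:
                break
    return best_idx, best_val


# Hypothetical per-epoch validation false-alarm rates (percent).
fa_per_epoch = [4.1, 3.6, 3.2, 3.3, 3.4, 3.5]
print(best_epoch(fa_per_epoch))  # (2, 3.2)
```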

Development Data

The development.lst file includes the URIs of the development data.

Evaluation

Forgiveness collar: 250ms
Skip overlap: False
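A forgiveness collar excludes short regions around reference speaker-turn boundaries from scoring, while `Skip overlap: False` means that overlapped speech is still scored. Scoring tools differ on whether the collar is applied on each side of a boundary or as a total width, so the sketch below assumes 250 ms on each side of every reference boundary. The function name and reference turns are hypothetical.

```python
def no_score_zones(reference_turns, collar=0.25):
    """Merged regions excluded from scoring, in seconds.

    Assumption: the collar is applied on each side of every reference
    turn boundary; zones are clipped at 0 and overlapping zones merged.
    """
    zones = []
    for start, end in reference_turns:
        for boundary in (start, end):
            zones.append((max(0.0, boundary - collar), boundary + collar))
    zones.sort()

    merged = []
    for start, end in zones:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous zone: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical reference turns (seconds) for a single recording.
reference = [(0.0, 2.0), (2.25, 5.0)]
print(no_score_zones(reference))  # [(0.0, 0.25), (1.75, 2.5), (4.75, 5.25)]
```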

Testing Data & Metrics

Testing Data

The test.lst file includes the URIs of the testing data.
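The sketch below assumes that `train.lst`, `development.lst`, and `test.lst` each contain one URI per line (the layout pyannote.database uses for URI lists) and sit in the working directory. It loads the three lists and reports any recording that appears in more than one split; the helper names are hypothetical.

```python
from pathlib import Path


def load_uris(path):
    """Read one URI per line, ignoring blank lines and surrounding whitespace."""
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]


def overlapping_uris(splits):
    """Return {(split_a, split_b): sorted shared URIs} for pairs that share URIs."""
    names = list(splits)
    shared = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            common = set(splits[a]) & set(splits[b])
            if common:
                shared[(a, b)] = sorted(common)
    return shared


if __name__ == "__main__":
    files = {"train": "train.lst", "development": "development.lst", "test": "test.lst"}
    # Only run the check when all three list files are present.
    if all(Path(p).exists() for p in files.values()):
        splits = {name: load_uris(p) for name, p in files.items()}
        for name, uris in splits.items():
            print(f"{name}: {len(uris)} URIs")
        print("overlaps:", overlapping_uris(splits) or "none")
```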

Metrics

Diarization Error Rate, False Alarm, Missed Detection, Speaker Confusion.
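For reference, the standard definition normalizes each error duration by the total duration of reference speech, so the three components add up to the DER:

```latex
\mathrm{DER} = \frac{T_{\mathrm{FA}} + T_{\mathrm{miss}} + T_{\mathrm{conf}}}{T_{\mathrm{ref}}}
```

Here $T_{\mathrm{FA}}$, $T_{\mathrm{miss}}$, and $T_{\mathrm{conf}}$ are the durations of false-alarm speech, missed speech, and speech attributed to the wrong speaker, and $T_{\mathrm{ref}}$ is the total reference speech duration (after removing the collar regions).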

Citation

If you use these models, please cite:

BibTeX:

@inproceedings{souganidis24_iberspeech,
  title     = {HiTZ-Aholab Speaker Diarization System for Albayzin Evaluations of IberSPEECH 2024},
  author    = {Christoforos Souganidis and Gemma Meseguer and Asier Herranz and Inma {Hernáez Rioja} and Eva Navas and Ibon Saratxaga},
  year      = {2024},
  booktitle = {IberSPEECH 2024},
  pages     = {327--330},
  doi       = {10.21437/IberSPEECH.2024-68},
}

Acknowledgments

This project, with reference 2022/TL22/00215335, has been partially funded by the Ministerio de Transformación Digital and by the Plan de Recuperación, Transformación y Resiliencia – Funded by the European Union – NextGenerationEU ILENIA, and by the project IkerGaitu, funded by the Basque Government.
