MLC-SLM Official Diarization Baseline Artifacts

This repository stores artifacts from reproducing the official MLC-SLM Task II diarization baseline.

What Is Included

Fine-tuned pyannote/segmentation-3.0 checkpoint
Pyannote database config
3D-Speaker diarization config
Patched runner/scripts used for this run
Predicted development-set RTTMs
DER scoring output

Training Setup

Dataset: MLC-SLM training sets 1-4
Epochs: 1
Batch size: 64
GPUs: 2 x RTX A6000
Pyannote validation DER after epoch 1: 0.183 (18.3%)

Final Diarization Result

Development-set diarization result after 3D-Speaker/CAMPPlus embedding extraction and spectral clustering:

DER = 16.85% with collar 0, no overlap mode
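For readers unfamiliar with the metric, DER is the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The sketch below only illustrates that arithmetic; the function name and the durations are hypothetical and are not taken from this run's md-eval output.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_reference):
    """Return DER as a fraction of total reference speech time (all values in seconds)."""
    if total_reference <= 0:
        raise ValueError("total_reference must be positive")
    return (missed + false_alarm + confusion) / total_reference


# Hypothetical durations in seconds, for illustration only (not results from this repository).
example_der = diarization_error_rate(
    missed=30.0,
    false_alarm=20.0,
    confusion=50.0,
    total_reference=1000.0,
)
print(f"DER = {example_der:.2%}")  # prints "DER = 10.00%"
```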

Pipeline

pyannote segmentation fine-tuning
-> VAD/subsegments
-> 3D-Speaker CAMPPlus embeddings: iic/speech_campplus_sv_zh_en_16k-common_advanced
-> spectral clustering
-> RTTM output
-> md-eval DER scoring

How To Use The Checkpoint

This checkpoint is a fine-tuned pyannote segmentation model. It is not a full diarization pipeline on its own.

Install dependencies:

pip install huggingface_hub pyannote.audio

Download and load the checkpoint:

from huggingface_hub import hf_hub_download

repo_id = "sulaimank/mlc-slm-pyannote-segmentation-baseline"

ckpt_path = hf_hub_download(
    repo_id=repo_id,
    filename="epoch=0-step=28190.ckpt",
    repo_type="model",
)

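PyTorch 2.6 changed the default of torch.load to weights_only=True, which can reject the non-tensor objects stored in this checkpoint. For PyTorch 2.6+, patch checkpoint loading before calling Model.from_pretrained. Loading with weights_only=False unpickles arbitrary objects, so do this only for checkpoints you trust: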

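import torch

# Keep a reference to the original loader so the wrapper can delegate to it.
_original_torch_load = torch.load

def torch_load_weights_false(*args, **kwargs):
    # Force full unpickling, overriding any weights_only value passed by the caller.
    kwargs["weights_only"] = False
    return _original_torch_load(*args, **kwargs)

# Replace torch.load globally for the rest of the process.
torch.load = torch_load_weights_false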

Load the pyannote model:

from pyannote.audio import Model

model = Model.from_pretrained(ckpt_path)
model.eval()
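The patch above replaces torch.load for the rest of the process. If that is broader than you want, a scoped variant is one option. The sketch below is an assumption-laden illustration, not part of the uploaded scripts: force_kwargs is a hypothetical helper, and the demo uses a stand-in object so it runs without PyTorch installed.

```python
import contextlib
import functools
import types


@contextlib.contextmanager
def force_kwargs(owner, attr_name, **forced):
    """Temporarily wrap owner.attr_name so the given keyword arguments are always forced."""
    original = getattr(owner, attr_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        kwargs.update(forced)  # forced values override whatever the caller passed
        return original(*args, **kwargs)

    setattr(owner, attr_name, wrapper)
    try:
        yield
    finally:
        # Always restore the original callable, even if loading raised.
        setattr(owner, attr_name, original)


# Stand-in for the torch module, used only to demonstrate the behaviour.
fake_torch = types.SimpleNamespace(
    load=lambda path, weights_only=True: (path, weights_only)
)

with force_kwargs(fake_torch, "load", weights_only=False):
    inside = fake_torch.load("model.ckpt")
outside = fake_torch.load("model.ckpt")

print(inside)   # ('model.ckpt', False)
print(outside)  # ('model.ckpt', True)
```

With PyTorch and pyannote.audio installed, the intended usage would be to wrap only the load call, for example `with force_kwargs(torch, "load", weights_only=False): model = Model.from_pretrained(ckpt_path)`. This assumes the loader resolves torch.load at call time, as the global patch above also does.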

How To Reproduce Full Diarization

To reproduce the full official-style diarization pipeline, use the uploaded runner and configs together with the MLC-SLM baseline repository:

git clone https://github.com/mubingshen/MLC-SLM-Baseline.git

The full diarization pipeline requires:

fine-tuned pyannote segmentation checkpoint
+ VAD/subsegments
+ 3D-Speaker CAMPPlus speaker embeddings
+ spectral clustering

The uploaded predicted dev RTTMs are available in:

dev_rttm_predictions/

The DER scoring output is available in:

results/md_eval_fixed_ids.txt
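The predicted RTTMs can be inspected with a few lines of Python. The sketch below assumes the files follow the standard RTTM SPEAKER layout (file ID in field 2, onset and duration in fields 4 and 5, speaker name in field 8). The function names and the sample lines are hypothetical and do not come from dev_rttm_predictions/.

```python
from collections import defaultdict


def parse_rttm(lines):
    """Parse RTTM SPEAKER lines into (file_id, speaker, onset, duration) tuples."""
    segments = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and non-SPEAKER records.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        file_id = fields[1]
        onset = float(fields[3])
        duration = float(fields[4])
        speaker = fields[7]
        segments.append((file_id, speaker, onset, duration))
    return segments


def speech_time_per_speaker(segments):
    """Sum segment durations (seconds) for each (file_id, speaker) pair."""
    totals = defaultdict(float)
    for file_id, speaker, _onset, duration in segments:
        totals[(file_id, speaker)] += duration
    return dict(totals)


# Hypothetical RTTM lines, for illustration only.
sample_lines = [
    "SPEAKER session_a 1 0.00 2.50 <NA> <NA> spk0 <NA> <NA>",
    "SPEAKER session_a 1 2.50 1.25 <NA> <NA> spk1 <NA> <NA>",
    "SPEAKER session_a 1 4.00 1.50 <NA> <NA> spk0 <NA> <NA>",
]

sample_segments = parse_rttm(sample_lines)
sample_totals = speech_time_per_speaker(sample_segments)
print(sample_totals)
```

To read a real file, pass an open file handle instead of the sample list, since parse_rttm accepts any iterable of lines.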

Note

This checkpoint is derived from pyannote/segmentation-3.0; use it in accordance with the pyannote model terms.
