MLC-SLM Official Diarization Baseline Artifacts
This repository stores artifacts from reproducing the official MLC-SLM Task II diarization baseline.
What Is Included
- Fine-tuned pyannote/segmentation-3.0 checkpoint
- Pyannote database config
- 3D-Speaker diarization config
- Patched runner/scripts used for this run
- Predicted development-set RTTMs
- DER scoring output
Training Setup
- Dataset: MLC-SLM training sets 1-4
- Epochs: 1
- Batch size: 64
- GPUs: 2 x RTX A6000
- Pyannote validation DER after epoch 1: 0.183 (18.3%)
Final Diarization Result
Development-set diarization result after 3D-Speaker/CAMPPlus embedding extraction and spectral clustering:
DER = 16.85% with collar 0, no overlap mode
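For readers unfamiliar with the metric, the snippet below is a minimal sketch of how a DER figure is formed from its error components. It uses the standard definition (missed speech plus false alarm plus speaker confusion, divided by total reference speech time). The function name and the durations are hypothetical, chosen only to illustrate the arithmetic; they are not taken from this run's md-eval output.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_reference):
    """Return DER as a fraction: (miss + false alarm + confusion) / reference speech time."""
    if total_reference <= 0:
        raise ValueError("total_reference must be positive")
    return (missed + false_alarm + confusion) / total_reference


# Hypothetical durations in seconds, for illustration only.
der = diarization_error_rate(
    missed=30.0, false_alarm=20.0, confusion=50.0, total_reference=1000.0
)
print(f"DER = {der:.2%}")  # DER = 10.00%
```

With collar 0, no tolerance window is excluded around reference segment boundaries, so boundary errors count in full toward the components above.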
Pipeline
pyannote segmentation fine-tuning
-> VAD/subsegments
-> 3D-Speaker CAMPPlus embeddings: iic/speech_campplus_sv_zh_en_16k-common_advanced
-> spectral clustering
-> RTTM output
-> md-eval DER scoring
How To Use The Checkpoint
This checkpoint is a fine-tuned pyannote segmentation model. It is not a full diarization pipeline on its own.
Install dependencies:
pip install huggingface_hub pyannote.audio
Download and load the checkpoint:
from huggingface_hub import hf_hub_download
repo_id = "sulaimank/mlc-slm-pyannote-segmentation-baseline"
ckpt_path = hf_hub_download(
    repo_id=repo_id,
    filename="epoch=0-step=28190.ckpt",
    repo_type="model",
)
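PyTorch 2.6 changed the default of torch.load to weights_only=True, which can make loading this checkpoint fail. On PyTorch 2.6+, patch checkpoint loading before calling Model.from_pretrained: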
import torch
_original_torch_load = torch.load
def torch_load_weights_false(*args, **kwargs):
    # Force weights_only=False; only do this for checkpoints you trust.
    kwargs["weights_only"] = False
    return _original_torch_load(*args, **kwargs)
torch.load = torch_load_weights_false
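The patch above replaces torch.load for the rest of the process. If that is too broad, a scoped variant may be preferable. The sketch below is one possible way to do it: `forced_kwargs` is a hypothetical helper (not part of torch or pyannote) that temporarily wraps a callable attribute and restores it on exit. It is written generically so it runs without torch installed; the intended torch usage is shown in the trailing comments.

```python
import contextlib
import functools


@contextlib.contextmanager
def forced_kwargs(owner, name, **forced):
    """Temporarily wrap owner.<name> so the given keyword arguments are always set."""
    original = getattr(owner, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        kwargs.update(forced)  # forced values override anything the caller passed
        return original(*args, **kwargs)

    setattr(owner, name, wrapper)
    try:
        yield
    finally:
        setattr(owner, name, original)  # always restore the original callable


# Intended use (requires torch and pyannote.audio; trusted checkpoints only):
#   import torch
#   from pyannote.audio import Model
#   with forced_kwargs(torch, "load", weights_only=False):
#       model = Model.from_pretrained(ckpt_path)
```

Like the global patch, this relies on the loading code looking up torch.load at call time, so it should behave the same way inside the `with` block while leaving torch.load untouched elsewhere.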
Load the pyannote model:
from pyannote.audio import Model
model = Model.from_pretrained(ckpt_path)
model.eval()
How To Reproduce Full Diarization
To reproduce the full official-style diarization pipeline, use the uploaded runner and configs together with the MLC-SLM baseline repository:
git clone https://github.com/mubingshen/MLC-SLM-Baseline.git
The full diarization pipeline requires:
fine-tuned pyannote segmentation checkpoint
+ VAD/subsegments
+ 3D-Speaker CAMPPlus speaker embeddings
+ spectral clustering
The uploaded predicted dev RTTMs are available in:
dev_rttm_predictions/
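To inspect the predicted RTTMs, a small parser is usually enough. The sketch below assumes the standard RTTM layout, where each `SPEAKER` line carries the file ID, channel, start time, duration, and speaker name in fixed positions. `parse_rttm` and the sample lines are hypothetical and for illustration only; the file names inside dev_rttm_predictions/ are not assumed here.

```python
from collections import defaultdict


def parse_rttm(lines):
    """Return {file_id: [(start, end, speaker), ...]} from RTTM SPEAKER lines."""
    segments = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip blank lines and any non-SPEAKER records.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        file_id = fields[1]
        start = float(fields[3])
        duration = float(fields[4])
        speaker = fields[7]
        segments[file_id].append((start, start + duration, speaker))
    return dict(segments)


# Hypothetical sample lines, not taken from the uploaded predictions.
sample = [
    "SPEAKER demo_file 1 0.00 1.50 <NA> <NA> spk0 <NA> <NA>",
    "SPEAKER demo_file 1 1.50 2.25 <NA> <NA> spk1 <NA> <NA>",
]
for file_id, segs in parse_rttm(sample).items():
    for start, end, speaker in segs:
        print(f"{file_id}: {speaker} {start:.2f}-{end:.2f}s")
```

To read a real file, pass an open file handle, for example `parse_rttm(open(path))`, since the function only needs an iterable of lines.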
The DER scoring output is available in:
results/md_eval_fixed_ids.txt
Note
This checkpoint is derived from pyannote/segmentation-3.0; keep usage consistent with pyannote model terms.