DiCoSe: Improving Music Source Separation with Diffusion and Consistency Refinement
Pre-trained checkpoints for "Improving Music Source Separation with Diffusion and Consistency Refinement".
- Code: github.com/Russell-Izadi-Bose/DiCoSe
- Paper: arXiv:2412.06965
- Demo: consistency-separation.github.io
This repo hosts checkpoints for two experimental tracks described in the paper:
- A custom U-Net separator trained on Slakh2100.
- A BS-RoFormer separator (backbone from Music-Source-Separation-Training) trained on MUSDB18-HQ.
For each track, three checkpoints are provided, one per stage of the method: a Deterministic separator, a Diffusion refinement model trained on top of it, and a Consistency-Distilled (CD) model distilled from the diffusion model for fast inference in a handful of steps (1 to 4 in the results below).
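To make the role of the step count T concrete, here is a minimal sketch of generic multistep consistency sampling as described in the consistency-models literature. It is not taken from the DiCoSe code: the function name, the noise levels, and the toy model `toy_f` are assumptions for illustration, and any conditioning used by the actual checkpoints (for example on the mixture) is omitted.

```python
import numpy as np

def multistep_consistency_sample(f, shape, sigmas, sigma_min=0.002, seed=0):
    """Generic multistep consistency sampling (illustrative sketch).

    f(x, sigma) maps a noisy sample at noise level sigma to a clean estimate.
    sigmas is a decreasing list of noise levels; len(sigmas) is the number
    of inference steps T.
    """
    rng = np.random.default_rng(seed)
    # Step 1: draw pure noise at the largest level and denoise in one shot.
    x = f(sigmas[0] * rng.standard_normal(shape), sigmas[0])
    # Steps 2..T: re-noise the estimate to a smaller level, then denoise again.
    for sigma in sigmas[1:]:
        z = rng.standard_normal(shape)
        x_noisy = x + np.sqrt(max(sigma**2 - sigma_min**2, 0.0)) * z
        x = f(x_noisy, sigma)
    return x

# Hypothetical stand-in for a trained network: shrinks its input toward a
# fixed target, more strongly at higher noise levels.
target = np.ones(8)
data_std = 0.1

def toy_f(x, sigma):
    weight = data_std**2 / (data_std**2 + sigma**2)
    return target + weight * (x - target)

one_step = multistep_consistency_sample(toy_f, target.shape, [80.0])
four_step = multistep_consistency_sample(toy_f, target.shape, [80.0, 5.0, 0.5, 0.05])
```

With a real distilled model, `f` would be the network's forward pass and the list of noise levels would come from the eval config; the toy above only demonstrates the control flow.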
Files
| File | Track | Stage | SDR (dB, avg across stems) |
|---|---|---|---|
| Deterministic_model_unet/model.ckpt | U-Net / Slakh2100 | Deterministic | 10.89 |
| diffusion_model_unet/model.ckpt | U-Net / Slakh2100 | Diffusion | 11.34 |
| CD_unet/model.ckpt | U-Net / Slakh2100 | Consistency-Distilled | 11.42 (T=1) → 11.95 (T=4) |
| Deterministic_model_MSST_bs_roformer/model.ckpt | BS-RoFormer / MUSDB18 | Deterministic | 9.84 |
| diffusion_model_MSST_bs_roformer/model.ckpt | BS-RoFormer / MUSDB18 | Diffusion | 10.34 |
| CD_MSST_bs_roformer/model.ckpt | BS-RoFormer / MUSDB18 | Consistency-Distilled | 10.41 (T=1) → 10.40 (T=2) |
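SDR values are medians over 1-second chunks, computed with museval and averaged across stems on the respective test set, as reported in the paper. Each Consistency-Distilled (CD) checkpoint is a single model evaluated at different numbers of inference steps (T). Additional steps improve SDR on the U-Net track (11.42 dB at T=1 to 11.95 dB at T=4) but make no meaningful difference on the BS-RoFormer track (10.41 dB at T=1, 10.40 dB at T=2).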
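The aggregation described above (median over 1-second chunks, then mean across stems) can be sketched as follows. This is a simplified illustration that uses a plain energy-ratio SDR, not museval's BSS Eval implementation, so its numbers will not match the table; all function names here are hypothetical.

```python
import numpy as np

def chunk_sdr(reference, estimate, eps=1e-10):
    """Plain SDR in dB for one chunk: signal energy over error energy."""
    signal_energy = np.sum(reference**2)
    error_energy = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))

def median_chunk_sdr(reference, estimate, sample_rate, chunk_seconds=1.0):
    """Median SDR over non-overlapping chunks of a mono signal."""
    hop = int(sample_rate * chunk_seconds)
    scores = []
    for start in range(0, len(reference) - hop + 1, hop):
        ref = reference[start:start + hop]
        if not np.any(ref):
            continue  # SDR is undefined for a silent reference chunk
        scores.append(chunk_sdr(ref, estimate[start:start + hop]))
    return float(np.median(scores)) if scores else float("nan")

def average_over_stems(references, estimates, sample_rate):
    """Mean of the per-stem median SDR; both arguments map stem name to signal."""
    per_stem = [
        median_chunk_sdr(references[name], estimates[name], sample_rate)
        for name in references
    ]
    return float(np.mean(per_stem))

# Synthetic example: two stems, each estimate is a scaled copy of its reference.
sample_rate = 100
t = np.arange(3 * sample_rate) / sample_rate
references = {"bass": np.sin(2 * np.pi * 5 * t), "drums": np.cos(2 * np.pi * 7 * t)}
estimates = {"bass": 0.9 * references["bass"], "drums": 0.9 * references["drums"]}
score = average_over_stems(references, estimates, sample_rate)
```

The median makes the per-track score robust to a few very bad or very good chunks, which is why it is commonly preferred over the mean for chunked SDR.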
Usage
See the GitHub repo for the download script, environment setup, and eval configs that load these checkpoints. Training/eval code for the BS-RoFormer track is coming soon; checkpoints are published now for reference.
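As a small convenience, the file layout from the Files table can be captured in a lookup like the one below. This is a sketch only: the helper name, the track and stage keys, and the local `root` directory are hypothetical, and actual model loading should follow the eval configs in the GitHub repo.

```python
from pathlib import Path

# Relative paths copied from the Files table; the dictionary keys are
# hypothetical labels chosen for this example.
CHECKPOINTS = {
    ("unet", "deterministic"): "Deterministic_model_unet/model.ckpt",
    ("unet", "diffusion"): "diffusion_model_unet/model.ckpt",
    ("unet", "cd"): "CD_unet/model.ckpt",
    ("bs_roformer", "deterministic"): "Deterministic_model_MSST_bs_roformer/model.ckpt",
    ("bs_roformer", "diffusion"): "diffusion_model_MSST_bs_roformer/model.ckpt",
    ("bs_roformer", "cd"): "CD_MSST_bs_roformer/model.ckpt",
}

def checkpoint_path(root, track, stage):
    """Return the local path of a checkpoint under a downloaded copy of this repo."""
    key = (track, stage)
    if key not in CHECKPOINTS:
        valid = ", ".join(f"{t}/{s}" for t, s in CHECKPOINTS)
        raise ValueError(f"unknown track/stage {track}/{stage}; expected one of: {valid}")
    return Path(root) / CHECKPOINTS[key]

# Example: locate the distilled U-Net checkpoint in a local folder named "checkpoints".
path = checkpoint_path("checkpoints", "unet", "cd")
```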
Citation
@misc{karchkhadze2024improvingsourceextractiondiffusion,
title={Improving Music Source Separation with Diffusion and Consistency Refinement},
author={Tornike Karchkhadze and Mohammad Rasool Izadi and Shuo Zhang and Shlomo Dubnov},
year={2024},
eprint={2412.06965},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2412.06965},
}
License
MIT