Sucial/Dereverb-Echo_Mel_Band_Roformer

metadata

license: cc-by-nc-sa-4.0

Description

This model separates reverb and delay effects from vocals. It can also separate some harmony vocals, though not completely. In the dataset, I applied a random high cut after the reverb and delay effects, so the model's handling of high frequencies is not particularly aggressive.
You can try listening to the performance of this model here!

How to use the model?

Try it with ZFTurbo's Music-Source-Separation-Training
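As a rough sketch of what running the model could look like, the snippet below builds a command line for the `inference.py` script in a local checkout of Music-Source-Separation-Training, using the config and checkpoint file names listed in the Model section. The flag names (`--model_type`, `--config_path`, `--start_check_point`, `--input_folder`, `--store_dir`) are my recollection of that repository's README and should be checked against your checkout. The helper `build_inference_command` and the folder names are hypothetical.

```python
import sys
from pathlib import Path


def build_inference_command(
    repo_dir,
    input_folder,
    store_dir,
    config="config_dereverb-echo_mel_band_roformer.yaml",
    checkpoint="dereverb-echo_mel_band_roformer_sdr_10.0169.ckpt",
):
    """Build (but do not run) an inference command for a local MSST checkout.

    Assumption: the flag names below match the inference.py in your checkout.
    """
    script = Path(repo_dir) / "inference.py"
    return [
        sys.executable, str(script),
        "--model_type", "mel_band_roformer",
        "--config_path", str(config),
        "--start_check_point", str(checkpoint),
        "--input_folder", str(input_folder),   # folder of audio files to process
        "--store_dir", str(store_dir),         # folder where stems are written
    ]


# Hypothetical paths; adjust to your setup.
cmd = build_inference_command("Music-Source-Separation-Training", "input", "output")
print(" ".join(cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)` once the config and checkpoint files have been downloaded.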

Model

Configs: config_dereverb-echo_mel_band_roformer.yaml
Model: dereverb-echo_mel_band_roformer_sdr_10.0169.ckpt
Instruments: [dry, other]
Finetuned from: model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt
Datasets:

Training datasets: 270 songs from Opencpop and GTSinger
Validation datasets: 30 songs from my own collection
All random reverb and delay effects were generated by this Python script and sorted into the MUSDB18 dataset format.
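To illustrate the idea of pairing a dry vocal with a randomized effect signal followed by a random high cut, here is a minimal, dependency-free sketch. It is not the author's script: the feedback delay, the one-pole low-pass filter, the parameter ranges, and the choice to store only the effect signal in the `other` stem are all assumptions made for illustration, and a real pipeline would more likely use NumPy and a proper reverb.

```python
import math
import random


def add_delay(dry, sr, delay_s, feedback, repeats=4):
    """Return only the echoes: decaying copies of `dry`, each `delay_s` later."""
    wet = [0.0] * len(dry)
    step = round(delay_s * sr)
    for k in range(1, repeats + 1):
        offset = k * step
        gain = feedback ** k
        for i in range(offset, len(dry)):
            wet[i] += gain * dry[i - offset]
    return wet


def high_cut(x, sr, cutoff_hz):
    """One-pole low-pass filter, a very simple stand-in for a high cut."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sr)
    y, state = [], 0.0
    for sample in x:
        state += alpha * (sample - state)
        y.append(state)
    return y


def make_example(dry, sr, rng):
    """Build one training example with randomized effect settings (assumed ranges)."""
    delay_s = rng.uniform(0.1, 0.5)
    feedback = rng.uniform(0.2, 0.6)
    cutoff_hz = rng.uniform(2000.0, 8000.0)
    other = high_cut(add_delay(dry, sr, delay_s, feedback), sr, cutoff_hz)
    mixture = [d + o for d, o in zip(dry, other)]
    return {"dry": list(dry), "other": other, "mixture": mixture}


# One second of a decaying 220 Hz tone stands in for a dry vocal.
sr = 16000
dry = [math.sin(2 * math.pi * 220 * n / sr) * math.exp(-5 * n / sr) for n in range(sr)]
stems = make_example(dry, sr, random.Random(0))
print({name: len(samples) for name, samples in stems.items()})
```

In a MUSDB18-style layout, each song would then get its own folder with one audio file per stem; the exact file names expected by the training code should be checked in its documentation.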

Metrics: computed on the 30 validation songs.

Instr dry sdr: 13.1507 (Std: 4.1088)
Instr dry l1_freq: 53.7715 (Std: 13.3363)
Instr dry si_sdr: 12.7707 (Std: 4.6134)
Instr other sdr: 6.8830 (Std: 2.5547)
Instr other l1_freq: 52.7358 (Std: 11.8587)
Instr other si_sdr: 5.9448 (Std: 2.8721)
Metric avg sdr        : 10.0169
Metric avg l1_freq    : 53.2536
Metric avg si_sdr     : 9.3577
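For readers unfamiliar with these numbers, the sketch below shows textbook definitions of SDR and SI-SDR on a single-channel signal, plus the averaging step: the reported average SDR is the mean of the two per-instrument values, (13.1507 + 6.8830) / 2 = 10.01685, which agrees with 10.0169 up to rounding. The training code's own implementation may differ in details such as epsilon handling or how channels and songs are averaged, so treat this as an illustration rather than a reproduction.

```python
import math


def sdr(reference, estimate, eps=1e-8):
    """Signal-to-distortion ratio in dB: reference energy over error energy."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal + eps) / (noise + eps))


def si_sdr(reference, estimate, eps=1e-8):
    """Scale-invariant SDR: rescale the reference to best fit the estimate first."""
    dot = sum(r * e for r, e in zip(reference, estimate))
    energy = sum(r * r for r in reference)
    scale = dot / (energy + eps)
    target = [scale * r for r in reference]
    signal = sum(t * t for t in target)
    noise = sum((e - t) ** 2 for e, t in zip(estimate, target))
    return 10.0 * math.log10((signal + eps) / (noise + eps))


# Toy signals: the estimate is the reference at 90% gain.
reference = [1.0, -1.0, 0.5, 0.25]
estimate = [0.9 * r for r in reference]
print("SDR:", sdr(reference, estimate))
print("SI-SDR:", si_sdr(reference, estimate))

# Averaging the per-instrument SDR values reported in this README.
per_instrument_sdr = {"dry": 13.1507, "other": 6.8830}
avg_sdr = sum(per_instrument_sdr.values()) / len(per_instrument_sdr)
print("Average SDR:", avg_sdr)
```

The toy example shows why the two metrics can diverge: a pure gain error lowers SDR but leaves SI-SDR very high, because SI-SDR ignores overall scale.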

Training log

Training logs: train.log
The image below is a TensorBoard visualization of the training log, generated by this script.

Thanks

Mel-Band-Roformer [Paper, Repository]
ZFTurbo's training code [Music-Source-Separation-Training]
CN17161 provided GPUs.
Glucy-2 provided technical assistance.