Currently it's an experimental model!
How to use the model?
Try it with ZFTurbo's Music-Source-Separation-Training
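The repository above is driven from the command line. As a rough sketch (the flag names follow that repo's `inference.py` as I understand it, and the config filename here is a hypothetical placeholder — check the repo's README for the exact invocation), a run might be assembled like this:

```python
# Hypothetical inference invocation for ZFTurbo's Music-Source-Separation-Training.
# The --model_type / --config_path / --start_check_point / --input_folder / --store_dir
# flags are assumed from that repo; config_chorus_bs_roformer.yaml is a placeholder name.
checkpoint = "model_chorus_bs_roformer_ep_267_sdr_24.1275.ckpt"
cmd = (
    "python inference.py"
    " --model_type bs_roformer"
    " --config_path config_chorus_bs_roformer.yaml"
    f" --start_check_point {checkpoint}"
    " --input_folder input/"
    " --store_dir output/"
)
print(cmd)
```

Separated male and female stems would then be written to the `--store_dir` folder, one file per source.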
Description
Recently, I trained a model to separate male and female voices in choir singing, and the results were quite good, far exceeding my expectations. However, because the training and validation data lack diversity (all of it consists of Chinese songs), I personally classify this model as experimental.
The model can separate the male and female voices in a chorus. However, if the male and female singers alternate (singing one after the other rather than together), it cannot separate them. The model's separation results can be heard here!
I used a total of 750 songs for training: 700 as the training set and 50 as the validation set. All songs come from the opencpop and m4singer datasets. The model was fine-tuned from model_bs_roformer_ep_317_sdr_12.9755.ckpt.
Of these, model_chorus_bs_roformer_ep_267_sdr_24.1275.ckpt has the following validation values:
Train epoch: 267
Instr male sdr: 24.4762 (Std: 1.5505)
Instr female sdr: 23.7788 (Std: 1.5168)
Metric avg sdr: 24.1275
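For reference, the SDR numbers above follow the standard signal-to-distortion-ratio definition: the energy of the reference signal over the energy of the estimation error, in dB. A minimal sketch of that computation (the training repo may apply a windowed or aligned variant, so treat this as illustrative only):

```python
import numpy as np

def sdr(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Basic SDR in dB: 10*log10(|ref|^2 / |ref - est|^2), no alignment or windowing."""
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    return float(10 * np.log10(signal_energy / error_energy))

# Example: an estimate at 90% of the reference amplitude gives 20 dB SDR.
ref = np.array([1.0, 0.0])
est = np.array([0.9, 0.0])
print(sdr(ref, est))  # → 20.0
```

A higher SDR means less residual distortion; ~24 dB per stem, as reported above, indicates very clean separation on this validation set.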
Thanks
Thanks to CN17161 for the GPU compute support!