--- license: cc-by-nc-4.0 language: - ja tags: - music - speech - audio - audio-to-audio - a cappella - vocal ensemble datasets: - jaCappella metrics: - SI-SDR --- # DPTNet trained with the jaCappella corpus for vocal ensemble separation This model was trained by Tomohiko Nakamura using [the codebase](https://github.com/TomohikoNakamura/asteroid_jaCappella)). It was trained on the vocal ensemble separation task of [the jaCappella dataset](https://tomohikonakamura.github.io/jaCappella_corpus/). [The paper](https://doi.org/10.1109/ICASSP49357.2023.10095569) was published in ICASSP 2023 ([arXiv](https://arxiv.org/abs/2211.16028)). # License See [the jaCappella dataset page](https://tomohikonakamura.github.io/jaCappella_corpus/). # Citation See [the jaCappella dataset page](https://tomohikonakamura.github.io/jaCappella_corpus/). # Configuration ```yaml data: num_workers: 12 sample_rate: 48000 samples_per_track: 13 seed: 42 seq_dur: 5.046 source_augmentations: - gain sources: - vocal_percussion - bass - alto - tenor - soprano - lead_vocal filterbank: kernel_size: 32 n_filters: 64 stride: 16 masknet: bidirectional: true chunk_size: 174 dropout: 0 ff_activation: relu ff_hid: 256 hop_size: 128 in_chan: 64 mask_act: sigmoid n_repeats: 8 n_src: 6 norm_type: gLN out_chan: 64 optim: lr: 0.005 optimizer: adam weight_decay: 1.0e-05 training: batch_size: 1 early_stop: true epochs: 600 gradient_clipping: 5 half_lr: true loss_func: pit_sisdr ``` # Results (SI-SDR [dB]) on vocal ensemble separation | Method | Lead vocal | Soprano | Alto | Tenor | Bass |Vocal percussion| |:---------------:|:--------------:|:--------------:|:--------------:|:--------------:|:--------------:|:--------------:| | DPTNet | 8.9 | 8.5 | 11.9 | 14.9 | 19.7 | 21.9 |