---
license: cc-by-nc-4.0
language:
- ja
tags:
- music
- speech
- audio
- audio-to-audio
- a cappella
- vocal ensemble
datasets:
- jaCappella
metrics:
- SI-SDR
---

# X-UMX trained with the jaCappella corpus for vocal ensemble separation

This model was trained by Tomohiko Nakamura using [the codebase](https://github.com/TomohikoNakamura/asteroid_jaCappella). It was trained on the vocal ensemble separation task of [the jaCappella dataset](https://tomohikonakamura.github.io/jaCappella_corpus/).

[The paper](https://doi.org/10.1109/ICASSP49357.2023.10095569) was published at ICASSP 2023 ([arXiv](https://arxiv.org/abs/2211.16028)).

# License

See [the jaCappella dataset page](https://tomohikonakamura.github.io/jaCappella_corpus/).

# Citation

See [the jaCappella dataset page](https://tomohikonakamura.github.io/jaCappella_corpus/).

# Configuration

```yaml
data:
  num_workers: 12
  sample_rate: 48000
  samples_per_track: 13
  seed: 42
  seq_dur: 6.0
  source_augmentations:
  - gain
  sources:
  - vocal_percussion
  - bass
  - alto
  - tenor
  - soprano
  - lead_vocal
model:
  bandwidth: 16000
  bidirectional: true
  hidden_size: 512
  in_chan: 4096
  nb_channels: 1
  nhop: 1024
  pretrained: null
  spec_power: 1
  window_length: 4096
optim:
  lr: 0.001
  lr_decay_gamma: 0.3
  lr_decay_patience: 80
  optimizer: adam
  patience: 1000
  weight_decay: 1.0e-05
training:
  batch_size: 16
  epochs: 1000
  loss_combine_sources: true
  loss_use_multidomain: true
  mix_coef: 10.0
  val_dur: 80.0
```

# Results (SI-SDR [dB]) on vocal ensemble separation

| Method | Lead vocal | Soprano | Alto | Tenor | Bass | Vocal percussion |
|:------:|:----------:|:-------:|:----:|:-----:|:----:|:----------------:|
| X-UMX  | 7.5        | 10.7    | 13.5 | 10.2  | 9.1  | 21.0             |
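The `data` and `model` sections of the configuration above fix the STFT front end. Assuming the usual Open-Unmix/X-UMX conventions (that `window_length` and `in_chan` give the FFT size, `nhop` is the hop size in samples, and `bandwidth` caps the frequency bins fed to the network; these meanings are assumptions, not stated in this card), the following sketch derives the resulting frequency resolution and cutoff bin. `StftConfig`, `num_bins`, `bin_width_hz`, `max_bin`, and `samples_to_ms` are hypothetical helpers written for illustration, not functions from the codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StftConfig:
    """STFT-related values copied from the configuration (meanings are assumed)."""

    sample_rate: int = 48000    # data.sample_rate, in Hz
    n_fft: int = 4096           # model.window_length (assumed equal to model.in_chan)
    hop: int = 1024             # model.nhop, in samples
    bandwidth: float = 16000.0  # model.bandwidth, in Hz


def num_bins(cfg: StftConfig) -> int:
    """Number of one-sided frequency bins produced by a real-input FFT of size n_fft."""
    return cfg.n_fft // 2 + 1


def bin_width_hz(cfg: StftConfig) -> float:
    """Frequency spacing between adjacent STFT bins, in Hz."""
    return cfg.sample_rate / cfg.n_fft


def max_bin(cfg: StftConfig) -> int:
    """Number of bins whose center frequency does not exceed `bandwidth`.

    Assumption: mirrors an Open-Unmix-style cutoff, where only bins
    0 .. max_bin - 1 would be passed to the network.
    """
    width = bin_width_hz(cfg)
    return sum(1 for k in range(num_bins(cfg)) if k * width <= cfg.bandwidth)


def samples_to_ms(samples: int, sample_rate: int) -> float:
    """Convert a length in samples to milliseconds."""
    return 1000.0 * samples / sample_rate


if __name__ == "__main__":
    cfg = StftConfig()
    print(num_bins(cfg))                                        # 2049
    print(bin_width_hz(cfg))                                    # 11.71875
    print(max_bin(cfg))                                         # 1366
    print(round(samples_to_ms(cfg.n_fft, cfg.sample_rate), 2))  # 85.33
    print(round(samples_to_ms(cfg.hop, cfg.sample_rate), 2))    # 21.33
```

Under these assumptions, an 85.33 ms window with a 21.33 ms hop gives 2049 bins spaced about 11.7 Hz apart, of which only the lowest 1366 (up to roughly 16 kHz) would be modeled.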
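The results table above reports scale-invariant signal-to-distortion ratio (SI-SDR) in dB per part. As a rough illustration of the metric, here is a minimal pure-Python sketch of the common single-channel definition: project the estimate onto the reference, then take the energy ratio of that projection to the residual. The function name `si_sdr` and the `zero_mean` option are illustrative assumptions; this card does not specify the exact evaluation protocol (segmenting, zero-mean handling, aggregation across tracks), so this sketch is not expected to reproduce the table's numbers.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float], zero_mean: bool = True) -> float:
    """Scale-invariant signal-to-distortion ratio in dB (common definition, assumed here)."""
    if len(estimate) != len(reference) or len(reference) == 0:
        raise ValueError("estimate and reference must be non-empty and equally long")
    est = [float(x) for x in estimate]
    ref = [float(x) for x in reference]
    if zero_mean:
        # Remove DC offsets so that a constant bias is not counted as signal.
        est_mean = sum(est) / len(est)
        ref_mean = sum(ref) / len(ref)
        est = [x - est_mean for x in est]
        ref = [x - ref_mean for x in ref]
    ref_energy = sum(r * r for r in ref)
    if ref_energy == 0.0:
        raise ValueError("reference must not be silent")
    # Optimal scaling of the reference: projection of the estimate onto it.
    alpha = sum(e * r for e, r in zip(est, ref)) / ref_energy
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0.0:
        return math.inf   # perfect reconstruction up to a gain factor
    if target_energy == 0.0:
        return -math.inf  # estimate is orthogonal to the reference
    return 10.0 * math.log10(target_energy / residual_energy)


if __name__ == "__main__":
    ref = [1.0, -1.0, 2.0, -2.0]
    noise = [0.1, 0.1, -0.1, -0.1]  # chosen orthogonal to ref
    est = [r + n for r, n in zip(ref, noise)]
    print(round(si_sdr(est, ref), 2))                     # 23.98
    print(round(si_sdr([3.0 * x for x in est], ref), 2))  # 23.98 (gain does not matter)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling a separated part leaves its score unchanged; only components that cannot be explained by a gain on the reference count as distortion.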