---
license: cc-by-nc-4.0
language:
- ja
tags:
- music
- speech
- audio
- audio-to-audio
- a cappella
- vocal ensemble
datasets:
- jaCappella
metrics:
- SI-SDR
---

# MRDLA trained with the jaCappella corpus for vocal ensemble separation

This model was trained by Tomohiko Nakamura using [the codebase](https://github.com/TomohikoNakamura/asteroid_jaCappella).
It was trained on the vocal ensemble separation task of [the jaCappella dataset](https://tomohikonakamura.github.io/jaCappella_corpus/).
[The paper](https://doi.org/10.1109/ICASSP49357.2023.10095569) was published in ICASSP 2023 ([arXiv](https://arxiv.org/abs/2211.16028)).

# License
See [the jaCappella dataset page](https://tomohikonakamura.github.io/jaCappella_corpus/).

# Citation
See [the jaCappella dataset page](https://tomohikonakamura.github.io/jaCappella_corpus/).

For MRDLA, please cite the following paper.
```
@article{TNakamura202104IEEEACMTASLP,
  author={Nakamura, Tomohiko and Kozuka, Shihori and Saruwatari, Hiroshi},
  journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  title = {Time-domain audio source separation with neural networks based on multiresolution analysis},
  year=2021,
  doi={10.1109/TASLP.2021.3072496},
  month=apr,
  volume=29,
  pages={1687--1701},
}
```

# Configuration

```yaml
data:
  in_memory: true
  num_workers: 12
  sample_rate: 48000
  samples_per_track: 13
  seed: 42
  seq_dur: 6.0
  source_augmentations:
  - gain
  sources:
  - vocal_percussion
  - bass
  - alto
  - tenor
  - soprano
  - lead_vocal
loss_func:
  lambda_t: 10.0
  lambda_f: 1.0
  band: high
model:
  C_dec: 64
  C_enc: 64
  C_mid: 768
  L: 12
  activation: GELU
  context: false
  f_dec: 21
  f_enc: 21
  input_length: 288000
  padding_type: reflect
  signal_ch: 1
  wavelet: haar
optim:
  lr: 0.0001
  lr_decay_gamma: 0.3
  lr_decay_patience: 50
  optimizer: adam
  patience: 1000
  weight_decay: 0.0
training:
  batch_size: 16
  epochs: 1000
```
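
As a quick sanity check on the values above, the sketch below recomputes the training segment length in samples from `seq_dur` and `sample_rate` and compares it with `input_length`. That `input_length` is intended to equal the segment length is an assumption inferred from the numbers; it is not stated in this card. The dictionary simply copies three values from the configuration and is not an object provided by the codebase.

```python
# Values copied from the configuration above (a hand-written subset,
# not something loaded from the codebase).
config = {
    "data": {"sample_rate": 48000, "seq_dur": 6.0},
    "model": {"input_length": 288000},
}

# Length of one training segment in samples: duration [s] x sample rate [Hz].
segment_samples = round(config["data"]["seq_dur"] * config["data"]["sample_rate"])

# Assumption: model.input_length is meant to match the segment length.
matches = segment_samples == config["model"]["input_length"]
print(segment_samples, matches)
```

If the check holds, audio passed to the model at inference time would presumably need to be processed at 48 kHz in chunks of this length, although the exact chunking strategy depends on the codebase.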

# Results (SI-SDR [dB]) on vocal ensemble separation

| Method | Lead vocal | Soprano | Alto | Tenor | Bass | Vocal percussion |
|:---------------:|:--------------:|:--------------:|:--------------:|:--------------:|:--------------:|:--------------:|
| MRDLA | 8.7 | 11.8 | 14.7 | 11.3 | 10.2 | 22.1 |
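
For readers unfamiliar with the metric, the following is a minimal sketch of how the scale-invariant signal-to-distortion ratio (SI-SDR) of one estimated source against its reference can be computed. It follows the commonly used definition (project the estimate onto the reference, then compare the energy of the projection with the energy of the residual). The function name `si_sdr`, the mean removal step, and the `eps` stabilizer are choices made for this illustration; the evaluation script that produced the table above may differ in such details.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Return the SI-SDR in dB of `estimate` with respect to `reference`.

    Both arguments are equal-length sequences of float samples.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)

    # Remove the mean of each signal (a common convention, assumed here).
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]

    # Optimal scaling of the reference that best explains the estimate.
    ref_energy = sum(r * r for r in ref)
    alpha = sum(e * r for e, r in zip(est, ref)) / (ref_energy + eps)

    # Energy of the scaled target and of the residual distortion.
    target_energy = alpha * alpha * ref_energy
    noise_energy = sum((e - alpha * r) ** 2 for e, r in zip(est, ref))

    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


# Toy usage: a sinusoid plus a small disturbance as the "estimate".
reference = [math.sin(0.01 * i) for i in range(1000)]
estimate = [r + 0.1 * math.cos(0.37 * i) for i, r in enumerate(reference)]
score = si_sdr(estimate, reference)
scaled_score = si_sdr([3.0 * e for e in estimate], reference)
```

Because the estimate is rescaled before the comparison, multiplying it by a constant leaves the score essentially unchanged, which is why `score` and `scaled_score` agree up to numerical precision. For real audio, an array library would be a more practical choice than plain Python lists.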