---
license: cc-by-nc-4.0
language:
- ja
tags:
- music
- speech
- audio
- audio-to-audio
- a cappella
- vocal ensemble
datasets:
- jaCappella
metrics:
- SI-SDR
---

# MRDLA trained with the jaCappella corpus for vocal ensemble separation

This model was trained by Tomohiko Nakamura using [the codebase](https://github.com/TomohikoNakamura/asteroid_jaCappella).  
It was trained on the vocal ensemble separation task of [the jaCappella dataset](https://tomohikonakamura.github.io/jaCappella_corpus/).  
[The paper](https://doi.org/10.1109/ICASSP49357.2023.10095569) was published in the proceedings of ICASSP 2023 ([arXiv](https://arxiv.org/abs/2211.16028)).

# License
See [the jaCappella dataset page](https://tomohikonakamura.github.io/jaCappella_corpus/).

# Citation
See [the jaCappella dataset page](https://tomohikonakamura.github.io/jaCappella_corpus/).

For MRDLA, please cite the following paper.
```bibtex
@article{TNakamura202104IEEEACMTASLP,
  author  = {Nakamura, Tomohiko and Kozuka, Shihori and Saruwatari, Hiroshi},
  journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  title   = {Time-domain audio source separation with neural networks based on multiresolution analysis},
  year    = {2021},
  doi     = {10.1109/TASLP.2021.3072496},
  month   = apr,
  volume  = {29},
  pages   = {1687--1701},
}
```

# Configuration

```yaml
data:
  in_memory: true
  num_workers: 12
  sample_rate: 48000
  samples_per_track: 13
  seed: 42
  seq_dur: 6.0
  source_augmentations:
  - gain
  sources:
  - vocal_percussion
  - bass
  - alto
  - tenor
  - soprano
  - lead_vocal
loss_func:
  lambda_t: 10.0
  lambda_f: 1.0
  band: high
model:
  C_dec: 64
  C_enc: 64
  C_mid: 768
  L: 12
  activation: GELU
  context: false
  f_dec: 21
  f_enc: 21
  input_length: 288000
  padding_type: reflect
  signal_ch: 1
  wavelet: haar
optim:
  lr: 0.0001
  lr_decay_gamma: 0.3
  lr_decay_patience: 50
  optimizer: adam
  patience: 1000
  weight_decay: 0.0
training:
  batch_size: 16
  epochs: 1000
```
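The `data` and `model` settings above are consistent with each other: `seq_dur` × `sample_rate` = 6.0 s × 48,000 Hz = 288,000 samples, which matches `input_length`. The NumPy sketch below shows how one training example might be drawn under these settings. It is not the codebase's loader. The shared crop offset for all six parts, the gain range `[0.25, 1.25]` used for the `gain` augmentation, and the helper name `make_training_example` are all assumptions made for illustration.

```python
import numpy as np

# Values copied from the `data` and `model` sections of the configuration.
SAMPLE_RATE = 48000
SEQ_DUR = 6.0
INPUT_LENGTH = 288000
SOURCES = ["vocal_percussion", "bass", "alto", "tenor", "soprano", "lead_vocal"]

# ASSUMPTION: hypothetical gain range for the `gain` source augmentation.
GAIN_LOW, GAIN_HIGH = 0.25, 1.25


def make_training_example(tracks, rng):
    """Hypothetical helper: build one (mixture, targets) pair.

    tracks: dict mapping each source name to a mono (signal_ch: 1) 1-D float
            array; all arrays have the same length and rate SAMPLE_RATE.
    Returns mixture with shape (INPUT_LENGTH,) and targets with shape
    (len(SOURCES), INPUT_LENGTH).
    """
    seg_len = int(SEQ_DUR * SAMPLE_RATE)
    assert seg_len == INPUT_LENGTH  # 6.0 s * 48 kHz = 288000 samples

    total = len(tracks[SOURCES[0]])
    # ASSUMPTION: one random offset shared by all parts keeps them aligned.
    start = int(rng.integers(0, total - seg_len + 1))

    # Crop each part and scale it by an independent random gain.
    targets = np.stack([
        tracks[name][start:start + seg_len] * rng.uniform(GAIN_LOW, GAIN_HIGH)
        for name in SOURCES
    ])
    # The network input is the sum of the (augmented) parts.
    mixture = targets.sum(axis=0)
    return mixture, targets
```

Sharing one offset keeps the six parts time-aligned, so their sum is a valid ensemble mixture. Drawing `samples_per_track: 13` such segments from each track per epoch is one plausible reading of that setting, not a confirmed one.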

# Results (SI-SDR [dB]) on vocal ensemble separation

|     Method      |   Lead vocal   |    Soprano     |      Alto      |     Tenor      |      Bass      |Vocal percussion|
|:---------------:|:--------------:|:--------------:|:--------------:|:--------------:|:--------------:|:--------------:|
|     MRDLA       |       8.7      |      11.8      |      14.7      |      11.3      |      10.2      |      22.1      |
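SI-SDR, the metric in the table, rescales the reference by the least-squares projection of the estimate onto it before computing the signal-to-distortion ratio, so a pure gain error does not lower the score. A minimal NumPy sketch of the standard definition (with mean removal) follows. This card does not say whether the reported numbers were computed over whole tracks or over segments, or how they were aggregated across tracks, so this sketch is not guaranteed to reproduce the table.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB for 1-D signals."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Remove the mean so that a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Optimal scaling of the reference (projection of the estimate onto it).
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    # Everything not explained by the scaled reference counts as distortion.
    distortion = estimate - target
    return 10.0 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    )
```

Because of the projection step, `si_sdr(3.0 * est, ref)` and `si_sdr(est, ref)` give the same value, which is why SI-SDR is preferred over plain SNR when a model's output level is arbitrary.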