tnkmr's picture
Update README.md
00e6e62 verified
metadata
license: cc-by-nc-4.0
language:
  - ja
tags:
  - music
  - speech
  - audio
  - audio-to-audio
  - a cappella
  - vocal ensemble
datasets:
  - jaCappella
metrics:
  - SI-SDR

DPTNet trained with the jaCappella corpus for vocal ensemble separation

This model was trained by Tomohiko Nakamura using the codebase).
It was trained on the vocal ensemble separation task of the jaCappella dataset.
The paper was published in ICASSP 2023 (arXiv).

License

See the jaCappella dataset page.

Citation

See the jaCappella dataset page.

Configuration

data:
  num_workers: 12
  sample_rate: 48000
  samples_per_track: 13
  seed: 42
  seq_dur: 5.046
  source_augmentations:
  - gain
  sources:
  - vocal_percussion
  - bass
  - alto
  - tenor
  - soprano
  - lead_vocal
filterbank:
  kernel_size: 32
  n_filters: 64
  stride: 16
masknet:
  bidirectional: true
  chunk_size: 174
  dropout: 0
  ff_activation: relu
  ff_hid: 256
  hop_size: 128
  in_chan: 64
  mask_act: sigmoid
  n_repeats: 8
  n_src: 6
  norm_type: gLN
  out_chan: 64
optim:
  lr: 0.005
  optimizer: adam
  weight_decay: 1.0e-05
training:
  batch_size: 1
  early_stop: true
  epochs: 600
  gradient_clipping: 5
  half_lr: true
  loss_func: pit_sisdr

Results (SI-SDR [dB]) on vocal ensemble separation

Method Lead vocal Soprano Alto Tenor Bass Vocal percussion
DPTNet 8.9 8.5 11.9 14.9 19.7 21.9