jaCappella/DPTNet_jaCappella_VES_48k

	---
	license: cc-by-nc-4.0
	language:
	- ja
	tags:
	- music
	- speech
	- audio
	- audio-to-audio
	- a cappella
	- vocal ensemble
	datasets:
	- jaCappella
	metrics:
	- SI-SDR
	---

	# DPTNet trained with the jaCappella corpus for vocal ensemble separation

	This model was trained by Tomohiko Nakamura using [the codebase](https://github.com/TomohikoNakamura/asteroid_jaCappella).
	It was trained on the vocal ensemble separation task of [the jaCappella dataset](https://tomohikonakamura.github.io/jaCappella_corpus/).
	[The paper](https://doi.org/10.1109/ICASSP49357.2023.10095569) was published at ICASSP 2023 ([arXiv](https://arxiv.org/abs/2211.16028)).

	# License
	See [the jaCappella dataset page](https://tomohikonakamura.github.io/jaCappella_corpus/).

	# Citation
	See [the jaCappella dataset page](https://tomohikonakamura.github.io/jaCappella_corpus/).

	# Configuration
	```yaml
	data:
	  num_workers: 12
	  sample_rate: 48000
	  samples_per_track: 13
	  seed: 42
	  seq_dur: 5.046
	  source_augmentations:
	  - gain
	  sources:
	  - vocal_percussion
	  - bass
	  - alto
	  - tenor
	  - soprano
	  - lead_vocal
	filterbank:
	  kernel_size: 32
	  n_filters: 64
	  stride: 16
	masknet:
	  bidirectional: true
	  chunk_size: 174
	  dropout: 0
	  ff_activation: relu
	  ff_hid: 256
	  hop_size: 128
	  in_chan: 64
	  mask_act: sigmoid
	  n_repeats: 8
	  n_src: 6
	  norm_type: gLN
	  out_chan: 64
	optim:
	  lr: 0.005
	  optimizer: adam
	  weight_decay: 1.0e-05
	training:
	  batch_size: 1
	  early_stop: true
	  epochs: 600
	  gradient_clipping: 5
	  half_lr: true
	  loss_func: pit_sisdr
	```
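
To get a feel for the scale of these settings, the sketch below converts a few of them into samples and frames. It is only illustrative arithmetic on the values above. The helper names are hypothetical, and it assumes the encoder is a plain strided 1-D convolution without padding, which may differ from what the codebase actually does.

```python
# Values copied from the configuration above.
SAMPLE_RATE = 48000  # data.sample_rate [Hz]
SEQ_DUR = 5.046      # data.seq_dur [s]
KERNEL_SIZE = 32     # filterbank.kernel_size [samples]
STRIDE = 16          # filterbank.stride [samples]
CHUNK_SIZE = 174     # masknet.chunk_size [frames]


def segment_samples(seq_dur, sample_rate):
    """Length of one training segment in samples."""
    return round(seq_dur * sample_rate)


def n_frames(n_samples, kernel_size, stride):
    """Frame count of a strided 1-D convolution (assumption: no padding)."""
    return (n_samples - kernel_size) // stride + 1


def frames_to_samples(frames, kernel_size, stride):
    """Number of input samples spanned by `frames` consecutive frames."""
    return (frames - 1) * stride + kernel_size


n_samples = segment_samples(SEQ_DUR, SAMPLE_RATE)
frames = n_frames(n_samples, KERNEL_SIZE, STRIDE)
chunk_samples = frames_to_samples(CHUNK_SIZE, KERNEL_SIZE, STRIDE)

print(n_samples)      # 242208 samples per training segment
print(frames)         # 15137 encoder frames per segment
print(chunk_samples)  # 2800 samples spanned by one chunk
```

Under these assumptions, one chunk of 174 frames spans 2800 samples, which is roughly 58 ms of audio at 48 kHz.
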
	# Results (SI-SDR [dB]) on vocal ensemble separation

	| Method | Lead vocal | Soprano | Alto | Tenor | Bass | Vocal percussion |
	|:------:|:----------:|:-------:|:----:|:-----:|:----:|:----------------:|
	| DPTNet | 8.9 | 8.5 | 11.9 | 14.9 | 19.7 | 21.9 |
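
For readers unfamiliar with the metric, here is a minimal pure-Python sketch of the usual SI-SDR definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. It is not the evaluation script behind the table. The function name `si_sdr` and the `zero_mean` default are assumptions made for illustration.

```python
import math


def si_sdr(reference, estimate, zero_mean=True):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("reference and estimate must be non-empty and equal in length")
    if zero_mean:
        # Remove the DC offset from both signals (assumed default).
        ref_mean = sum(reference) / len(reference)
        est_mean = sum(estimate) / len(estimate)
        reference = [r - ref_mean for r in reference]
        estimate = [e - est_mean for e in estimate]
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0:
        raise ValueError("reference has zero energy")
    # Optimal scaling of the reference toward the estimate.
    alpha = sum(r * e for r, e in zip(reference, estimate)) / ref_energy
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0:
        return math.inf
    if target_energy == 0:
        return -math.inf
    return 10 * math.log10(target_energy / residual_energy)


ref = [1.0, -1.0, 1.0, -1.0]
est = [1.0, -1.0, 1.0, 0.0]
score = si_sdr(ref, est)
# Rescaling the estimate leaves the score unchanged up to rounding.
scaled_score = si_sdr(ref, [3.0 * x for x in est])
```

Higher values mean the estimate is closer to the reference, and the score does not depend on the overall gain of the estimate.
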