---
license: cc-by-nc-4.0
---

### guitar_iil_b2048_r48000_z16.ts

Dataset: [IILGuitarTimbre](https://github.com/Intelligent-Instruments-Lab/IILGuitarTimbre).

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

### organ_archive_b2048_r48000_z16.ts  

Dataset: public domain organ music from archive.org. Small amounts of voice and other instruments are included, and vinyl record noise is prominent.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

### organ_bach_b2048_sr48000_z16.ts

Dataset: various recordings of J. S. Bach's music for church organ.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

### voice_vocalset_b2048_r48000_z16.ts

Dataset: [VocalSet](https://zenodo.org/record/1193957) singing voice dataset.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

### voice_hifitts_b2048_r48000_z16.ts

Dataset: [Hi-Fi TTS](http://arxiv.org/abs/2104.01497) audiobooks dataset.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

### voice_jvs_b2048_r44100_z16.ts

Dataset: [Hi-Fi TTS](http://arxiv.org/abs/2104.01497) speaker 9017 (John Van Stan).

Model: RAVE v3, 44.1kHz, block size 2048, 16 latent dimensions.

### voice_vctk_b2048_r44100_z16.ts

Dataset: [CSTR VCTK Corpus](https://datashare.ed.ac.uk/handle/10283/3443) multispeaker read speech dataset.

Model: RAVE v3, 44.1kHz, block size 2048, 22 latent dimensions.
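
The filenames above appear to follow a `<name>_b<block size>_r<sample rate>_z<latent dimensions>.ts` pattern (one entry uses `sr` instead of `r`). As a minimal sketch, assuming that pattern holds, the settings can be read from a filename as shown below. The `ModelSpec` type and both helper functions are hypothetical names used for illustration and are not part of RAVE. The latent frame rate calculation also assumes that one latent frame corresponds to one block of audio samples. Where a filename and its entry disagree, treat the values listed in the entry as the reference.

```python
import re
from typing import NamedTuple


class ModelSpec(NamedTuple):
    """Settings read from a model filename (hypothetical helper type)."""
    name: str
    block_size: int
    sample_rate: int
    latent_dims: int


# Assumed naming pattern: <name>_b<block>_r<rate>_z<latent>.ts
# The optional "s" accepts the "sr48000" variant used by one file.
_PATTERN = re.compile(
    r"^(?P<name>.+)_b(?P<block>\d+)_s?r(?P<rate>\d+)_z(?P<latent>\d+)\.ts$"
)


def parse_model_filename(filename: str) -> ModelSpec:
    """Split a filename into its name, block size, sample rate and latent size."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized model filename: {filename!r}")
    return ModelSpec(
        name=match.group("name"),
        block_size=int(match.group("block")),
        sample_rate=int(match.group("rate")),
        latent_dims=int(match.group("latent")),
    )


def latent_frames_per_second(spec: ModelSpec) -> float:
    """Latent frame rate, assuming one latent frame per block of samples."""
    return spec.sample_rate / spec.block_size


if __name__ == "__main__":
    for fname in (
        "guitar_iil_b2048_r48000_z16.ts",
        "organ_bach_b2048_sr48000_z16.ts",
        "voice_jvs_b2048_r44100_z16.ts",
    ):
        spec = parse_model_filename(fname)
        print(spec, latent_frames_per_second(spec))
```

Under these assumptions, the 48kHz models produce roughly 23.4 latent frames per second and the 44.1kHz models roughly 21.5.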