# Datasets Format
Amphion supports the following academic datasets (sorted alphabetically):
- [Datasets Format](#datasets-format)
  - [AudioCaps](#audiocaps)
  - [CSD](#csd)
  - [KiSing](#kising)
  - [LibriTTS](#libritts)
  - [LJSpeech](#ljspeech)
  - [M4Singer](#m4singer)
  - [NUS-48E](#nus-48e)
  - [Opencpop](#opencpop)
  - [OpenSinger](#opensinger)
  - [Opera](#opera)
  - [PopBuTFy](#popbutfy)
  - [PopCS](#popcs)
  - [PJS](#pjs)
  - [SVCC](#svcc)
  - [VCTK](#vctk)

The download link and the file structure tree of each dataset are shown below.
## AudioCaps
AudioCaps is a dataset of around 44K audio-caption pairs, where each audio clip corresponds to a caption with rich semantic information. You can download the dataset [here](https://github.com/cdjkim/audiocaps). The file structure tree looks like this:
```plaintext
[AudioCaps dataset path]
┣ AudioCpas
┃   ┣ wav
┃   ┃   ┣ ---1_cCGK4M_0_10000.wav
┃   ┃   ┣ ---lTs1dxhU_30000_40000.wav
┃   ┃   ┣ ...
```
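The clip names in the tree appear to follow a `{ClipID}_{Start}_{End}.wav` pattern. That reading is an assumption drawn from the two example names, not something the dataset documentation is quoted on here. Under that assumption, a minimal Python sketch for splitting a name into its parts could look like this (`parse_audiocaps_name` is a hypothetical helper, not part of Amphion):

```python
from pathlib import Path


def parse_audiocaps_name(filename: str) -> tuple[str, int, int]:
    """Split '{ClipID}_{Start}_{End}.wav' into (clip_id, start, end).

    Assumes the last two underscore-separated fields are integers.
    """
    stem = Path(filename).stem
    # The clip ID may itself contain underscores, so split from the right.
    clip_id, start, end = stem.rsplit("_", 2)
    return clip_id, int(start), int(end)
```

Splitting from the right keeps IDs such as `---1_cCGK4M` intact even though they contain an underscore.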
## CSD
The official CSD dataset can be downloaded [here](https://zenodo.org/records/4785016). The file structure tree looks like this:
```plaintext
[CSD dataset path]
┣ english
┣ korean
┣ utterances
┃   ┣ en001a
┃   ┃   ┣ {UtteranceID}.wav
┃   ┣ en001b
┃   ┣ en002a
┃   ┣ en002b
┃   ┣ ...
┣ README
```
## KiSing
The official KiSing dataset can be downloaded [here](http://shijt.site/index.php/2021/05/16/kising-the-first-open-source-mandarin-singing-voice-synthesis-corpus/). The file structure tree looks like this:
```plaintext
[KiSing dataset path]
┣ clean
┃   ┣ 421
┃   ┣ 422
┃   ┣ ...
```
## LibriTTS
The official LibriTTS dataset can be downloaded [here](https://www.openslr.org/60/). The file structure tree looks like this:
```plaintext
[LibriTTS dataset path]
┣ BOOKS.txt
┣ CHAPTERS.txt
┣ eval_sentences10.tsv
┣ LICENSE.txt
┣ NOTE.txt
┣ reader_book.tsv
┣ README_librispeech.txt
┣ README_libritts.txt
┣ speakers.tsv
┣ SPEAKERS.txt
┣ dev-clean (Subset)
┃   ┣ 1272 {Speaker_ID}
┃   ┃   ┣ 128104 {Chapter_ID}
┃   ┃   ┃   ┣ 1272_128104_000001_000000.normalized.txt
┃   ┃   ┃   ┣ 1272_128104_000001_000000.original.txt
┃   ┃   ┃   ┣ 1272_128104_000001_000000.wav
┃   ┃   ┃   ┣ ...
┃   ┃   ┃   ┣ 1272_128104.book.tsv
┃   ┃   ┃   ┣ 1272_128104.trans.tsv
┃   ┃   ┣ ...
┃   ┣ ...
┣ dev-other (Subset)
┃   ┣ 116 {Speaker_ID}
┃   ┃   ┣ 288045 {Chapter_ID}
┃   ┃   ┃   ┣ 116_288045_000003_000000.normalized.txt
┃   ┃   ┃   ┣ 116_288045_000003_000000.original.txt
┃   ┃   ┃   ┣ 116_288045_000003_000000.wav
┃   ┃   ┃   ┣ ...
┃   ┃   ┃   ┣ 116_288045.book.tsv
┃   ┃   ┃   ┣ 116_288045.trans.tsv
┃   ┃   ┣ ...
┃   ┣ ...
┃   ┣ ...
┣ test-clean (Subset)
┃   ┣ {Speaker_ID}
┃   ┃   ┣ {Chapter_ID}
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
┃   ┃   ┃   ┣ ...
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
┃   ┃   ┣ ...
┃   ┣ ...
┣ test-other
┃   ┣ {Speaker_ID}
┃   ┃   ┣ {Chapter_ID}
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
┃   ┃   ┃   ┣ ...
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
┃   ┃   ┣ ...
┃   ┣ ...
┣ train-clean-100
┃   ┣ {Speaker_ID}
┃   ┃   ┣ {Chapter_ID}
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
┃   ┃   ┃   ┣ ...
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
┃   ┃   ┣ ...
┃   ┣ ...
┣ train-clean-360
┃   ┣ {Speaker_ID}
┃   ┃   ┣ {Chapter_ID}
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
┃   ┃   ┃   ┣ ...
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
┃   ┃   ┣ ...
┃   ┣ ...
┣ train-other-500
┃   ┣ {Speaker_ID}
┃   ┃   ┣ {Chapter_ID}
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
┃   ┃   ┃   ┣ ...
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
┃   ┃   ┃   ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
┃   ┃   ┣ ...
┃   ┣ ...
```
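The `{Speaker_ID}_{Chapter_ID}_{Utterance_ID}` naming shown in the tree means the speaker and chapter can be recovered from a file name alone. A minimal Python sketch of that, with `LibriTTSUtterance` and `parse_libritts_name` as hypothetical names introduced for illustration:

```python
from pathlib import Path
from typing import NamedTuple


class LibriTTSUtterance(NamedTuple):
    speaker_id: str
    chapter_id: str
    utterance_id: str


def parse_libritts_name(filename: str) -> LibriTTSUtterance:
    """Parse '{Speaker_ID}_{Chapter_ID}_{Utterance_ID}.<ext>'."""
    # Cut at the first dot so '.wav', '.normalized.txt' and
    # '.original.txt' are all handled the same way.
    stem = Path(filename).name.split(".", 1)[0]
    # The utterance ID contains an underscore (e.g. '000001_000000'),
    # so only the first two separators are split.
    speaker_id, chapter_id, utterance_id = stem.split("_", 2)
    return LibriTTSUtterance(speaker_id, chapter_id, utterance_id)
```

Per-chapter files such as `1272_128104.book.tsv` have no utterance part, so this sketch raises `ValueError` for them; filter them out before calling it.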
## LJSpeech
The official LJSpeech dataset can be downloaded [here](https://keithito.com/LJ-Speech-Dataset/). The file structure tree looks like this:
```plaintext
[LJSpeech dataset path]
┣ metadata.csv
┣ wavs
┃   ┣ LJ001-0001.wav
┃   ┣ LJ001-0002.wav
┃   ┣ ...
┣ README
```
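To pair each clip under `wavs` with its text, `metadata.csv` has to be read. The sketch below assumes the layout described in the dataset's own README: one record per line, three pipe-separated fields (clip ID, transcription, normalized transcription), and no header row. Check this against your copy. `parse_ljspeech_metadata` is a hypothetical helper name.

```python
import csv
from typing import Iterable


def parse_ljspeech_metadata(lines: Iterable[str]) -> list[tuple[str, str, str]]:
    """Return (clip_id, transcription, normalized_transcription) rows."""
    # QUOTE_NONE: quotation marks inside transcripts are ordinary text.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    rows = []
    for line_no, row in enumerate(reader, start=1):
        if not row:  # skip blank lines
            continue
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(row)}")
        clip_id, text, normalized = row
        rows.append((clip_id, text, normalized))
    return rows
```

The function accepts any iterable of lines, so an open file works directly: `with open(path, encoding="utf-8", newline="") as f: rows = parse_ljspeech_metadata(f)`. The audio for a row is then expected at `wavs/{clip_id}.wav`.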
## M4Singer
The official M4Singer dataset can be downloaded [here](https://drive.google.com/file/d/1xC37E59EWRRFFLdG3aJkVqwtLDgtFNqW/view). The file structure tree looks like this:
```plaintext
[M4Singer dataset path]
┣ {Singer_1}#{Song_1}
┃   ┣ 0000.mid
┃   ┣ 0000.TextGrid
┃   ┣ 0000.wav
┃   ┣ ...
┣ {Singer_1}#{Song_2}
┣ ...
┣ {Singer_2}#{Song_1}
┣ {Singer_2}#{Song_2}
┣ ...
┗ meta.json
```
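Because each top-level directory is named `{Singer}#{Song}`, a per-singer index can be built from the directory names alone. A small Python sketch follows; `group_songs_by_singer` and the example names in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def group_songs_by_singer(dir_names: Iterable[str]) -> dict[str, list[str]]:
    """Map each singer to the songs found in '{Singer}#{Song}' names."""
    songs = defaultdict(list)
    for name in dir_names:
        if "#" not in name:
            # Entries without the separator (such as meta.json) are skipped.
            continue
        singer, song = name.split("#", 1)
        songs[singer].append(song)
    return dict(songs)
```

In practice the names would come from something like `os.listdir(dataset_path)`; for instance, `["SingerA#Song1", "SingerA#Song2", "meta.json"]` yields one entry for `SingerA` holding two songs.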
## NUS-48E
The official NUS-48E dataset can be downloaded [here](https://drive.google.com/drive/folders/12pP9uUl0HTVANU3IPLnumTJiRjPtVUMx). The file structure tree looks like this:
```plaintext
[NUS-48E dataset path]
┣ {SpeakerID}
┃   ┣ read
┃   ┃   ┣ {SongID}.txt
┃   ┃   ┣ {SongID}.wav
┃   ┃   ┣ ...
┃   ┣ sing
┃   ┃   ┣ {SongID}.txt
┃   ┃   ┣ {SongID}.wav
┃   ┃   ┣ ...
┣ ...
┣ README.txt
```
## Opencpop
The official Opencpop dataset can be downloaded [here](https://wenet.org.cn/opencpop/). The file structure tree looks like this:
```plaintext
[Opencpop dataset path]
┣ midis
┃   ┣ 2001.midi
┃   ┣ 2002.midi
┃   ┣ 2003.midi
┃   ┣ ...
┣ segments
┃   ┣ wavs
┃   ┃   ┣ 2001000001.wav
┃   ┃   ┣ 2001000002.wav
┃   ┃   ┣ 2001000003.wav
┃   ┃   ┣ ...
┃   ┣ test.txt
┃   ┣ train.txt
┃   ┗ transcriptions.txt
┣ textgrids
┃   ┣ 2001.TextGrid
┃   ┣ 2002.TextGrid
┃   ┣ 2003.TextGrid
┃   ┣ ...
┣ wavs
┃   ┣ 2001.wav
┃   ┣ 2002.wav
┃   ┣ 2003.wav
┃   ┣ ...
┣ TERMS_OF_ACCESS
┗ readme.md
```
## OpenSinger
The official OpenSinger dataset can be downloaded [here](https://drive.google.com/file/d/1EofoZxvalgMjZqzUEuEdleHIZ6SHtNuK/view). The file structure tree looks like this:
```plaintext
[OpenSinger dataset path]
┣ ManRaw
┃   ┣ {Singer_1}_{Song_1}
┃   ┃   ┣ {Singer_1}_{Song_1}_0.lab
┃   ┃   ┣ {Singer_1}_{Song_1}_0.txt
┃   ┃   ┣ {Singer_1}_{Song_1}_0.wav
┃   ┃   ┣ ...
┃   ┣ {Singer_1}_{Song_2}
┃   ┣ ...
┣ WomanRaw
┣ LICENSE
┗ README.md
```
## Opera
The official Opera dataset can be downloaded [here](http://isophonics.net/SingingVoiceDataset). The file structure tree looks like this:
```plaintext
[Opera dataset path]
┣ monophonic
┃   ┣ chinese
┃   ┃   ┣ {Gender}_{SingerID}
┃   ┃   ┃   ┣ {Emotion}_{SongID}.wav
┃   ┃   ┃   ┣ ...
┃   ┃   ┣ ...
┃   ┣ western
┣ polyphonic
┃   ┣ chinese
┃   ┣ western
┣ CrossculturalDataSet.xlsx
```
## PopBuTFy
The official PopBuTFy dataset can be downloaded [here](https://github.com/MoonInTheRiver/NeuralSVB). The file structure tree looks like this:
```plaintext
[PopBuTFy dataset path]
┣ data
┃   ┣ {SingerID}#singing#{SongName}_Amateur
┃   ┃   ┣ {SingerID}#singing#{SongName}_Amateur_{UtteranceID}.mp3
┃   ┃   ┣ ...
┃   ┣ {SingerID}#singing#{SongName}_Professional
┃   ┃   ┣ {SingerID}#singing#{SongName}_Professional_{UtteranceID}.mp3
┃   ┃   ┣ ...
┣ text_labels
┗ TERMS_OF_ACCESS
```
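The file names above pack four fields into one string: singer, song, skill level, and utterance. One way to pull them apart is a regular expression that mirrors the pattern in the tree. This is only a sketch: `parse_popbutfy_name` is a hypothetical helper, and it assumes the utterance ID contains no underscore.

```python
import re
from typing import Optional

# Mirrors '{SingerID}#singing#{SongName}_{Level}_{UtteranceID}.mp3'.
_POPBUTFY_NAME = re.compile(
    r"^(?P<singer>[^#]+)#singing#(?P<song>.+)"
    r"_(?P<level>Amateur|Professional)_(?P<utterance>[^_]+)\.mp3$"
)


def parse_popbutfy_name(filename: str) -> Optional[dict[str, str]]:
    """Return the four name fields, or None if the name does not match."""
    match = _POPBUTFY_NAME.match(filename)
    return match.groupdict() if match else None
```

Anchoring on the literal `Amateur` or `Professional` token lets song names that contain underscores parse correctly.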
## PopCS
The official PopCS dataset can be downloaded [here](https://github.com/MoonInTheRiver/DiffSinger/blob/master/resources/apply_form.md). The file structure tree looks like this:
```plaintext
[PopCS dataset path]
┣ popcs
┃   ┣ popcs-{SongName}
┃   ┃   ┣ {UtteranceID}_ph.txt
┃   ┃   ┣ {UtteranceID}_wf0.wav
┃   ┃   ┣ {UtteranceID}.TextGrid
┃   ┃   ┣ {UtteranceID}.txt
┃   ┃   ┣ ...
┃   ┣ ...
┗ TERMS_OF_ACCESS
```
## PJS
The official PJS dataset can be downloaded [here](https://sites.google.com/site/shinnosuketakamichi/research-topics/pjs_corpus). The file structure tree looks like this:
```plaintext
[PJS dataset path]
┣ PJS_corpus_ver1.1
┃   ┣ background_noise
┃   ┣ pjs{SongID}
┃   ┃   ┣ pjs{SongID}_song.wav
┃   ┃   ┣ pjs{SongID}_speech.wav
┃   ┃   ┣ pjs{SongID}.lab
┃   ┃   ┣ pjs{SongID}.mid
┃   ┃   ┣ pjs{SongID}.musicxml
┃   ┃   ┣ pjs{SongID}.txt
┃   ┣ ...
```
## SVCC
The official SVCC dataset can be downloaded [here](https://github.com/lesterphillip/SVCC23_FastSVC/tree/main/egs/generate_dataset). The file structure tree looks like this:
```plaintext
[SVCC dataset path]
┣ Data
┃   ┣ CDF1
┃   ┃   ┣ 10001.wav
┃   ┃   ┣ 10002.wav
┃   ┃   ┣ ...
┃   ┣ CDM1
┃   ┣ IDF1
┃   ┣ IDM1
┗ README.md
```
## VCTK
The official VCTK dataset can be downloaded [here](https://datashare.ed.ac.uk/handle/10283/3443). The file structure tree looks like this:
```plaintext
[VCTK dataset path]
┣ txt
┃   ┣ {Speaker_1}
┃   ┃   ┣ {Speaker_1}_001.txt
┃   ┃   ┣ {Speaker_1}_002.txt
┃   ┃   ┣ ...
┃   ┣ {Speaker_2}
┃   ┣ ...
┣ wav48_silence_trimmed
┃   ┣ {Speaker_1}
┃   ┃   ┣ {Speaker_1}_001_mic1.flac
┃   ┃   ┣ {Speaker_1}_001_mic2.flac
┃   ┃   ┣ {Speaker_1}_002_mic1.flac
┃   ┃   ┣ ...
┃   ┣ {Speaker_2}
┃   ┣ ...
┣ speaker-info.txt
┗ update.txt
```
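In this layout the audio and transcripts live in parallel trees, and both microphone recordings of an utterance share one transcript. The following Python sketch maps an audio path to its transcript path under that reading of the tree. `transcript_for` is a hypothetical helper, and the `p225` speaker ID in the comment is only an illustration.

```python
from pathlib import Path


def transcript_for(audio_path: Path, dataset_root: Path) -> Path:
    """Map '{Speaker}_{Utt}_mic{N}.flac' to 'txt/{Speaker}/{Speaker}_{Utt}.txt'."""
    # e.g. 'p225_001_mic1' -> speaker 'p225', utterance '001'; the
    # microphone suffix is dropped because both mics share one transcript.
    speaker, utterance, _mic = audio_path.stem.rsplit("_", 2)
    return dataset_root / "txt" / speaker / f"{speaker}_{utterance}.txt"
```

The function only builds the path; whether the transcript exists for a given utterance should still be checked with `Path.exists()`.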