Datasets Format
Amphion supports the following academic datasets, listed alphabetically. The download link and file structure tree of each dataset are shown below.
AudioCaps
AudioCaps is a dataset of around 44K audio-caption pairs, where each audio clip corresponds to a caption with rich semantic information. You can download the dataset here. The file structure tree is as follows:
[AudioCaps dataset path]
┣ AudioCaps
┃ ┣ wav
┃ ┃ ┣ ---1_cCGK4M_0_10000.wav
┃ ┃ ┣ ---lTs1dxhU_30000_40000.wav
┃ ┃ ┣ ...
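The two example file names suggest a pattern of clip ID, start offset, and end offset. The sketch below splits a name on that basis. It is only an assumption drawn from the examples above (the pattern is not stated in this document), and `parse_audiocaps_name` is a hypothetical helper, not part of Amphion. Because the clip ID itself can contain underscores, the split is done from the right.

```python
from pathlib import Path


def parse_audiocaps_name(filename: str):
    """Split '<clip_id>_<start>_<end>.wav' into (clip_id, start, end).

    Assumption: the last two underscore-separated fields are integer offsets.
    """
    stem = Path(filename).stem
    # Split from the right, since the clip ID may itself contain underscores.
    clip_id, start, end = stem.rsplit("_", 2)
    return clip_id, int(start), int(end)


print(parse_audiocaps_name("---1_cCGK4M_0_10000.wav"))
# ('---1_cCGK4M', 0, 10000)
```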
CSD
The official CSD dataset can be downloaded here. The file structure tree is as follows:
[CSD dataset path]
┣ english
┣ korean
┣ utterances
┃ ┣ en001a
┃ ┃ ┣ {UtteranceID}.wav
┃ ┣ en001b
┃ ┣ en002a
┃ ┣ en002b
┃ ┣ ...
┣ README
KiSing
The official KiSing dataset can be downloaded here. The file structure tree is as follows:
[KiSing dataset path]
┣ clean
┃ ┣ 421
┃ ┣ 422
┃ ┣ ...
LibriTTS
The official LibriTTS dataset can be downloaded here. The file structure tree is as follows:
[LibriTTS dataset path]
┣ BOOKS.txt
┣ CHAPTERS.txt
┣ eval_sentences10.tsv
┣ LICENSE.txt
┣ NOTE.txt
┣ reader_book.tsv
┣ README_librispeech.txt
┣ README_libritts.txt
┣ speakers.tsv
┣ SPEAKERS.txt
┣ dev-clean (Subset)
┃ ┣ 1272 {Speaker_ID}
┃ ┃ ┣ 128104 {Chapter_ID}
┃ ┃ ┃ ┣ 1272_128104_000001_000000.normalized.txt
┃ ┃ ┃ ┣ 1272_128104_000001_000000.original.txt
┃ ┃ ┃ ┣ 1272_128104_000001_000000.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ 1272_128104.book.tsv
┃ ┃ ┃ ┣ 1272_128104.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
┣ dev-other (Subset)
┃ ┣ 116 {Speaker_ID}
┃ ┃ ┣ 288045 {Chapter_ID}
┃ ┃ ┃ ┣ 116_288045_000003_000000.normalized.txt
┃ ┃ ┃ ┣ 116_288045_000003_000000.original.txt
┃ ┃ ┃ ┣ 116_288045_000003_000000.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ 116_288045.book.tsv
┃ ┃ ┃ ┣ 116_288045.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
┣ test-clean (Subset)
┃ ┣ {Speaker_ID}
┃ ┃ ┣ {Chapter_ID}
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
┣ test-other
┃ ┣ {Speaker_ID}
┃ ┃ ┣ {Chapter_ID}
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
┣ train-clean-100
┃ ┣ {Speaker_ID}
┃ ┃ ┣ {Chapter_ID}
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
┣ train-clean-360
┃ ┣ {Speaker_ID}
┃ ┃ ┣ {Chapter_ID}
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
┣ train-other-500
┃ ┣ {Speaker_ID}
┃ ┃ ┣ {Chapter_ID}
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
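The LibriTTS naming pattern above can be unpacked in code. This is a minimal sketch, assuming the utterance ID is everything after the second underscore (as in `1272_128104_000001_000000.wav`). The function name `libritts_sidecars` is hypothetical and not part of Amphion.

```python
from pathlib import Path


def libritts_sidecars(wav_path):
    """Derive the IDs and the sibling transcript paths for one LibriTTS wav file."""
    wav = Path(wav_path)
    # Assumption: stem is '<speaker>_<chapter>_<utterance>', where the
    # utterance part may itself contain an underscore.
    speaker_id, chapter_id, utterance_id = wav.stem.split("_", 2)
    return {
        "speaker_id": speaker_id,
        "chapter_id": chapter_id,
        "utterance_id": utterance_id,
        "normalized_txt": wav.with_name(wav.stem + ".normalized.txt"),
        "original_txt": wav.with_name(wav.stem + ".original.txt"),
    }


info = libritts_sidecars("dev-clean/1272/128104/1272_128104_000001_000000.wav")
print(info["speaker_id"], info["chapter_id"], info["utterance_id"])
```

Keeping the IDs as strings preserves the zero padding in the utterance ID.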
LJSpeech
The official LJSpeech dataset can be downloaded here. The file structure tree is as follows:
[LJSpeech dataset path]
┣ metadata.csv
┣ wavs
┃ ┣ LJ001-0001.wav
┃ ┣ LJ001-0002.wav
┃ ┣ ...
┣ README
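To show how `metadata.csv` relates to the `wavs` folder, here is a minimal sketch that pairs each utterance with its audio file. It assumes the format described in the LJSpeech README: one pipe-delimited record per line, with the utterance ID first and the normalized transcription last. `read_ljspeech_metadata` is a hypothetical helper, not an Amphion function.

```python
import csv
from pathlib import Path


def read_ljspeech_metadata(dataset_root):
    """Return a list of (wav_path, text) pairs read from metadata.csv."""
    root = Path(dataset_root)
    entries = []
    with open(root / "metadata.csv", encoding="utf-8", newline="") as f:
        # QUOTE_NONE: transcriptions may contain literal quote characters.
        for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
            if not row:
                continue  # skip blank lines
            utt_id, text = row[0], row[-1]
            entries.append((root / "wavs" / f"{utt_id}.wav", text))
    return entries
```

Calling `read_ljspeech_metadata("[LJSpeech dataset path]")` with the real dataset root would then yield one pair per line of the metadata file.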
M4Singer
The official M4Singer dataset can be downloaded here. The file structure tree is as follows:
[M4Singer dataset path]
┣ {Singer_1}#{Song_1}
┃ ┣ 0000.mid
┃ ┣ 0000.TextGrid
┃ ┣ 0000.wav
┃ ┣ ...
┣ {Singer_1}#{Song_2}
┣ ...
┣ {Singer_2}#{Song_1}
┣ {Singer_2}#{Song_2}
┣ ...
┗ meta.json
NUS-48E
The official NUS-48E dataset can be downloaded here. The file structure tree is as follows:
[NUS-48E dataset path]
┣ {SpeakerID}
┃ ┣ read
┃ ┃ ┣ {SongID}.txt
┃ ┃ ┣ {SongID}.wav
┃ ┃ ┣ ...
┃ ┣ sing
┃ ┃ ┣ {SongID}.txt
┃ ┃ ┣ {SongID}.wav
┃ ┃ ┣ ...
┣ ...
┣ README.txt
Opencpop
The official Opencpop dataset can be downloaded here. The file structure tree is as follows:
[Opencpop dataset path]
┣ midis
┃ ┣ 2001.midi
┃ ┣ 2002.midi
┃ ┣ 2003.midi
┃ ┣ ...
┣ segments
┃ ┣ wavs
┃ ┃ ┣ 2001000001.wav
┃ ┃ ┣ 2001000002.wav
┃ ┃ ┣ 2001000003.wav
┃ ┃ ┣ ...
┃ ┣ test.txt
┃ ┣ train.txt
┃ ┗ transcriptions.txt
┣ textgrids
┃ ┣ 2001.TextGrid
┃ ┣ 2002.TextGrid
┃ ┣ 2003.TextGrid
┃ ┣ ...
┣ wavs
┃ ┣ 2001.wav
┃ ┣ 2002.wav
┃ ┣ 2003.wav
┃ ┣ ...
┣ TERMS_OF_ACCESS
┗ readme.md
OpenSinger
The official OpenSinger dataset can be downloaded here. The file structure tree is as follows:
[OpenSinger dataset path]
┣ ManRaw
┃ ┣ {Singer_1}_{Song_1}
┃ ┃ ┣ {Singer_1}_{Song_1}_0.lab
┃ ┃ ┣ {Singer_1}_{Song_1}_0.txt
┃ ┃ ┣ {Singer_1}_{Song_1}_0.wav
┃ ┃ ┣ ...
┃ ┣ {Singer_1}_{Song_2}
┃ ┣ ...
┣ WomanRaw
┣ LICENSE
┗ README.md
Opera
The official Opera dataset can be downloaded here. The file structure tree is as follows:
[Opera dataset path]
┣ monophonic
┃ ┣ chinese
┃ ┃ ┣ {Gender}_{SingerID}
┃ ┃ ┃ ┣ {Emotion}_{SongID}.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┣ ...
┃ ┣ western
┣ polyphonic
┃ ┣ chinese
┃ ┣ western
┣ CrossculturalDataSet.xlsx
PopBuTFy
The official PopBuTFy dataset can be downloaded here. The file structure tree is as follows:
[PopBuTFy dataset path]
┣ data
┃ ┣ {SingerID}#singing#{SongName}_Amateur
┃ ┃ ┣ {SingerID}#singing#{SongName}_Amateur_{UtteranceID}.mp3
┃ ┃ ┣ ...
┃ ┣ {SingerID}#singing#{SongName}_Professional
┃ ┃ ┣ {SingerID}#singing#{SongName}_Professional_{UtteranceID}.mp3
┃ ┃ ┣ ...
┣ text_labels
┗ TERMS_OF_ACCESS
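The PopBuTFy folder names encode three fields. The sketch below recovers them, assuming the `{SingerID}#singing#{SongName}_{Level}` pattern shown in the tree. `parse_popbutfy_dir` and the sample folder name are hypothetical, made up for illustration.

```python
def parse_popbutfy_dir(dirname: str):
    """Split '<singer>#singing#<song>_<level>' into (singer, song, level)."""
    singer_id, _, rest = dirname.split("#", 2)
    # The song name may contain underscores, so take the level from the right.
    song_name, level = rest.rsplit("_", 1)
    return singer_id, song_name, level


# Hypothetical folder name that follows the pattern above.
print(parse_popbutfy_dir("Singer01#singing#my_song_Amateur"))
# ('Singer01', 'my_song', 'Amateur')
```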
PopCS
The official PopCS dataset can be downloaded here. The file structure tree is as follows:
[PopCS dataset path]
┣ popcs
┃ ┣ popcs-{SongName}
┃ ┃ ┣ {UtteranceID}_ph.txt
┃ ┃ ┣ {UtteranceID}_wf0.wav
┃ ┃ ┣ {UtteranceID}.TextGrid
┃ ┃ ┣ {UtteranceID}.txt
┃ ┃ ┣ ...
┃ ┣ ...
┗ TERMS_OF_ACCESS
PJS
The official PJS dataset can be downloaded here. The file structure tree is as follows:
[PJS dataset path]
┣ PJS_corpus_ver1.1
┃ ┣ background_noise
┃ ┣ pjs{SongID}
┃ ┃ ┣ pjs{SongID}_song.wav
┃ ┃ ┣ pjs{SongID}_speech.wav
┃ ┃ ┣ pjs{SongID}.lab
┃ ┃ ┣ pjs{SongID}.mid
┃ ┃ ┣ pjs{SongID}.musicxml
┃ ┃ ┣ pjs{SongID}.txt
┃ ┣ ...
SVCC
The official SVCC dataset can be downloaded here. The file structure tree is as follows:
[SVCC dataset path]
┣ Data
┃ ┣ CDF1
┃ ┃ ┣ 10001.wav
┃ ┃ ┣ 10002.wav
┃ ┃ ┣ ...
┃ ┣ CDM1
┃ ┣ IDF1
┃ ┣ IDM1
┗ README.md
VCTK
The official VCTK dataset can be downloaded here. The file structure tree is as follows:
[VCTK dataset path]
┣ txt
┃ ┣ {Speaker_1}
┃ ┃ ┣ {Speaker_1}_001.txt
┃ ┃ ┣ {Speaker_1}_002.txt
┃ ┃ ┣ ...
┃ ┣ {Speaker_2}
┃ ┣ ...
┣ wav48_silence_trimmed
┃ ┣ {Speaker_1}
┃ ┃ ┣ {Speaker_1}_001_mic1.flac
┃ ┃ ┣ {Speaker_1}_001_mic2.flac
┃ ┃ ┣ {Speaker_1}_002_mic1.flac
┃ ┃ ┣ ...
┃ ┣ {Speaker_2}
┃ ┣ ...
┣ speaker-info.txt
┗ update.txt
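Because VCTK keeps transcripts and audio in parallel folders, a common first step is to pair them. The sketch below does this for one microphone channel, assuming the layout shown above. `vctk_pairs` is a hypothetical helper, not an Amphion function, and it skips any transcript whose recording is missing.

```python
from pathlib import Path


def vctk_pairs(dataset_root, mic="mic1"):
    """Return (flac_path, txt_path) pairs for transcripts that have audio."""
    root = Path(dataset_root)
    pairs = []
    for txt in sorted((root / "txt").glob("*/*.txt")):
        speaker = txt.parent.name
        # e.g. txt/{Speaker_1}/{Speaker_1}_001.txt pairs with
        # wav48_silence_trimmed/{Speaker_1}/{Speaker_1}_001_mic1.flac
        flac = root / "wav48_silence_trimmed" / speaker / f"{txt.stem}_{mic}.flac"
        if flac.exists():
            pairs.append((flac, txt))
    return pairs
```

Passing `mic="mic2"` selects the second channel instead. Per the tree above, not every utterance has both channels.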