Datasets Format
Amphion supports the following academic datasets (sorted alphabetically).
The download link and file structure tree of each dataset are shown below.
Note: When running Amphion in Docker, you must mount the dataset into the container after downloading it. See "Mount dataset in Docker container" for more details.
AudioCaps
AudioCaps is a dataset of around 44K audio-caption pairs, where each audio clip corresponds to a caption with rich semantic information.
Download the AudioCaps dataset here. The file structure is as follows:
[AudioCaps dataset path]
┣ AudioCaps
┃ ┣ wav
┃ ┃ ┣ ---1_cCGK4M_0_10000.wav
┃ ┃ ┣ ---lTs1dxhU_30000_40000.wav
┃ ┃ ┣ ...
CSD
Download the official CSD dataset here. The file structure is as follows:
[CSD dataset path]
┣ english
┣ korean
┣ utterances
┃ ┣ en001a
┃ ┃ ┣ {UtteranceID}.wav
┃ ┣ en001b
┃ ┣ en002a
┃ ┣ en002b
┃ ┣ ...
┣ README
CustomSVCDataset
We support custom datasets for Singing Voice Conversion. To build your own dataset, organize your data in the following structure:
[Your Custom Dataset Path]
┣ singer1
┃ ┣ song1
┃ ┃ ┣ utterance1.wav
┃ ┃ ┣ utterance2.wav
┃ ┃ ┣ ...
┃ ┣ song2
┃ ┣ ...
┣ singer2
┣ ...
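The layout above maps directly onto (singer, song, utterance) records. The following is a minimal sketch that assumes only the directory structure shown; the helper name `collect_utterances` is hypothetical and is not part of Amphion.

```python
from pathlib import Path


def collect_utterances(dataset_root):
    """Return sorted (singer, song, utterance_name, wav_path) tuples.

    Hypothetical helper, not part of Amphion. It only assumes the
    [root]/singer/song/utterance.wav layout described above.
    """
    root = Path(dataset_root)
    records = []
    # Match files exactly three levels below the root: singer/song/utterance.wav
    for wav_path in root.glob("*/*/*.wav"):
        singer, song = wav_path.parts[-3], wav_path.parts[-2]
        records.append((singer, song, wav_path.stem, wav_path))
    return sorted(records)
```

Running such a check before preprocessing can reveal files placed at the wrong depth, since they are simply absent from the returned list.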
Hi-Fi TTS
Download the official Hi-Fi TTS dataset here. The file structure is as follows:
[Hi-Fi TTS dataset path]
┣ audio
┃ ┣ 11614_other {Speaker_ID}_{SNR_subset}
┃ ┃ ┣ 10547 {Book_ID}
┃ ┃ ┃ ┣ thousandnights8_04_anonymous_0001.flac
┃ ┃ ┃ ┣ thousandnights8_04_anonymous_0003.flac
┃ ┃ ┃ ┣ thousandnights8_04_anonymous_0004.flac
┃ ┃ ┃ ┣ ...
┃ ┃ ┣ ...
┃ ┣ ...
┣ 92_manifest_clean_dev.json
┣ 92_manifest_clean_test.json
┣ 92_manifest_clean_train.json
┣ ...
┣ {Speaker_ID}_manifest_{SNR_subset}_{dataset_split}.json
┣ ...
┣ books_bandwidth.tsv
┣ LICENSE.txt
┣ readers_books_clean.txt
┣ readers_books_other.txt
┣ README.txt
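The manifest naming pattern `{Speaker_ID}_manifest_{SNR_subset}_{dataset_split}.json` can be split into its fields with a regular expression. This is an illustrative sketch: the function name is hypothetical, and the assumption that the speaker ID is numeric and the other two fields are lowercase words is inferred only from the example names in the tree above.

```python
import re

# Pattern from the tree above:
# {Speaker_ID}_manifest_{SNR_subset}_{dataset_split}.json
# Assumption: numeric speaker ID, lowercase subset and split names.
MANIFEST_RE = re.compile(
    r"^(?P<speaker>\d+)_manifest_(?P<snr>[a-z]+)_(?P<split>[a-z]+)\.json$"
)


def parse_manifest_name(filename):
    """Return the manifest fields as a dict, or None if the name does not match."""
    match = MANIFEST_RE.match(filename)
    return match.groupdict() if match else None
```

For example, `parse_manifest_name("92_manifest_clean_dev.json")` yields the speaker `"92"`, the subset `"clean"`, and the split `"dev"`, while non-manifest files such as `books_bandwidth.tsv` return `None`.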
KiSing
Download the official KiSing dataset here. The file structure is as follows:
[KiSing dataset path]
┣ clean
┃ ┣ 421
┃ ┣ 422
┃ ┣ ...
LibriLight
Download the official LibriLight dataset here. The file structure is as follows:
[LibriLight dataset path]
┣ small (Subset)
┃ ┣ 100 {Speaker_ID}
┃ ┃ ┣ sea_fairies_0812_librivox_64kb_mp3 {Chapter_ID}
┃ ┃ ┃ ┣ 01_baum_sea_fairies_64kb.flac
┃ ┃ ┃ ┣ 02_baum_sea_fairies_64kb.flac
┃ ┃ ┃ ┣ 03_baum_sea_fairies_64kb.flac
┃ ┃ ┃ ┣ 22_baum_sea_fairies_64kb.flac
┃ ┃ ┃ ┣ 01_baum_sea_fairies_64kb.json
┃ ┃ ┃ ┣ 02_baum_sea_fairies_64kb.json
┃ ┃ ┃ ┣ 03_baum_sea_fairies_64kb.json
┃ ┃ ┃ ┣ 22_baum_sea_fairies_64kb.json
┃ ┃ ┃ ┣ ...
┃ ┃ ┣ ...
┃ ┣ ...
┣ medium (Subset)
┣ ...
LibriTTS
Download the official LibriTTS dataset here. The file structure is as follows:
[LibriTTS dataset path]
┣ BOOKS.txt
┣ CHAPTERS.txt
┣ eval_sentences10.tsv
┣ LICENSE.txt
┣ NOTE.txt
┣ reader_book.tsv
┣ README_librispeech.txt
┣ README_libritts.txt
┣ speakers.tsv
┣ SPEAKERS.txt
┣ dev-clean (Subset)
┃ ┣ 1272 {Speaker_ID}
┃ ┃ ┣ 128104 {Chapter_ID}
┃ ┃ ┃ ┣ 1272_128104_000001_000000.normalized.txt
┃ ┃ ┃ ┣ 1272_128104_000001_000000.original.txt
┃ ┃ ┃ ┣ 1272_128104_000001_000000.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ 1272_128104.book.tsv
┃ ┃ ┃ ┣ 1272_128104.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
┣ dev-other (Subset)
┃ ┣ 116 {Speaker_ID}
┃ ┃ ┣ 288045 {Chapter_ID}
┃ ┃ ┃ ┣ 116_288045_000003_000000.normalized.txt
┃ ┃ ┃ ┣ 116_288045_000003_000000.original.txt
┃ ┃ ┃ ┣ 116_288045_000003_000000.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ 116_288045.book.tsv
┃ ┃ ┃ ┣ 116_288045.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
┣ test-clean (Subset)
┃ ┣ {Speaker_ID}
┃ ┃ ┣ {Chapter_ID}
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
┣ test-other
┃ ┣ {Speaker_ID}
┃ ┃ ┣ {Chapter_ID}
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
┣ train-clean-100
┃ ┣ {Speaker_ID}
┃ ┃ ┣ {Chapter_ID}
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
┣ train-clean-360
┃ ┣ {Speaker_ID}
┃ ┃ ┣ {Chapter_ID}
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
┣ train-other-500
┃ ┣ {Speaker_ID}
┃ ┃ ┣ {Chapter_ID}
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
┃ ┃ ┣ ...
┃ ┣ ...
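The utterance-level file names follow `{Speaker_ID}_{Chapter_ID}_{Utterance_ID}` plus an extension, and in the examples above the utterance ID itself contains an underscore (`000001_000000`). A minimal sketch of splitting such a name, with a hypothetical helper name that is not part of Amphion:

```python
from pathlib import Path


def parse_libritts_name(path):
    """Return (speaker_id, chapter_id, utterance_id) for an utterance-level file.

    Hypothetical helper; it relies only on the naming pattern shown above.
    """
    name = Path(path).name
    # Cut at the first dot so ".wav", ".normalized.txt", and ".original.txt"
    # are all handled the same way.
    stem = name.split(".", 1)[0]
    # The utterance ID may contain an underscore (e.g. "000001_000000"),
    # so split only on the first two underscores.
    speaker_id, chapter_id, utterance_id = stem.split("_", 2)
    return speaker_id, chapter_id, utterance_id
```

Chapter-level files such as `1272_128104.book.tsv` have no utterance ID, so this sketch raises `ValueError` for them; filter those out before calling it.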
LJSpeech
Download the official LJSpeech dataset here. The file structure is as follows:
[LJSpeech dataset path]
┣ metadata.csv
┣ wavs
┃ ┣ LJ001-0001.wav
┃ ┣ LJ001-0002.wav
┃ ┣ ...
┣ README
M4Singer
Download the official M4Singer dataset here. The file structure is as follows:
[M4Singer dataset path]
┣ {Singer_1}#{Song_1}
┃ ┣ 0000.mid
┃ ┣ 0000.TextGrid
┃ ┣ 0000.wav
┃ ┣ ...
┣ {Singer_1}#{Song_2}
┣ ...
┣ {Singer_2}#{Song_1}
┣ {Singer_2}#{Song_2}
┣ ...
┗ meta.json
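Because each top-level directory is named `{Singer}#{Song}`, the singer and song can be recovered by splitting on `#`. The sketch below groups songs by singer; the function name and the example directory names in the usage note are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def group_songs_by_singer(entry_names):
    """Group '{Singer}#{Song}' directory names into {singer: [songs]}.

    Hypothetical helper, not part of Amphion. Entries without '#'
    (such as meta.json) are skipped.
    """
    songs = defaultdict(list)
    for name in entry_names:
        if "#" not in name:
            continue
        # Split on the first '#' only, in case a song name contains one.
        singer, song = name.split("#", 1)
        songs[singer].append(song)
    return dict(songs)
```

For instance, passing `["SingerA#song_x", "SingerA#song_y", "SingerB#song_x", "meta.json"]` returns a dictionary with two songs for `SingerA` and one for `SingerB`.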
NUS-48E
Download the official NUS-48E dataset here. The file structure is as follows:
[NUS-48E dataset path]
┣ {SpeakerID}
┃ ┣ read
┃ ┃ ┣ {SongID}.txt
┃ ┃ ┣ {SongID}.wav
┃ ┃ ┣ ...
┃ ┣ sing
┃ ┃ ┣ {SongID}.txt
┃ ┃ ┣ {SongID}.wav
┃ ┃ ┣ ...
┣ ...
┣ README.txt
Opencpop
Download the official Opencpop dataset here. The file structure is as follows:
[Opencpop dataset path]
┣ midis
┃ ┣ 2001.midi
┃ ┣ 2002.midi
┃ ┣ 2003.midi
┃ ┣ ...
┣ segments
┃ ┣ wavs
┃ ┃ ┣ 2001000001.wav
┃ ┃ ┣ 2001000002.wav
┃ ┃ ┣ 2001000003.wav
┃ ┃ ┣ ...
┃ ┣ test.txt
┃ ┣ train.txt
┃ ┗ transcriptions.txt
┣ textgrids
┃ ┣ 2001.TextGrid
┃ ┣ 2002.TextGrid
┃ ┣ 2003.TextGrid
┃ ┣ ...
┣ wavs
┃ ┣ 2001.wav
┃ ┣ 2002.wav
┃ ┣ 2003.wav
┃ ┣ ...
┣ TERMS_OF_ACCESS
┗ readme.md
OpenSinger
Download the official OpenSinger dataset here. The file structure is as follows:
[OpenSinger dataset path]
┣ ManRaw
┃ ┣ {Singer_1}_{Song_1}
┃ ┃ ┣ {Singer_1}_{Song_1}_0.lab
┃ ┃ ┣ {Singer_1}_{Song_1}_0.txt
┃ ┃ ┣ {Singer_1}_{Song_1}_0.wav
┃ ┃ ┣ ...
┃ ┣ {Singer_1}_{Song_2}
┃ ┣ ...
┣ WomanRaw
┣ LICENSE
┗ README.md
Opera
Download the official Opera dataset here. The file structure is as follows:
[Opera dataset path]
┣ monophonic
┃ ┣ chinese
┃ ┃ ┣ {Gender}_{SingerID}
┃ ┃ ┃ ┣ {Emotion}_{SongID}.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┣ ...
┃ ┣ western
┣ polyphonic
┃ ┣ chinese
┃ ┣ western
┣ CrossculturalDataSet.xlsx
PopBuTFy
Download the official PopBuTFy dataset here. The file structure is as follows:
[PopBuTFy dataset path]
┣ data
┃ ┣ {SingerID}#singing#{SongName}_Amateur
┃ ┃ ┣ {SingerID}#singing#{SongName}_Amateur_{UtteranceID}.mp3
┃ ┃ ┣ ...
┃ ┣ {SingerID}#singing#{SongName}_Professional
┃ ┃ ┣ {SingerID}#singing#{SongName}_Professional_{UtteranceID}.mp3
┃ ┃ ┣ ...
┣ text_labels
┗ TERMS_OF_ACCESS
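The utterance file names pack four fields into one string: `{SingerID}#singing#{SongName}_{Amateur|Professional}_{UtteranceID}.mp3`. The sketch below unpacks them, splitting from the right so that song names containing underscores stay intact. The function name and the example file name in the usage note are hypothetical.

```python
def parse_popbutfy_name(filename):
    """Split a PopBuTFy-style utterance file name into its fields.

    Hypothetical helper based only on the naming pattern shown above.
    """
    stem = filename.rsplit(".", 1)[0]  # drop the ".mp3" extension
    singer_id, _, rest = stem.split("#", 2)  # middle field is the literal "singing"
    # Split from the right: the last field is the utterance ID, the one
    # before it is the level, and whatever remains is the song name.
    song_and_level, utterance_id = rest.rsplit("_", 1)
    song_name, level = song_and_level.rsplit("_", 1)
    return {
        "singer_id": singer_id,
        "song_name": song_name,
        "level": level,
        "utterance_id": utterance_id,
    }
```

For a made-up name such as `Singer1#singing#my_song_Amateur_0.mp3`, the song name is `my_song`, the level is `Amateur`, and the utterance ID is `0`. Directory names lack the utterance ID and would need a separate rule.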
PopCS
Download the official PopCS dataset here. The file structure is as follows:
[PopCS dataset path]
┣ popcs
┃ ┣ popcs-{SongName}
┃ ┃ ┣ {UtteranceID}_ph.txt
┃ ┃ ┣ {UtteranceID}_wf0.wav
┃ ┃ ┣ {UtteranceID}.TextGrid
┃ ┃ ┣ {UtteranceID}.txt
┃ ┃ ┣ ...
┃ ┣ ...
┗ TERMS_OF_ACCESS
PJS
Download the official PJS dataset here. The file structure is as follows:
[PJS dataset path]
┣ PJS_corpus_ver1.1
┃ ┣ background_noise
┃ ┣ pjs{SongID}
┃ ┃ ┣ pjs{SongID}_song.wav
┃ ┃ ┣ pjs{SongID}_speech.wav
┃ ┃ ┣ pjs{SongID}.lab
┃ ┃ ┣ pjs{SongID}.mid
┃ ┃ ┣ pjs{SongID}.musicxml
┃ ┃ ┣ pjs{SongID}.txt
┃ ┣ ...
SVCC
Download the official SVCC dataset here. The file structure is as follows:
[SVCC dataset path]
┣ Data
┃ ┣ CDF1
┃ ┃ ┣ 10001.wav
┃ ┃ ┣ 10002.wav
┃ ┃ ┣ ...
┃ ┣ CDM1
┃ ┣ IDF1
┃ ┣ IDM1
┗ README.md
VCTK
Download the official VCTK dataset here. The file structure is as follows:
[VCTK dataset path]
┣ txt
┃ ┣ {Speaker_1}
┃ ┃ ┣ {Speaker_1}_001.txt
┃ ┃ ┣ {Speaker_1}_002.txt
┃ ┃ ┣ ...
┃ ┣ {Speaker_2}
┃ ┣ ...
┣ wav48_silence_trimmed
┃ ┣ {Speaker_1}
┃ ┃ ┣ {Speaker_1}_001_mic1.flac
┃ ┃ ┣ {Speaker_1}_001_mic2.flac
┃ ┃ ┣ {Speaker_1}_002_mic1.flac
┃ ┃ ┣ ...
┃ ┣ {Speaker_2}
┃ ┣ ...
┣ speaker-info.txt
┗ update.txt
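In this layout each transcript under `txt` is shared by the `mic1` and `mic2` recordings under `wav48_silence_trimmed`. A minimal sketch of mapping an audio file name to its transcript path, assuming only the naming pattern shown above; the function name is hypothetical and `p225` in the usage note is just an example speaker ID.

```python
import re

# {Speaker}_{UtteranceNumber}_mic{N}.flac, as in the tree above.
VCTK_AUDIO_RE = re.compile(r"^(?P<speaker>[^_]+)_(?P<utt>\d+)_mic(?P<mic>\d+)\.flac$")


def transcript_for_audio(audio_name):
    """Return the transcript path (relative to the dataset root) for an audio file."""
    match = VCTK_AUDIO_RE.match(audio_name)
    if match is None:
        raise ValueError(f"unexpected VCTK audio file name: {audio_name}")
    speaker, utt = match["speaker"], match["utt"]
    # Both microphones of one utterance share a single transcript file.
    return f"txt/{speaker}/{speaker}_{utt}.txt"
```

Both `p225_001_mic1.flac` and `p225_001_mic2.flac` map to `txt/p225/p225_001.txt`.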