Text-to-Speech

Sleeping

App Files Files Community

Text-to-Speech / egs /datasets /README.md

zyingt

Upload 685 files

0d80816 11 months ago

preview code

raw

history blame contribute delete

9.97 kB

	# Datasets Format

	Amphion support the following academic datasets (sort alphabetically):

	- [Datasets Format](#datasets-format)
	- [AudioCaps](#audiocaps)
	- [CSD](#csd)
	- [KiSing](#kising)
	- [LibriTTS](#libritts)
	- [LJSpeech](#ljspeech)
	- [M4Singer](#m4singer)
	- [NUS-48E](#nus-48e)
	- [Opencpop](#opencpop)
	- [OpenSinger](#opensinger)
	- [Opera](#opera)
	- [PopBuTFy](#popbutfy)
	- [PopCS](#popcs)
	- [PJS](#pjs)
	- [SVCC](#svcc)
	- [VCTK](#vctk)

	The downloading link and the file structure tree of each dataset is displayed as follows.

	## AudioCaps

	AudioCaps is a dataset of around 44K audio-caption pairs, where each audio clip corresponds to a caption with rich semantic information. You can download the dataset [here](https://github.com/cdjkim/audiocaps). The file structure tree is like:

	```plaintext
	[AudioCaps dataset path]
	┣ AudioCpas
	┃ ┣ wav
	┃ ┃ ┣ ---1_cCGK4M_0_10000.wav
	┃ ┃ ┣ ---lTs1dxhU_30000_40000.wav
	┃ ┃ ┣ ...
	```

	## CSD

	The official CSD dataset can be download [here](https://zenodo.org/records/4785016). The file structure tree is like:

	```plaintext
	[CSD dataset path]
	┣ english
	┣ korean
	┣ utterances
	┃ ┣ en001a
	┃ ┃ ┣ {UtterenceID}.wav
	┃ ┣ en001b
	┃ ┣ en002a
	┃ ┣ en002b
	┃ ┣ ...
	┣ README
	```

	## KiSing

	The official KiSing dataset can be download [here](http://shijt.site/index.php/2021/05/16/kising-the-first-open-source-mandarin-singing-voice-synthesis-corpus/). The file structure tree is like:

	```plaintext
	[KiSing dataset path]
	┣ clean
	┃ ┣ 421
	┃ ┣ 422
	┃ ┣ ...
	```

	## LibriTTS

	The official LibriTTS dataset can be download [here](https://www.openslr.org/60/). The file structure tree is like:

	```plaintext
	[LibriTTS dataset path]
	┣ BOOKS.txt
	┣ CHAPTERS.txt
	┣ eval_sentences10.tsv
	┣ LICENSE.txt
	┣ NOTE.txt
	┣ reader_book.tsv
	┣ README_librispeech.txt
	┣ README_libritts.txt
	┣ speakers.tsv
	┣ SPEAKERS.txt
	┣ dev-clean (Subset)
	┃ ┣ 1272{Speaker_ID}
	┃ ┃ ┣ 128104 {Chapter_ID}
	┃ ┃ ┃ ┣ 1272_128104_000001_000000.normalized.txt
	┃ ┃ ┃ ┣ 1272_128104_000001_000000.original.txt
	┃ ┃ ┃ ┣ 1272_128104_000001_000000.wav
	┃ ┃ ┃ ┣ ...
	┃ ┃ ┃ ┣ 1272_128104.book.tsv
	┃ ┃ ┃ ┣ 1272_128104.trans.tsv
	┃ ┃ ┣ ...
	┃ ┣ ...
	┣ dev-other (Subset)
	┃ ┣ 116 (Speaker)
	┃ ┃ ┣ 288045 {Chapter_ID}
	┃ ┃ ┃ ┣ 116_288045_000003_000000.normalized.txt
	┃ ┃ ┃ ┣ 116_288045_000003_000000.original.txt
	┃ ┃ ┃ ┣ 116_288045_000003_000000.wav
	┃ ┃ ┃ ┣ ...
	┃ ┃ ┃ ┣ 116_288045.book.tsv
	┃ ┃ ┃ ┣ 116_288045.trans.tsv
	┃ ┃ ┣ ...
	┃ ┣ ...
	┃ ┣ ...
	┣ test-clean (Subset)
	┃ ┣ {Speaker_ID}
	┃ ┃ ┣ {Chapter_ID}
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
	┃ ┃ ┃ ┣ ...
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
	┃ ┃ ┣ ...
	┃ ┣ ...
	┣ test-other
	┃ ┣ {Speaker_ID}
	┃ ┃ ┣ {Chapter_ID}
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
	┃ ┃ ┃ ┣ ...
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
	┃ ┃ ┣ ...
	┃ ┣ ...
	┣ train-clean-100
	┃ ┣ {Speaker_ID}
	┃ ┃ ┣ {Chapter_ID}
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
	┃ ┃ ┃ ┣ ...
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
	┃ ┃ ┣ ...
	┃ ┣ ...
	┣ train-clean-360
	┃ ┣ {Speaker_ID}
	┃ ┃ ┣ {Chapter_ID}
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
	┃ ┃ ┃ ┣ ...
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
	┃ ┃ ┣ ...
	┃ ┣ ...
	┣ train-other-500
	┃ ┣ {Speaker_ID}
	┃ ┃ ┣ {Chapter_ID}
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
	┃ ┃ ┃ ┣ ...
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
	┃ ┃ ┣ ...
	┃ ┣ ...
	```


	## LJSpeech

	The official LibriTTS dataset can be download [here](https://keithito.com/LJ-Speech-Dataset/). The file structure tree is like:

	```plaintext
	[LJSpeech dataset path]
	┣ metadata.csv
	┣ wavs
	┃ ┣ LJ001-0001.wav
	┃ ┣ LJ001-0002.wav
	┃ ┣ ...
	┣ README
	```

	## M4Singer

	The official M4Singer dataset can be downloaded [here](https://drive.google.com/file/d/1xC37E59EWRRFFLdG3aJkVqwtLDgtFNqW/view). The file structure tree is like:

	```plaintext
	[M4Singer dataset path]
	┣ {Singer_1}#{Song_1}
	┃ ┣ 0000.mid
	┃ ┣ 0000.TextGrid
	┃ ┣ 0000.wav
	┃ ┣ ...
	┣ {Singer_1}#{Song_2}
	┣ ...
	┣ {Singer_2}#{Song_1}
	┣ {Singer_2}#{Song_2}
	┣ ...
	┗ meta.json
	```

	## NUS-48E

	The official NUS-48E dataset can be download [here](https://drive.google.com/drive/folders/12pP9uUl0HTVANU3IPLnumTJiRjPtVUMx). The file structure tree is like:

	```plaintext
	[NUS-48E dataset path]
	┣ {SpeakerID}
	┃ ┣ read
	┃ ┃ ┣ {SongID}.txt
	┃ ┃ ┣ {SongID}.wav
	┃ ┃ ┣ ...
	┃ ┣ sing
	┃ ┃ ┣ {SongID}.txt
	┃ ┃ ┣ {SongID}.wav
	┃ ┃ ┣ ...
	┣ ...
	┣ README.txt

	```

	## Opencpop

	The official Opera dataset can be downloaded [here](https://wenet.org.cn/opencpop/). The file structure tree is like:

	```plaintext
	[Opencpop dataset path]
	┣ midis
	┃ ┣ 2001.midi
	┃ ┣ 2002.midi
	┃ ┣ 2003.midi
	┃ ┣ ...
	┣ segments
	┃ ┣ wavs
	┃ ┃ ┣ 2001000001.wav
	┃ ┃ ┣ 2001000002.wav
	┃ ┃ ┣ 2001000003.wav
	┃ ┃ ┣ ...
	┃ ┣ test.txt
	┃ ┣ train.txt
	┃ ┗ transcriptions.txt
	┣ textgrids
	┃ ┣ 2001.TextGrid
	┃ ┣ 2002.TextGrid
	┃ ┣ 2003.TextGrid
	┃ ┣ ...
	┣ wavs
	┃ ┣ 2001.wav
	┃ ┣ 2002.wav
	┃ ┣ 2003.wav
	┃ ┣ ...
	┣ TERMS_OF_ACCESS
	┗ readme.md
	```

	## OpenSinger

	The official OpenSinger dataset can be downloaded [here](https://drive.google.com/file/d/1EofoZxvalgMjZqzUEuEdleHIZ6SHtNuK/view). The file structure tree is like:

	```plaintext
	[OpenSinger dataset path]
	┣ ManRaw
	┃ ┣ {Singer_1}_{Song_1}
	┃ ┃ ┣ {Singer_1}_{Song_1}_0.lab
	┃ ┃ ┣ {Singer_1}_{Song_1}_0.txt
	┃ ┃ ┣ {Singer_1}_{Song_1}_0.wav
	┃ ┃ ┣ ...
	┃ ┣ {Singer_1}_{Song_2}
	┃ ┣ ...
	┣ WomanRaw
	┣ LICENSE
	┗ README.md
	```

	## Opera

	The official Opera dataset can be downloaded [here](http://isophonics.net/SingingVoiceDataset). The file structure tree is like:

	```plaintext
	[Opera dataset path]
	┣ monophonic
	┃ ┣ chinese
	┃ ┃ ┣ {Gender}_{SingerID}
	┃ ┃ ┃ ┣ {Emotion}_{SongID}.wav
	┃ ┃ ┃ ┣ ...
	┃ ┃ ┣ ...
	┃ ┣ western
	┣ polyphonic
	┃ ┣ chinese
	┃ ┣ western
	┣ CrossculturalDataSet.xlsx
	```

	## PopBuTFy

	The official PopBuTFy dataset can be downloaded [here](https://github.com/MoonInTheRiver/NeuralSVB). The file structure tree is like:

	```plaintext
	[PopBuTFy dataset path]
	┣ data
	┃ ┣ {SingerID}#singing#{SongName}_Amateur
	┃ ┃ ┣ {SingerID}#singing#{SongName}_Amateur_{UtteranceID}.mp3
	┃ ┃ ┣ ...
	┃ ┣ {SingerID}#singing#{SongName}_Professional
	┃ ┃ ┣ {SingerID}#singing#{SongName}_Professional_{UtteranceID}.mp3
	┃ ┃ ┣ ...
	┣ text_labels
	┗ TERMS_OF_ACCESS
	```

	## PopCS

	The official PopCS dataset can be downloaded [here](https://github.com/MoonInTheRiver/DiffSinger/blob/master/resources/apply_form.md). The file structure tree is like:

	```plaintext
	[PopCS dataset path]
	┣ popcs
	┃ ┣ popcs-{SongName}
	┃ ┃ ┣ {UtteranceID}_ph.txt
	┃ ┃ ┣ {UtteranceID}_wf0.wav
	┃ ┃ ┣ {UtteranceID}.TextGrid
	┃ ┃ ┣ {UtteranceID}.txt
	┃ ┃ ┣ ...
	┃ ┣ ...
	┗ TERMS_OF_ACCESS
	```

	## PJS

	The official PJS dataset can be downloaded [here](https://sites.google.com/site/shinnosuketakamichi/research-topics/pjs_corpus). The file structure tree is like:

	```plaintext
	[PJS dataset path]
	┣ PJS_corpus_ver1.1
	┃ ┣ background_noise
	┃ ┣ pjs{SongID}
	┃ ┃ ┣ pjs{SongID}_song.wav
	┃ ┃ ┣ pjs{SongID}_speech.wav
	┃ ┃ ┣ pjs{SongID}.lab
	┃ ┃ ┣ pjs{SongID}.mid
	┃ ┃ ┣ pjs{SongID}.musicxml
	┃ ┃ ┣ pjs{SongID}.txt
	┃ ┣ ...
	```

	## SVCC

	The official SVCC dataset can be downloaded [here](https://github.com/lesterphillip/SVCC23_FastSVC/tree/main/egs/generate_dataset). The file structure tree is like:

	```plaintext
	[SVCC dataset path]
	┣ Data
	┃ ┣ CDF1
	┃ ┃ ┣ 10001.wav
	┃ ┃ ┣ 10002.wav
	┃ ┃ ┣ ...
	┃ ┣ CDM1
	┃ ┣ IDF1
	┃ ┣ IDM1
	┗ README.md
	```

	## VCTK

	The official VCTK dataset can be downloaded [here](https://datashare.ed.ac.uk/handle/10283/3443). The file structure tree is like:

	```plaintext
	[VCTK dataset path]
	┣ txt
	┃ ┣ {Speaker_1}
	┃ ┃ ┣ {Speaker_1}_001.txt
	┃ ┃ ┣ {Speaker_1}_002.txt
	┃ ┃ ┣ ...
	┃ ┣ {Speaker_2}
	┃ ┣ ...
	┣ wav48_silence_trimmed
	┃ ┣ {Speaker_1}
	┃ ┃ ┣ {Speaker_1}_001_mic1.flac
	┃ ┃ ┣ {Speaker_1}_001_mic2.flac
	┃ ┃ ┣ {Speaker_1}_002_mic1.flac
	┃ ┃ ┣ ...
	┃ ┣ {Speaker_2}
	┃ ┣ ...
	┣ speaker-info.txt
	┗ update.txt
	```