[[Back]](..)
# Common Voice
[Common Voice](https://commonvoice.mozilla.org/en/datasets) is a public-domain speech corpus with 11.2K hours of read
speech in 76 languages (as of the latest version, 7.0). We provide examples for building
[Transformer](https://arxiv.org/abs/1809.08895) models on this dataset.
## Data preparation
[Download](https://commonvoice.mozilla.org/en/datasets) and unpack Common Voice v4 (the version used by the pre-trained model below) to `${DATA_ROOT}/${LANG_ID}`.
Create splits and generate audio manifests with
```bash
python -m examples.speech_synthesis.preprocessing.get_common_voice_audio_manifest \
  --data-root ${DATA_ROOT} \
  --lang ${LANG_ID} \
  --output-manifest-root ${AUDIO_MANIFEST_ROOT} --convert-to-wav
```
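For concreteness, here is a minimal sketch of how these variables might be set (all paths and the language ID below are hypothetical placeholders; adjust them to your setup):
```bash
# Hypothetical paths -- substitute your own
export DATA_ROOT=/data/common_voice              # holds ${DATA_ROOT}/${LANG_ID} after unpacking
export LANG_ID=en                                # Common Voice language code
export AUDIO_MANIFEST_ROOT=/data/cv_manifests/en # where the audio manifests are written
```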
Then, extract log-Mel spectrograms, generate the feature manifest, and create the data configuration YAML with
```bash
python -m examples.speech_synthesis.preprocessing.get_feature_manifest \
  --audio-manifest-root ${AUDIO_MANIFEST_ROOT} \
  --output-root ${FEATURE_MANIFEST_ROOT} \
  --ipa-vocab --lang ${LANG_ID}
```
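To sanity-check the outputs, you can list the feature manifest root and inspect the generated files; the file names below (`train.tsv`, `config.yaml`) are typical for this script but are an assumption and may differ across fairseq versions:
```bash
ls ${FEATURE_MANIFEST_ROOT}                    # confirm the actual file names
head -n 3 ${FEATURE_MANIFEST_ROOT}/train.tsv   # first rows of the feature manifest
cat ${FEATURE_MANIFEST_ROOT}/config.yaml       # data configuration used for training
```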
where we use phoneme inputs (`--ipa-vocab`) as an example.
To denoise audio and trim leading/trailing silence using signal-processing-based VAD, run
```bash
for SPLIT in dev test train; do
  python -m examples.speech_synthesis.preprocessing.denoise_and_vad_audio \
    --audio-manifest ${AUDIO_MANIFEST_ROOT}/${SPLIT}.audio.tsv \
    --output-dir ${PROCESSED_DATA_ROOT} \
    --denoise --vad --vad-agg-level 2
done
```
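Here, `--vad-agg-level` sets the voice activity detector's aggressiveness (higher values trim more aggressively). For intuition, a rough equivalent of the trimming step can be sketched with SoX's `vad` effect; this is an illustration only, not the fairseq script, and `input.wav`/`trimmed.wav` are hypothetical file names:
```bash
# Trim leading silence with SoX's VAD effect; since `vad` only trims the
# start, reverse the audio, trim again (removing the original trailing
# silence), and reverse back.
sox input.wav trimmed.wav vad reverse vad reverse
```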
## Training
(Please refer to [the LJSpeech example](../docs/ljspeech_example.md#transformer).)
## Inference
(Please refer to [the LJSpeech example](../docs/ljspeech_example.md#inference).)
## Automatic Evaluation
(Please refer to [the LJSpeech example](../docs/ljspeech_example.md#automatic-evaluation).)
## Results
| Language | Speakers | --arch | Params | Test MCD (dB) | Model |
|---|---|---|---|---|---|
| English | 200 | tts_transformer | 54M | 3.8 | [Download](https://dl.fbaipublicfiles.com/fairseq/s2/cv4_en200_transformer_phn.tar) |
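To try the pre-trained English model, a minimal sketch for fetching and unpacking it (the archive layout depends on the release, so inspect the contents after extraction):
```bash
wget https://dl.fbaipublicfiles.com/fairseq/s2/cv4_en200_transformer_phn.tar
tar xf cv4_en200_transformer_phn.tar   # typically a checkpoint plus data config files
```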
[[Back]](..)