---
license: mit
datasets:
- Wenetspeech4TTS/WenetSpeech4TTS
language:
- zh
pipeline_tag: text-to-speech
---

## Vanilla VALL-E trained on WenetSpeech4TTS using the Amphion toolkit

Training follows the Amphion VALL-E training code throughout, except that the text-to-phoneme conversion step is slightly different.

### Checkpoints

- **base_model.bin** : VALL-E trained on the WenetSpeech4TTS Basic subset
- **38sft_model.bin** : the Basic model fine-tuned on the WenetSpeech4TTS Standard subset
- **4sft_model.bin** : the Standard model fine-tuned on the WenetSpeech4TTS Premium subset

### Usage

Inference code and more details: [ISCSLP2024_CoVoC_baseline](https://github.com/xkx-hub/ISCSLP2024_CoVoC_baseline).
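
As a convenience, the checkpoint files listed above can be fetched individually from the Hub. The following is a minimal sketch, not part of the official inference code: the helper names `checkpoint_for` and `download_checkpoint` are hypothetical, the repository id must be supplied by the caller (it is not stated in this card), and the `huggingface_hub` package is assumed to be installed. Loading the downloaded file into a model is left to the inference code linked above.

```python
# Checkpoint file names as listed in this model card, keyed by the
# WenetSpeech4TTS subset on which each training stage ended.
CHECKPOINTS = {
    "basic": "base_model.bin",
    "standard": "38sft_model.bin",
    "premium": "4sft_model.bin",
}


def checkpoint_for(subset: str) -> str:
    """Return the checkpoint file name for a subset name (case-insensitive)."""
    key = subset.strip().lower()
    if key not in CHECKPOINTS:
        raise ValueError(
            f"unknown subset {subset!r}; expected one of {sorted(CHECKPOINTS)}"
        )
    return CHECKPOINTS[key]


def download_checkpoint(subset: str, repo_id: str) -> str:
    """Download one checkpoint and return its local cache path.

    repo_id is an assumption provided by the caller: pass the id of the
    Hub repository that hosts this model card.
    """
    # Imported lazily so the name lookup above works without the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=checkpoint_for(subset))
```

For example, `download_checkpoint("premium", repo_id=...)` with this repository's id would fetch `4sft_model.bin`, the last stage of the Basic, Standard, Premium chain.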