

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

The Models are available for download for non-commercial purposes. Terms of Access: The researcher has requested permission to use the models. In exchange for such permission, the researcher hereby agrees to the following terms and conditions:

  1. Researcher shall use the models only for non-commercial research and educational purposes.
  2. The authors make no representations or warranties regarding the models, including but not limited to warranties of non-infringement or fitness for a particular purpose.
  3. Researcher accepts full responsibility for his or her use of the models and shall defend and indemnify the authors of the models, including their employees, Trustees, officers and agents, against any and all claims arising from Researcher's use of the models, including but not limited to Researcher's use of any copies of copyrighted model files that he or she may create from the models.
  4. Researcher may provide research associates and colleagues with access to the models provided that they first agree to be bound by these terms and conditions.
  5. The authors reserve the right to terminate Researcher's access to the models at any time.
  6. If Researcher is employed by a for-profit, commercial entity, Researcher's employer shall also be bound by these terms and conditions, and Researcher hereby represents that he or she is fully authorized to enter into this agreement on behalf of such employer.


ISCSLP2024 Conversational Voice Clone Challenge (CoVoC) baseline model.

There are two baseline models in this competition.

VALL-E:

VALL-E is trained using Amphion.

The model is first trained on the Wenetspeech4TTS dataset; the resulting weights are valle_base_model.bin.

This base model is then fine-tuned on the HQ-Conversations dataset; the fine-tuned weights are valle_HQ-sft_model.bin.

For the inference code, see the ISCSLP2024_CoVoC_baseline repository on GitHub.
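The card names the two VALL-E checkpoint files but does not say how to fetch them. The sketch below is a minimal example, assuming both files sit at the root of the `kxxia/ISCSLP2024_CoVoC_basemodel` repository and that the `huggingface_hub` package is installed. `checkpoint_filename`, `download_checkpoint` and the stage names `"base"` / `"hq_sft"` are hypothetical helpers introduced for illustration; only `hf_hub_download` is a real `huggingface_hub` function. Because the repository is gated, a download only succeeds after you accept the conditions on the model page and authenticate.

```python
"""Hypothetical helpers for fetching the VALL-E baseline checkpoints.

Assumption (not confirmed by the model card): both .bin files live at the
root of the Hugging Face repo kxxia/ISCSLP2024_CoVoC_basemodel.
"""
from typing import Optional

REPO_ID = "kxxia/ISCSLP2024_CoVoC_basemodel"

# Hypothetical stage names mapped to the checkpoint files named in the card.
VALLE_CHECKPOINTS = {
    "base": "valle_base_model.bin",      # trained on Wenetspeech4TTS
    "hq_sft": "valle_HQ-sft_model.bin",  # base model fine-tuned on HQ-Conversations
}


def checkpoint_filename(stage: str) -> str:
    """Return the checkpoint file name for a training stage."""
    try:
        return VALLE_CHECKPOINTS[stage]
    except KeyError:
        raise ValueError(
            f"unknown stage {stage!r}; expected one of {sorted(VALLE_CHECKPOINTS)}"
        ) from None


def download_checkpoint(stage: str, token: Optional[str] = None) -> str:
    """Download one checkpoint into the local Hugging Face cache and return its path.

    The repo is gated: accept its access conditions on the model page and
    authenticate first (for example with `huggingface-cli login`, or by
    passing an access token via `token`).
    """
    # Imported lazily so the name mapping above works without the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=checkpoint_filename(stage),
        token=token,
    )
```

For example, `download_checkpoint("hq_sft")` would fetch the fine-tuned weights; loading them into a model is left to the ISCSLP2024_CoVoC_baseline inference code, since the checkpoint format is not described here.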

fish-speech:

fish-speech is an open-source speech model. For this baseline, its LLAMA and vits_decoder components are fine-tuned on the HQ-Conversations dataset.

The training follows the default configuration of fish-speech.

For the training code, see the Fish Speech repository on GitHub.


Dataset used to train kxxia/ISCSLP2024_CoVoC_basemodel