arxiv:2307.07218

Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts

Published on Jul 14, 2023
Featured in Daily Papers on Jul 17, 2023

Abstract

Zero-shot text-to-speech aims at synthesizing voices with unseen speech prompts. Previous large-scale multispeaker TTS models have successfully achieved this goal with an enrolled recording of less than 10 seconds. However, most of them are designed to utilize only short speech prompts, and the limited information in short speech prompts significantly hinders the performance of fine-grained identity imitation. In this paper, we introduce Mega-TTS 2, a generic zero-shot multispeaker TTS model that is capable of synthesizing speech for unseen speakers with arbitrary-length prompts. Specifically, we 1) design a multi-reference timbre encoder to extract timbre information from multiple reference speeches, and 2) train a prosody language model with arbitrary-length speech prompts. With these designs, our model is suitable for prompts of different lengths, which extends the upper bound of speech quality for zero-shot text-to-speech. Besides arbitrary-length prompts, we introduce arbitrary-source prompts, which leverage the probabilities derived from multiple P-LLM outputs to produce expressive and controlled prosody. Furthermore, we propose a phoneme-level auto-regressive duration model to introduce in-context learning capabilities to duration modeling. Experiments demonstrate that our method can not only synthesize identity-preserving speech with a short prompt from an unseen speaker but also achieve improved performance with longer speech prompts. Audio samples can be found at https://mega-tts.github.io/mega2_demo/.
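
For the "arbitrary-source prompts" mechanism, the abstract only says that prosody is produced by leveraging probabilities derived from multiple P-LLM outputs. Below is a minimal sketch of one plausible reading of that sentence, a weighted mixture of per-prompt next-token distributions; the function names, the numpy stand-ins for the model outputs, and the mixing weights are all illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mix_prosody_distributions(per_prompt_logits, weights):
    """Blend next-prosody-token distributions from several P-LLM runs.

    per_prompt_logits: one logit vector per speech-prompt source
        (hypothetical interface; not the authors' published API).
    weights: mixing coefficients that sum to 1, chosen to control
        how strongly each prompt's prosody influences the output.
    """
    assert abs(sum(weights) - 1.0) < 1e-6, "weights must sum to 1"
    return sum(w * softmax(l) for w, l in zip(weights, per_prompt_logits))

# Toy usage: two prompt sources over a 4-entry prosody codebook.
rng = np.random.default_rng(0)
logits_a = rng.normal(size=4)  # stand-in for P-LLM logits given prompt A
logits_b = rng.normal(size=4)  # stand-in for P-LLM logits given prompt B
probs = mix_prosody_distributions([logits_a, logits_b], [0.7, 0.3])
next_code = rng.choice(len(probs), p=probs)  # sample the next prosody code
```

Since each per-prompt softmax is a valid distribution, the convex combination is also a valid distribution, so it can be sampled from directly at each decoding step.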

Community

The demo results are astonishing, almost identical to the ground truth.

amazing result

Awesome.

It sounds like the SVC model has been trained for more than 10,000 steps. SVC training takes so long; on an ordinary consumer graphics card it takes at least 20 hours...

Incredible, upvoted!

Where is the demo link? Why can't I find it?

I'm feeling kind, so here is the link:
https://mega-tts.github.io/mega2_demo/

Waiting for the open-source release.

Please open-source it soon!


Models citing this paper: 0

Datasets citing this paper: 0

Spaces citing this paper: 0

Collections including this paper: 2