Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions Paper • 1712.05884 • Published Dec 16, 2017
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild Paper • 2403.16973 • Published Mar 25, 2024
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis Paper • 2404.03204 • Published Apr 4, 2024
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models Paper • 2311.07919 • Published Nov 14, 2023
Natural language guidance of high-fidelity text-to-speech with synthetic annotations Paper • 2402.01912 • Published Feb 2, 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching Paper • 2410.06885 • Published Oct 9, 2024
Matcha-TTS: A fast TTS architecture with conditional flow matching Paper • 2309.03199 • Published Sep 6, 2023
SONAR: Sentence-Level Multimodal and Language-Agnostic Representations Paper • 2308.11466 • Published Aug 22, 2023