- Movie Gen: A Cast of Media Foundation Models
  Paper • 2410.13720 • Published • 89
- F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
  Paper • 2410.06885 • Published • 42
- Flow Matching for Generative Modeling
  Paper • 2210.02747 • Published • 1
- Matcha-TTS: A fast TTS architecture with conditional flow matching
  Paper • 2309.03199 • Published • 11
Collections including paper arxiv:2309.03199
- BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
  Paper • 2402.08093 • Published • 57
- E3 TTS: Easy End-to-End Diffusion-based Text to Speech
  Paper • 2311.00945 • Published • 14
- Matcha-TTS: A fast TTS architecture with conditional flow matching
  Paper • 2309.03199 • Published • 11
- Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
  Paper • 1712.05884 • Published • 2

- Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
  Paper • 1712.05884 • Published • 2
- VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
  Paper • 2403.16973 • Published • 2
- High Fidelity Neural Audio Compression
  Paper • 2210.13438 • Published • 4
- RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
  Paper • 2404.03204 • Published • 7

- Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
  Paper • 2402.07383 • Published • 13
- Matcha-TTS: A fast TTS architecture with conditional flow matching
  Paper • 2309.03199 • Published • 11
- Natural language guidance of high-fidelity text-to-speech with synthetic annotations
  Paper • 2402.01912 • Published • 11
- Fast Timing-Conditioned Latent Audio Diffusion
  Paper • 2402.04825 • Published • 7

- Matcha-TTS: A fast TTS architecture with conditional flow matching
  Paper • 2309.03199 • Published • 11
- E3 TTS: Easy End-to-End Diffusion-based Text to Speech
  Paper • 2311.00945 • Published • 14
- Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
  Paper • 2311.00430 • Published • 57
- coqui/XTTS-v2
  Text-to-Speech • Updated • 1.7M • 2.12k