- BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
  Paper • 2402.08093 • Published • 57
- E3 TTS: Easy End-to-End Diffusion-based Text to Speech
  Paper • 2311.00945 • Published • 14
- Matcha-TTS: A fast TTS architecture with conditional flow matching
  Paper • 2309.03199 • Published • 11
- Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
  Paper • 1712.05884 • Published • 2
Collections including paper arxiv:2311.00945
- Random Field Augmentations for Self-Supervised Representation Learning
  Paper • 2311.03629 • Published • 6
- Levels of AGI: Operationalizing Progress on the Path to AGI
  Paper • 2311.02462 • Published • 34
- Idempotent Generative Network
  Paper • 2311.01462 • Published • 24
- E3 TTS: Easy End-to-End Diffusion-based Text to Speech
  Paper • 2311.00945 • Published • 14

- Matryoshka Diffusion Models
  Paper • 2310.15111 • Published • 41
- Data Filtering Networks
  Paper • 2309.17425 • Published • 6
- FlashDecoding++: Faster Large Language Model Inference on GPUs
  Paper • 2311.01282 • Published • 35
- E3 TTS: Easy End-to-End Diffusion-based Text to Speech
  Paper • 2311.00945 • Published • 14

- Chain-of-Verification Reduces Hallucination in Large Language Models
  Paper • 2309.11495 • Published • 38
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 77
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
  Paper • 2309.09400 • Published • 84
- Language Modeling Is Compression
  Paper • 2309.10668 • Published • 83

- InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
  Paper • 2309.03895 • Published • 13
- ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning
  Paper • 2309.16650 • Published • 10
- CCEdit: Creative and Controllable Video Editing via Diffusion Models
  Paper • 2309.16496 • Published • 9
- FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling
  Paper • 2310.15169 • Published • 9

- Large-Scale Automatic Audiobook Creation
  Paper • 2309.03926 • Published • 54
- FoleyGen: Visually-Guided Audio Generation
  Paper • 2309.10537 • Published • 8
- MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
  Paper • 2310.11954 • Published • 25
- UniAudio: An Audio Foundation Model Toward Universal Audio Generation
  Paper • 2310.00704 • Published • 21