Collections
Discover the best community collections!
Collections including paper arxiv:2404.13358
-
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Paper • 2310.03502 • Published • 74 -
GLIGEN: Open-Set Grounded Text-to-Image Generation
Paper • 2301.07093 • Published • 3 -
Music Consistency Models
Paper • 2404.13358 • Published • 12 -
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper • 2404.14507 • Published • 21
-
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Paper • 1712.05884 • Published • 2 -
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Paper • 2404.09956 • Published • 10 -
Music Consistency Models
Paper • 2404.13358 • Published • 12 -
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Paper • 2404.14700 • Published • 28
-
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Paper • 2310.00704 • Published • 16 -
Structural Similarities Between Language Models and Neural Response Measurements
Paper • 2306.01930 • Published • 2 -
Streaming Transformer ASR with Blockwise Synchronous Beam Search
Paper • 2006.14941 • Published • 2 -
NU-GAN: High resolution neural upsampling with GAN
Paper • 2010.11362 • Published • 2
-
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper • 2402.17485 • Published • 182 -
MusicHiFi: Fast High-Fidelity Stereo Vocoding
Paper • 2403.10493 • Published • 16 -
Music Consistency Models
Paper • 2404.13358 • Published • 12
-
MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models
Paper • 2402.06178 • Published • 12 -
DITTO: Diffusion Inference-Time T-Optimization for Music Generation
Paper • 2401.12179 • Published • 18 -
Fast Timing-Conditioned Latent Audio Diffusion
Paper • 2402.04825 • Published • 7 -
Brain2Music: Reconstructing Music from Human Brain Activity
Paper • 2307.11078 • Published • 37