Collections

Collections including paper arxiv:2404.13358

Music Consistency Models

Paper • 2404.13358 • Published Apr 20 • 12

Papers - Kunlun

about 1 month ago

Music Consistency Models

Paper • 2404.13358 • Published Apr 20 • 12

Papers - Audio - Classifier-Free Guidance (CFG)

about 1 month ago

Music Consistency Models

Paper • 2404.13358 • Published Apr 20 • 12

Papers - Audio - Clap

We use an ensemble filtering strategy based on two different CLAP models: 630k-audioset-best and 630k-best

about 1 month ago

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15 • 10

Long-form music generation with latent diffusion

Paper • 2404.10301 • Published Apr 16 • 23

Music Consistency Models

Paper • 2404.13358 • Published Apr 20 • 12

Papers - Image - Frechet Inception Distance (FID)

https://machinelearningmastery.com/how-to-implement-the-frechet-inception-distance-fid-from-scratch/

Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion

Paper • 2310.03502 • Published Oct 5, 2023 • 74

GLIGEN: Open-Set Grounded Text-to-Image Generation

Paper • 2301.07093 • Published Jan 17, 2023 • 3

Music Consistency Models

Paper • 2404.13358 • Published Apr 20 • 12

Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

Paper • 2404.14507 • Published about 1 month ago • 21

Papers - Audio - Mel Spectrogram

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Paper • 1712.05884 • Published Dec 16, 2017 • 2

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15 • 10

Music Consistency Models

Paper • 2404.13358 • Published Apr 20 • 12

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published about 1 month ago • 28

about 8 hours ago

UniAudio: An Audio Foundation Model Toward Universal Audio Generation

Paper • 2310.00704 • Published Oct 1, 2023 • 16

Structural Similarities Between Language Models and Neural Response Measurements

Paper • 2306.01930 • Published Jun 2, 2023 • 2

Streaming Transformer ASR with Blockwise Synchronous Beam Search

Paper • 2006.14941 • Published Jun 25, 2020 • 2

NU-GAN: High resolution neural upsampling with GAN

Paper • 2010.11362 • Published Oct 22, 2020 • 2

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27 • 182

MusicHiFi: Fast High-Fidelity Stereo Vocoding

Paper • 2403.10493 • Published Mar 15 • 16

Music Consistency Models

Paper • 2404.13358 • Published Apr 20 • 12

about 1 month ago

MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models

Paper • 2402.06178 • Published Feb 9 • 12

DITTO: Diffusion Inference-Time T-Optimization for Music Generation

Paper • 2401.12179 • Published Jan 22 • 18

Fast Timing-Conditioned Latent Audio Diffusion

Paper • 2402.04825 • Published Feb 7 • 7

Brain2Music: Reconstructing Music from Human Brain Activity

Paper • 2307.11078 • Published Jul 20, 2023 • 37
