Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis Paper • 2412.01819 • Published 12 days ago • 29
CogVLM2: Visual Language Models for Image and Video Understanding Paper • 2408.16500 • Published Aug 29 • 56
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Paper • 2408.16532 • Published Aug 29 • 47
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published Sep 13 • 47
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 155
Audio Conditioning for Music Generation via Discrete Bottleneck Features Paper • 2407.12563 • Published Jul 17 • 5
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation Paper • 2407.02869 • Published Jul 3 • 18
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning Paper • 2406.03344 • Published Jun 5 • 18
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning Paper • 2405.18386 • Published May 28 • 20
SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation Paper • 2405.18503 • Published May 28 • 9
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation Paper • 2405.20289 • Published May 30 • 10
Naturalistic Music Decoding from EEG Data via Latent Diffusion Models Paper • 2405.09062 • Published May 15 • 9
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization Paper • 2404.09956 • Published Apr 15 • 11