Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6, 2024 • 54
Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis Paper • 2412.01819 • Published Dec 2, 2024 • 34
CogVLM2: Visual Language Models for Image and Video Understanding Paper • 2408.16500 • Published Aug 29, 2024 • 57
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Paper • 2408.16532 • Published Aug 29, 2024 • 48
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published Sep 13, 2024 • 50
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8, 2024 • 156
Audio Conditioning for Music Generation via Discrete Bottleneck Features Paper • 2407.12563 • Published Jul 17, 2024 • 5
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation Paper • 2407.02869 • Published Jul 3, 2024 • 18
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning Paper • 2406.03344 • Published Jun 5, 2024 • 18
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning Paper • 2405.18386 • Published May 28, 2024 • 20
SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation Paper • 2405.18503 • Published May 28, 2024 • 9
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation Paper • 2405.20289 • Published May 30, 2024 • 11
Naturalistic Music Decoding from EEG Data via Latent Diffusion Models Paper • 2405.09062 • Published May 15, 2024 • 9
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization Paper • 2404.09956 • Published Apr 15, 2024 • 11
MuPT: A Generative Symbolic Music Pretrained Transformer Paper • 2404.06393 • Published Apr 9, 2024 • 15