SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation Paper • 2405.18503 • Published May 28 • 9
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation Paper • 2405.20289 • Published about 1 month ago • 9
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes Paper • 2406.02897 • Published 25 days ago • 12
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning Paper • 2406.03344 • Published 24 days ago • 15
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities Paper • 2406.11768 • Published 12 days ago • 19