Multi-Scale Sub-Band Constant-Q Transform Discriminator for High-Fidelity Vocoder Paper • 2311.14957 • Published Nov 25, 2023 • 2
AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models Paper • 2304.00830 • Published Apr 3, 2023 • 2
Leveraging Content-based Features from Multiple Acoustic Models for Singing Voice Conversion Paper • 2310.11160 • Published Oct 17, 2023
CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing Paper • 2401.12264 • Published Jan 22
SingVisio: Visual Analytics of Diffusion Model for Singing Voice Conversion Paper • 2402.12660 • Published Feb 20
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models Paper • 2403.03100 • Published Mar 5 • 34
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words Paper • 2406.13340 • Published Jun 19
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds Paper • 2407.01494 • Published Jul 1 • 13
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation Paper • 2407.02869 • Published Jul 3 • 18
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit Paper • 2312.09911 • Published Dec 15, 2023 • 53 • 4