UniAudio: An Audio Foundation Model Toward Universal Audio Generation Paper • 2310.00704 • Published Oct 1, 2023 • 16
Structural Similarities Between Language Models and Neural Response Measurements Paper • 2306.01930 • Published Jun 2, 2023 • 2
Streaming Transformer ASR with Blockwise Synchronous Beam Search Paper • 2006.14941 • Published Jun 25, 2020 • 2
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models Paper • 2403.14438 • Published Mar 21, 2024 • 2
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions Paper • 1712.05884 • Published Dec 16, 2017 • 2
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild Paper • 2403.16973 • Published Mar 25, 2024 • 2
Masked Audio Generation using a Single Non-Autoregressive Transformer Paper • 2401.04577 • Published Jan 9, 2024 • 37
WavLLM: Towards Robust and Adaptive Speech Large Language Model Paper • 2404.00656 • Published Mar 31, 2024 • 5
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis Paper • 2404.03204 • Published Apr 4, 2024 • 7
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models Paper • 2311.07919 • Published Nov 14, 2023 • 7
Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion Paper • 2311.14836 • Published Nov 24, 2023 • 2
MuPT: A Generative Symbolic Music Pretrained Transformer Paper • 2404.06393 • Published 30 days ago • 14
Audio Dialogues: Dialogues dataset for audio and music understanding Paper • 2404.07616 • Published 28 days ago • 14
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization Paper • 2404.09956 • Published 24 days ago • 10
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound Paper • 2405.00233 • Published 8 days ago • 10
LLM-AD: Large Language Model based Audio Description System Paper • 2405.00983 • Published 7 days ago • 10