-
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
Paper • 2211.06687 • Published • 3 -
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
Paper • 2401.17690 • Published • 4 -
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Paper • 2312.09911 • Published • 51 -
Audiobox: Unified Audio Generation with Natural Language Prompts
Paper • 2312.15821 • Published • 12
Collections
Discover the best community collections!
Collections including paper arxiv:2311.07919
-
Large-Scale Automatic Audiobook Creation
Paper • 2309.03926 • Published • 52 -
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts
Paper • 2309.11977 • Published • 2 -
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
Paper • 2308.16692 • Published • 1 -
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Paper • 2308.05734 • Published • 33
-
Large-Scale Automatic Audiobook Creation
Paper • 2309.03926 • Published • 52 -
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Paper • 2310.00704 • Published • 16 -
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts
Paper • 2309.11977 • Published • 2 -
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
Paper • 2308.16692 • Published • 1
-
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing
Paper • 2310.12404 • Published • 13 -
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
Paper • 2310.11954 • Published • 24 -
A Survey of AI Music Generation Tools and Models
Paper • 2308.12982 • Published • 1 -
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Paper • 2310.00704 • Published • 16
-
Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
Paper • 2310.13961 • Published • 4 -
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
Paper • 2309.09582 • Published • 4 -
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
Paper • 2310.13127 • Published • 10 -
Evaluating the Robustness to Instructions of Large Language Models
Paper • 2308.14306 • Published • 1
-
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Paper • 2310.16045 • Published • 13 -
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Paper • 2310.14566 • Published • 23 -
SILC: Improving Vision Language Pretraining with Self-Distillation
Paper • 2310.13355 • Published • 5 -
Conditional Diffusion Distillation
Paper • 2310.01407 • Published • 19
-
Large Language Models as Optimizers
Paper • 2309.03409 • Published • 72 -
Natural Language Supervision for General-Purpose Audio Representations
Paper • 2309.05767 • Published • 7 -
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
Paper • 2309.08532 • Published • 50 -
AudioSR: Versatile Audio Super-resolution at Scale
Paper • 2309.07314 • Published • 23