-
ModaVerse: Efficiently Transforming Modalities with LLMs
Paper • 2401.06395 • Published • 3 -
Boosting Large Language Model for Speech Synthesis: An Empirical Study
Paper • 2401.00246 • Published • 7 -
An Integration of Pre-Trained Speech and Language Models for End-to-End Speech Recognition
Paper • 2312.03668 • Published • 1 -
Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data
Paper • 2311.06753 • Published • 5
Collections including paper arxiv:2311.07919
-
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper • 2404.03204 • Published • 7 -
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper • 2311.07919 • Published • 7 -
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Paper • 2404.14700 • Published • 29
-
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Paper • 1712.05884 • Published • 2 -
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Paper • 2403.16973 • Published • 2 -
High Fidelity Neural Audio Compression
Paper • 2210.13438 • Published • 3 -
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper • 2404.03204 • Published • 7
-
Qwen Technical Report
Paper • 2309.16609 • Published • 30 -
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
Paper • 2308.12966 • Published • 6 -
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper • 2311.07919 • Published • 7 -
Qwen-VL-Plus
-
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Paper • 2310.00704 • Published • 16 -
Structural Similarities Between Language Models and Neural Response Measurements
Paper • 2306.01930 • Published • 2 -
Streaming Transformer ASR with Blockwise Synchronous Beam Search
Paper • 2006.14941 • Published • 2 -
NU-GAN: High resolution neural upsampling with GAN
Paper • 2010.11362 • Published • 2
-
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Paper • 2402.10379 • Published • 27 -
Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping
Paper • 1709.07857 • Published • 2 -
Simple synthetic data reduces sycophancy in large language models
Paper • 2308.03958 • Published • 20 -
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
Paper • 2308.12966 • Published • 6
-
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
Paper • 2401.01885 • Published • 26 -
Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance
Paper • 2401.15687 • Published • 19 -
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Paper • 2312.17172 • Published • 24 -
MouSi: Poly-Visual-Expert Vision-Language Models
Paper • 2401.17221 • Published • 6