Collections

Collections including paper arxiv:2311.07919

ModaVerse: Efficiently Transforming Modalities with LLMs

Paper • 2401.06395 • Published Jan 12 • 3
Boosting Large Language Model for Speech Synthesis: An Empirical Study

Paper • 2401.00246 • Published Dec 30, 2023 • 7
An Integration of Pre-Trained Speech and Language Models for End-to-End Speech Recognition

Paper • 2312.03668 • Published Dec 6, 2023 • 1
Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data

Paper • 2311.06753 • Published Nov 12, 2023 • 5

Papers - Audio - Captions

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Paper • 2311.07919 • Published Nov 14, 2023 • 7
Audio Dialogues: Dialogues dataset for audio and music understanding

Paper • 2404.07616 • Published Apr 11 • 14

Papers - Audio - Understanding

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Paper • 2311.07919 • Published Nov 14, 2023 • 7

Papers - Audio - Text to Speech

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Paper • 2404.03204 • Published Apr 4 • 7
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Paper • 2311.07919 • Published Nov 14, 2023 • 7
FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23 • 29

Papers - Audio - TTS

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Paper • 1712.05884 • Published Dec 16, 2017 • 2
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

Paper • 2403.16973 • Published Mar 25 • 2
High Fidelity Neural Audio Compression

Paper • 2210.13438 • Published Oct 24, 2022 • 3
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Paper • 2404.03204 • Published Apr 4 • 7

Qwen Technical Report

Paper • 2309.16609 • Published Sep 28, 2023 • 30
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 6
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Paper • 2311.07919 • Published Nov 14, 2023 • 7
Qwen-VL-Plus 📷🎨👀

Space • Running • 180

UniAudio: An Audio Foundation Model Toward Universal Audio Generation

Paper • 2310.00704 • Published Oct 1, 2023 • 16
Structural Similarities Between Language Models and Neural Response Measurements

Paper • 2306.01930 • Published Jun 2, 2023 • 2
Streaming Transformer ASR with Blockwise Synchronous Beam Search

Paper • 2006.14941 • Published Jun 25, 2020 • 2
NU-GAN: High resolution neural upsampling with GAN

Paper • 2010.11362 • Published Oct 22, 2020 • 2

Models - Audio - Translation

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Paper • 2311.07919 • Published Nov 14, 2023 • 7

Papers - Synthetic Data

DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows

Paper • 2402.10379 • Published Feb 16 • 27
Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping

Paper • 1709.07857 • Published Sep 22, 2017 • 2
Simple synthetic data reduces sycophancy in large language models

Paper • 2308.03958 • Published Aug 7, 2023 • 20
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 6

Multimodal Papers

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Paper • 2401.01885 • Published Jan 3 • 26
Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance

Paper • 2401.15687 • Published Jan 28 • 19
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Paper • 2312.17172 • Published Dec 28, 2023 • 24
MouSi: Poly-Visual-Expert Vision-Language Models

Paper • 2401.17221 • Published Jan 30 • 6
