Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2308.06873

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

Paper • 2402.07383 • Published Feb 12 • 13
Matcha-TTS: A fast TTS architecture with conditional flow matching

Paper • 2309.03199 • Published Sep 6, 2023 • 10
Natural language guidance of high-fidelity text-to-speech with synthetic annotations

Paper • 2402.01912 • Published Feb 2 • 11
Fast Timing-Conditioned Latent Audio Diffusion

Paper • 2402.04825 • Published Feb 7 • 7

Large-Scale Automatic Audiobook Creation

Paper • 2309.03926 • Published Sep 7, 2023 • 53
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts

Paper • 2309.11977 • Published Sep 21, 2023 • 2
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models

Paper • 2308.16692 • Published Aug 31, 2023 • 1
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

Paper • 2308.05734 • Published Aug 10, 2023 • 36

Large-Scale Automatic Audiobook Creation

Paper • 2309.03926 • Published Sep 7, 2023 • 53
UniAudio: An Audio Foundation Model Toward Universal Audio Generation

Paper • 2310.00704 • Published Oct 1, 2023 • 19
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts

Paper • 2309.11977 • Published Sep 21, 2023 • 2
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models

Paper • 2308.16692 • Published Aug 31, 2023 • 1

OmnimatteRF: Robust Omnimatte with 3D Background Modeling

Paper • 2309.07749 • Published Sep 14, 2023 • 7
AudioSR: Versatile Audio Super-resolution at Scale

Paper • 2309.07314 • Published Sep 13, 2023 • 24
Generative Image Dynamics

Paper • 2309.07906 • Published Sep 14, 2023 • 52
MagiCapture: High-Resolution Multi-Concept Portrait Customization

Paper • 2309.06895 • Published Sep 13, 2023 • 27

LLaSM: Large Language and Speech Model

Paper • 2308.15930 • Published Aug 30, 2023 • 30
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

Paper • 2308.06873 • Published Aug 14, 2023 • 25
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

Paper • 2308.05734 • Published Aug 10, 2023 • 36
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models

Paper • 2308.04729 • Published Aug 9, 2023 • 31

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs