-
Large-Scale Automatic Audiobook Creation
Paper • 2309.03926 • Published • 52 -
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Paper • 2402.08093 • Published • 52 -
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Paper • 2403.03100 • Published • 32 -
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
Paper • 2406.05370 • Published • 12
Collections
Discover the best community collections!
Collections including paper arxiv:2403.03100
-
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
Paper • 2401.09985 • Published • 13 -
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
Paper • 2401.09962 • Published • 6 -
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
Paper • 2401.10404 • Published • 9 -
ActAnywhere: Subject-Aware Video Background Generation
Paper • 2401.10822 • Published • 13
-
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Paper • 2312.02087 • Published • 19 -
FaceStudio: Put Your Face Everywhere in Seconds
Paper • 2312.02663 • Published • 28 -
Orthogonal Adaptation for Modular Customization of Diffusion Models
Paper • 2312.02432 • Published • 12 -
ReconFusion: 3D Reconstruction with Diffusion Priors
Paper • 2312.02981 • Published • 8
-
A survey on Kornia: an Open Source Differentiable Computer Vision Library for PyTorch
Paper • 2009.10521 • Published • 1 -
Kornia: an Open Source Differentiable Computer Vision Library for PyTorch
Paper • 1910.02190 • Published • 1 -
Learning Symmetrization for Equivariance with Orbit Distance Minimization
Paper • 2311.07143 • Published • 1 -
GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting
Paper • 2311.11700 • Published • 3
-
Large Language Models as Optimizers
Paper • 2309.03409 • Published • 72 -
Natural Language Supervision for General-Purpose Audio Representations
Paper • 2309.05767 • Published • 7 -
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
Paper • 2309.08532 • Published • 50 -
AudioSR: Versatile Audio Super-resolution at Scale
Paper • 2309.07314 • Published • 23