Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2403.03100

Large-Scale Automatic Audiobook Creation

Paper • 2309.03926 • Published Sep 7, 2023 • 52
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

Paper • 2402.08093 • Published Feb 12 • 52
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5 • 32
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

Paper • 2406.05370 • Published 28 days ago • 12

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18 • 13
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Paper • 2401.09962 • Published Jan 18 • 6
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Paper • 2401.10404 • Published Jan 18 • 9
ActAnywhere: Subject-Aware Video Background Generation

Paper • 2401.10822 • Published Jan 19 • 13

paper to review

VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

Paper • 2312.02087 • Published Dec 4, 2023 • 19
FaceStudio: Put Your Face Everywhere in Seconds

Paper • 2312.02663 • Published Dec 5, 2023 • 28
Orthogonal Adaptation for Modular Customization of Diffusion Models

Paper • 2312.02432 • Published Dec 5, 2023 • 12
ReconFusion: 3D Reconstruction with Diffusion Priors

Paper • 2312.02981 • Published Dec 5, 2023 • 8

Papers that involves geometric computer vision using kornia

A survey on Kornia: an Open Source Differentiable Computer Vision Library for PyTorch

Paper • 2009.10521 • Published Sep 21, 2020 • 1
Kornia: an Open Source Differentiable Computer Vision Library for PyTorch

Paper • 1910.02190 • Published Oct 5, 2019 • 1
Learning Symmetrization for Equivariance with Orbit Distance Minimization

Paper • 2311.07143 • Published Nov 13, 2023 • 1
GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

Paper • 2311.11700 • Published Nov 20, 2023 • 3

Large Language Models as Optimizers

Paper • 2309.03409 • Published Sep 7, 2023 • 72
Natural Language Supervision for General-Purpose Audio Representations

Paper • 2309.05767 • Published Sep 11, 2023 • 7
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers

Paper • 2309.08532 • Published Sep 15, 2023 • 50
AudioSR: Versatile Audio Super-resolution at Scale

Paper • 2309.07314 • Published Sep 13, 2023 • 23

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs