WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens Paper • 2401.09985 • Published Jan 18 • 12
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects Paper • 2401.09962 • Published Jan 18 • 6
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution Paper • 2401.10404 • Published Jan 18 • 8
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23 • 82
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning Paper • 2402.00769 • Published Feb 1 • 17
VideoPrism: A Foundational Visual Encoder for Video Understanding Paper • 2402.13217 • Published Feb 20 • 18
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis Paper • 2402.14797 • Published Feb 22 • 18
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27 • 87
Sora Generates Videos with Stunning Geometrical Consistency Paper • 2402.17403 • Published Feb 27 • 14
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners Paper • 2402.17723 • Published Feb 27 • 15
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Paper • 2402.19479 • Published Feb 29 • 30
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models Paper • 2403.03100 • Published Mar 5 • 32
Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation Paper • 2403.02827 • Published Mar 5 • 5
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding Paper • 2403.09626 • Published Mar 14 • 11
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations Paper • 2108.01073 • Published Aug 2, 2021 • 6
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization Paper • 2404.09956 • Published 27 days ago • 10
MotionMaster: Training-free Camera Motion Transfer For Video Generation Paper • 2404.15789 • Published 18 days ago • 10
LLM-AD: Large Language Model based Audio Description System Paper • 2405.00983 • Published 11 days ago • 13