WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens Paper • 2401.09985 • Published Jan 18 • 13
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects Paper • 2401.09962 • Published Jan 18 • 6
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution Paper • 2401.10404 • Published Jan 18 • 9
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23 • 85
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning Paper • 2402.00769 • Published Feb 1 • 19
VideoPrism: A Foundational Visual Encoder for Video Understanding Paper • 2402.13217 • Published Feb 20 • 19
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis Paper • 2402.14797 • Published Feb 22 • 19
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27 • 88
Sora Generates Videos with Stunning Geometrical Consistency Paper • 2402.17403 • Published Feb 27 • 15
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners Paper • 2402.17723 • Published Feb 27 • 16
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Paper • 2402.19479 • Published Feb 29 • 30
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models Paper • 2403.03100 • Published Mar 5 • 32
Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation Paper • 2403.02827 • Published Mar 5 • 5
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding Paper • 2403.09626 • Published Mar 14 • 12
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations Paper • 2108.01073 • Published Aug 2, 2021 • 7
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization Paper • 2404.09956 • Published Apr 15 • 11
MotionMaster: Training-free Camera Motion Transfer For Video Generation Paper • 2404.15789 • Published Apr 24 • 10
LLM-AD: Large Language Model based Audio Description System Paper • 2405.00983 • Published May 2 • 13
FIFO-Diffusion: Generating Infinite Videos from Text without Training Paper • 2405.11473 • Published May 19 • 53
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation Paper • 2405.14598 • Published May 23 • 11
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition Paper • 2405.15216 • Published May 24 • 11
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models Paper • 2405.16537 • Published May 26 • 15
Looking Backward: Streaming Video-to-Video Translation with Feature Banks Paper • 2405.15757 • Published May 24 • 14
Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer Paper • 2405.17405 • Published May 27 • 14
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control Paper • 2405.17414 • Published May 27 • 10
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning Paper • 2405.18386 • Published May 28 • 17
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback Paper • 2405.18750 • Published May 29 • 20
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture Paper • 2405.18991 • Published May 29 • 12
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model Paper • 2405.20222 • Published May 30 • 10
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark Paper • 2405.19707 • Published May 30 • 4
Learning Temporally Consistent Video Depth from Video Diffusion Priors Paper • 2406.01493 • Published 30 days ago • 17
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation Paper • 2406.00908 • Published about 1 month ago • 11
Searching Priors Makes Text-to-Video Synthesis Better Paper • 2406.03215 • Published 28 days ago • 11
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published 26 days ago • 69
VideoTetris: Towards Compositional Text-to-Video Generation Paper • 2406.04277 • Published 27 days ago • 21
MotionClone: Training-Free Motion Cloning for Controllable Video Generation Paper • 2406.05338 • Published 25 days ago • 39
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing Paper • 2406.06523 • Published 22 days ago • 48
Hierarchical Patch Diffusion Models for High-Resolution Video Generation Paper • 2406.07792 • Published 21 days ago • 13
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation Paper • 2406.07686 • Published 21 days ago • 13
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation Paper • 2406.08656 • Published 20 days ago • 7
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability,Reproducibility, and Practicality Paper • 2406.08845 • Published 20 days ago • 8
ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning Paper • 2406.14130 • Published 13 days ago • 10
MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation Paper • 2406.15252 • Published 12 days ago • 13
DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models Paper • 2407.01519 • Published 1 day ago • 18
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix Paper • 2407.00367 • Published 4 days ago • 5