VideoCrafter1: Open Diffusion Models for High-Quality Video Generation Paper • 2310.19512 • Published Oct 30, 2023 • 15
VideoMamba: State Space Model for Efficient Video Understanding Paper • 2403.06977 • Published Mar 11 • 27
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models Paper • 2401.09047 • Published Jan 17 • 13
DragAnything: Motion Control for Anything using Entity Representation Paper • 2403.07420 • Published Mar 12 • 13
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation Paper • 2201.12086 • Published Jan 28, 2022 • 3
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding Paper • 2403.09530 • Published Mar 14 • 8
Generic 3D Diffusion Adapter Using Controlled Multi-View Editing Paper • 2403.12032 • Published Mar 18 • 14
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers Paper • 2403.12943 • Published Mar 19 • 14
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation Paper • 2403.12962 • Published Mar 19 • 7
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition Paper • 2403.14148 • Published Mar 21 • 18
VidToMe: Video Token Merging for Zero-Shot Video Editing Paper • 2312.10656 • Published Dec 17, 2023 • 10
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text Paper • 2403.14773 • Published Mar 21 • 10
Improving Automatic VQA Evaluation Using Large Language Models Paper • 2310.02567 • Published Oct 4, 2023 • 3
Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians Paper • 2403.17898 • Published Mar 26 • 14
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23 • 86
Garment3DGen: 3D Garment Stylization and Texture Generation Paper • 2403.18816 • Published Mar 27 • 21
Zero-shot Prompt-based Video Encoder for Surgical Gesture Recognition Paper • 2403.19786 • Published Mar 28 • 2
CameraCtrl: Enabling Camera Control for Text-to-Video Generation Paper • 2404.02101 • Published Apr 2 • 22
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3 • 65
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens Paper • 2404.03413 • Published Apr 4 • 25
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model Paper • 2404.09967 • Published Apr 15 • 20
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models Paper • 2404.14507 • Published Apr 22 • 21
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning Paper • 2404.16994 • Published Apr 25 • 35
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published May 2 • 52
Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis Paper • 2406.06216 • Published Jun 10 • 19
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution Paper • 2406.13457 • Published Jun 19 • 16
Adaptive Caching for Faster Video Generation with Diffusion Transformers Paper • 2411.02397 • Published Nov 4 • 23
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models Paper • 2411.18613 • Published 24 days ago • 50
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 8 days ago • 129