MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies Paper • 2403.01422 • Published Mar 3 • 24
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding Paper • 2403.09530 • Published Mar 14 • 8
VidToMe: Video Token Merging for Zero-Shot Video Editing Paper • 2312.10656 • Published Dec 17, 2023 • 9
DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion Paper • 2403.17237 • Published Mar 25 • 8
Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians Paper • 2403.17898 • Published Mar 26 • 12
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens Paper • 2404.03413 • Published Apr 4 • 21
Phenaki: Variable Length Video Generation From Open Domain Textual Description Paper • 2210.02399 • Published Oct 5, 2022 • 3
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models Paper • 2404.14507 • Published 17 days ago • 21