Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding Paper • 2403.09626 • Published Mar 14 • 11
VideoAgent: Long-form Video Understanding with Large Language Model as Agent Paper • 2403.10517 • Published Mar 15 • 28
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis Paper • 2403.13501 • Published Mar 20 • 9
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens Paper • 2404.03413 • Published Apr 4 • 21