VIMI: Grounding Video Generation through Multi-modal Instruction Paper • 2407.06304 • Published 14 days ago • 8
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions Paper • 2407.06723 • Published 13 days ago • 9
MotionLLM: Understanding Human Behaviors from Human Motions and Videos Paper • 2405.20340 • Published May 30 • 19