EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation Paper • 2411.08380 • Published Nov 13 • 25
Large Language Models Can Self-Improve in Long-context Reasoning Paper • 2411.08147 • Published Nov 12 • 62
Inf-CL Collection The corresponding demos/checkpoints/papers/datasets of Inf-CL. • 2 items • Updated Oct 25 • 3
VideoLLaMA 2 Collection Optimized VideoLLaMA with improved spatial-temporal modeling and better audio understanding capability • 13 items • Updated 17 days ago • 20
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22 • 89
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio Paper • 2410.12787 • Published Oct 16 • 30