Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset Paper • 2410.22325 • Published Oct 29 • 9
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree Paper • 2410.16268 • Published Oct 21 • 65
PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation Paper • 2410.01680 • Published Oct 2 • 32
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? Paper • 2408.13257 • Published Aug 23 • 25
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 122
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19 • 51