GR00T N1: An Open Foundation Model for Generalist Humanoid Robots Paper • 2503.14734 • Published Mar 18, 2025
Error Analyses of Auto-Regressive Video Diffusion Models: A Unified Framework Paper • 2503.10704 • Published Mar 12, 2025
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published Jan 22, 2025
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding Paper • 2311.16922 • Published Nov 28, 2023
MS-DETR: Natural Language Video Localization with Sampling Moment-Moment Interaction Paper • 2305.18969 • Published May 30, 2023