Learning Adaptive Parallel Reasoning with Language Models Paper • 2504.15466 • Published 7 days ago • 42
Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published 7 days ago • 55
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling Paper • 2504.13169 • Published 12 days ago • 39
Pre-training Auto-regressive Robotic Models with 4D Representations Paper • 2502.13142 • Published Feb 18 • 5
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search Paper • 2502.02584 • Published Feb 4 • 17
Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion Paper • 2501.18804 • Published Jan 30 • 5
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion Paper • 2410.03825 • Published Oct 4, 2024 • 19
CameraCtrl: Enabling Camera Control for Text-to-Video Generation Paper • 2404.02101 • Published Apr 2, 2024 • 25
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Paper • 2402.19479 • Published Feb 29, 2024 • 35