VisualClaw: A Real-Time, Personalized Agent for the Physical World Paper • 2606.16295 • Published 3 days ago • 22
Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw Paper • 2604.04759 • Published Apr 6 • 24
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data Paper • 2603.09206 • Published Mar 10 • 54
Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline Paper • 2603.05484 • Published Mar 5 • 4
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models Paper • 2504.03624 • Published Apr 4, 2025 • 19
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness Paper • 2504.10514 • Published Apr 10, 2025 • 48
A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges Paper • 2501.02189 • Published Jan 4, 2025 • 1
Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published Aug 27, 2025 • 85
First Frame Is the Place to Go for Video Content Customization Paper • 2511.15700 • Published Nov 19, 2025 • 54
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning Paper • 2505.04601 • Published May 7, 2025 • 29
AHELM: A Holistic Evaluation of Audio-Language Models Paper • 2508.21376 • Published Aug 29, 2025 • 9
Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision Paper • 2506.06253 • Published Jun 6, 2025 • 9
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs Paper • 2506.05328 • Published Jun 5, 2025 • 21
Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability Paper • 2412.18551 • Published Dec 24, 2024
Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning Paper • 2502.11751 • Published Feb 17, 2025
STAR-1: Safer Alignment of Reasoning LLMs with 1K Data Paper • 2504.01903 • Published Apr 2, 2025 • 1
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models Paper • 2504.11468 • Published Apr 10, 2025 • 30