ViLBench: A Suite for Vision-Language Process Reward Modeling Paper • 2503.20271 • Published 1 day ago • 2
AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset Paper • 2503.19462 • Published 2 days ago • 4
GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers Paper • 2503.19480 • Published 2 days ago • 11
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? Paper • 2503.19990 • Published 1 day ago • 22
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation Paper • 2503.19950 • Published 1 day ago • 4
Attention IoU: Examining Biases in CelebA using Attention Maps Paper • 2503.19846 • Published 1 day ago • 2
MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search Paper • 2503.20757 • Published about 16 hours ago • 6
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy Paper • 2503.19757 • Published 2 days ago • 32
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models Paper • 2503.20198 • Published 1 day ago • 2
Wan: Open and Advanced Large-Scale Video Generative Models Paper • 2503.20314 • Published 1 day ago • 14
Open Deep Search: Democratizing Search with Open-source Reasoning Agents Paper • 2503.20201 • Published 1 day ago • 12
Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models Paper • 2503.20240 • Published 1 day ago • 17
Strong Baseline: Multi-UAV Tracking via YOLOv12 with BoT-SORT-ReID Paper • 2503.17237 • Published 6 days ago • 4
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding Paper • 2503.13964 • Published 9 days ago • 16
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation Paper • 2503.19622 • Published 2 days ago • 27
Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models Paper • 2503.18446 • Published 3 days ago • 7
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning Paper • 2503.19470 • Published 2 days ago • 11