FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents Paper • 2606.12087 • Published 5 days ago • 71
From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning Paper • 2606.07190 • Published 10 days ago • 34
HakushoBench: A Japanese Chart and Table VQA Benchmark from Governmental White Papers Paper • 2606.01132 • Published 15 days ago • 6
Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs Paper • 2505.11277 • Published May 16, 2025 • 29