BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models Paper • 2502.07346 • Published 27 days ago • 51
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles Paper • 2502.09082 • Published 25 days ago • 27
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published Jan 20 • 92
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI Paper • 2410.11623 • Published Oct 15, 2024 • 48
Revealing the Barriers of Language Agents in Planning Paper • 2410.12409 • Published Oct 16, 2024 • 26