MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
Paper • 2407.00468 • Published Jun 29 • 35
CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents
Paper • 2401.10568 • Published Jan 19 • 15