Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs Paper • 2312.17080 • Published Dec 28, 2023 • 1
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published 21 days ago • 49
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension Paper • 2404.16790 • Published 14 days ago • 7