Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs Paper • 2312.17080 • Published Dec 28, 2023 • 1
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published Apr 18, 2024 • 51
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension Paper • 2404.16790 • Published Apr 25, 2024 • 7