Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs 11 days ago • 8
Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes? Mar 5 • 2
NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates Feb 2 • 1
The Hallucinations Leaderboard, an Open Effort to Measure Hallucinations in Large Language Models Jan 29 • 4
A guide to setting up your own Hugging Face leaderboard: an end-to-end example with Vectara's hallucination leaderboard Jan 12 • 3