BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval Paper • 2407.12883 • Published Jul 16, 2024 • 9
IMPersona: Evaluating Individual Level LM Impersonation Paper • 2504.04332 • Published 17 days ago • 1
Article Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs Apr 16, 2024 • 15