From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline Paper • 2406.11939 • Published Jun 17 • 6
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline Paper • 2406.11939 • Published Jun 17 • 6
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline Paper • 2406.11939 • Published Jun 17 • 6
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference Paper • 2403.04132 • Published Mar 7 • 38
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference Paper • 2403.04132 • Published Mar 7 • 38
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena Paper • 2306.05685 • Published Jun 9, 2023 • 31
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset Paper • 2309.11998 • Published Sep 21, 2023 • 25
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset Paper • 2309.11998 • Published Sep 21, 2023 • 25