weichiang committed on
Commit 7e04c2f • 1 Parent(s): 561d82a

Update app.py

Files changed (1)
  1. app.py +2 -1
app.py CHANGED
@@ -45,7 +45,8 @@ Contribute your vote 🗳️ at [chat.lmsys.org](https://chat.lmsys.org)! Find m
 
 def make_full_leaderboard_md(elo_results):
     leaderboard_md = f"""
-Two more benchmarks are displayed: **MT-Bench** and **MMLU**.
+Three benchmarks are displayed: **Arena Elo**, **MT-Bench** and **MMLU**.
+- [Chatbot Arena](https://chat.lmsys.org/?arena) - a crowdsourced, randomized battle platform. We use 200K+ user votes to compute Elo ratings.
 - [MT-Bench](https://arxiv.org/abs/2306.05685): a set of challenging multi-turn questions. We use GPT-4 to grade the model responses.
 - [MMLU](https://arxiv.org/abs/2009.03300) (5-shot): a test to measure a model's multitask accuracy on 57 tasks.
 
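
The added bullet says Elo ratings are computed from user votes. The commit does not show how that is done, so the following is only a minimal sketch of sequential Elo updates over pairwise battles, not the leaderboard's actual code. The function name `compute_elo`, the `(model_a, model_b, winner)` tuple format, and the defaults (K-factor 4, initial rating 1000, scale 400) are all assumptions made for illustration.

```python
from collections import defaultdict


def compute_elo(battles, k=4.0, scale=400.0, base=10.0, init_rating=1000.0):
    """Sequential Elo updates over (model_a, model_b, winner) votes.

    winner is "model_a", "model_b", or anything else for a tie.
    All names and defaults are illustrative assumptions, not taken from app.py.
    """
    ratings = defaultdict(lambda: init_rating)
    for model_a, model_b, winner in battles:
        ra, rb = ratings[model_a], ratings[model_b]
        # Expected score of model_a under the logistic Elo model.
        expected_a = 1.0 / (1.0 + base ** ((rb - ra) / scale))
        if winner == "model_a":
            score_a = 1.0
        elif winner == "model_b":
            score_a = 0.0
        else:
            score_a = 0.5  # tie
        # Zero-sum update: whatever model_a gains, model_b loses.
        delta = k * (score_a - expected_a)
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta
    return dict(ratings)
```

Because the updates are applied one vote at a time, the resulting ratings depend on the order of the votes. A real pipeline would have to pick and document an ordering, or use an order-independent fit instead.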