Ludwig Stumpp committed on
Commit 01a6a43
1 Parent(s): d9de755

Add lambada results and description column for benchmarks

Files changed (3)
  1. .vscode/extensions.json +2 -1
  2. benchmarks.csv +3 -2
  3. leaderboard.csv +20 -10
.vscode/extensions.json CHANGED
@@ -1,5 +1,6 @@
  {
    "recommendations": [
-     "janisdd.vscode-edit-csv"
+     "janisdd.vscode-edit-csv",
+     "mechatroner.rainbow-csv"
    ]
  }
benchmarks.csv CHANGED
@@ -1,2 +1,3 @@
- Benchmark Name ,Author ,URL
- Chatbot Arena Elo (lmsys) ,LMSYS ,https://lmsys.org/blog/2023-05-03-arena/
+ "Benchmark Name " ,"Author " ,URL ,"Description "
+ "Chatbot Arena Elo (lmsys) " ,"LMSYS " ,https://lmsys.org/blog/2023-05-03-arena/ ,"In this blog post, we introduce Chatbot Arena, an LLM benchmark platform featuring anonymous randomized battles in a crowdsourced manner. Chatbot Arena adopts the Elo rating system, which is a widely-used rating system in chess and other competitive games. (Source: https://lmsys.org/blog/2023-05-03-arena/)"
+ "LAMBADA " ,"Paperno et al. " ,https://arxiv.org/abs/1606.06031 ,"The LAMBADA evaluates the capabilities of computational models for text understanding by means of a word prediction task. LAMBADA is a collection of narrative passages sharing the characteristic that human subjects are able to guess their last word if they are exposed to the whole passage, but not if they only see the last sentence preceding the target word. To succeed on LAMBADA, computational models cannot simply rely on local context, but must be able to keep track of information in the broader discourse. (Source: https://huggingface.co/datasets/lambada)"
leaderboard.csv CHANGED
@@ -1,10 +1,20 @@
- Model Name ,Chatbot Arena Elo (lmsys)
- alpaca-13b , 1008
- chatglm-6b , 985
- dolly-v2-12b , 944
- fastchat-t5-3b , 951
- koala-13b , 1082
- llama-13b , 932
- stablelm-tuned-alpha-7b , 858
- vicuna-13b , 1169
- oasst-pythia-12b , 1065
+ Model Name ,Chatbot Arena Elo (lmsys) ,LAMBADA
+ alpaca-13b , 1008 ,
+ cerebras-7b , , 0.636
+ cerebras-13b , , 0.635
+ chatglm-6b , 985 ,
+ dolly-v2-12b , 944 ,
+ fastchat-t5-3b , 951 ,
+ gpt-neox-20b , , 0.719
+ gptj-6b , , 0.683
+ koala-13b , 1082 ,
+ llama-7b , , 0.738
+ llama-13b , 932 ,
+ mpt-7b , , 0.702
+ opt-7b , , 0.677
+ opt-13b , , 0.692
+ stablelm-base-alpha-7b , , 0.533
+ stablelm-tuned-alpha-7b , 858 ,
+ vicuna-13b , 1169 ,
+ oasst-pythia-7b , , 0.667
+ oasst-pythia-12b , 1065 , 0.704
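
Note for consumers of leaderboard.csv: after this change, cells are space-padded and a model may have no score for a given benchmark. The following is a minimal sketch of one way to read such a file, not code from this repository. The `load_leaderboard` helper and the `SAMPLE` string (a few rows copied from the diff above) are hypothetical names introduced for illustration, and it assumes only Python's standard `csv` module.

```python
import csv
import io

# A few rows copied from the new leaderboard.csv; cells are padded with spaces
# and a missing score is an empty cell. SAMPLE is illustrative, not the full file.
SAMPLE = """\
Model Name ,Chatbot Arena Elo (lmsys) ,LAMBADA
alpaca-13b , 1008 ,
llama-7b , , 0.738
oasst-pythia-12b , 1065 , 0.704
"""


def load_leaderboard(text):
    """Parse padded CSV text into {model: {benchmark: float or None}}."""
    reader = csv.reader(io.StringIO(text))
    # Strip the alignment padding from the header cells.
    header = [cell.strip() for cell in next(reader)]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        cells = [cell.strip() for cell in row]
        scores = {}
        for name, value in zip(header[1:], cells[1:]):
            # An empty cell means the model has no result for that benchmark.
            scores[name] = float(value) if value else None
        table[cells[0]] = scores
    return table


leaderboard = load_leaderboard(SAMPLE)
```

Representing a missing score as `None` rather than `0` keeps models without a result from being ranked last when sorting by a benchmark column.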