Dongfu Jiang committed on
Commit
a2f8211
1 Parent(s): edac579

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -48,11 +48,11 @@ We test the pairwise comparison on
 
  | Model | Summ | Exam | Code | Rewriting | Crea W | Func W | Comm | NLP | Overall |
  |:---------------------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:-----:|:--------:|:---------:|
- | Closed-source Models | | | | | | | | | |
+ | Closed-source Models |
  | ChatGPT | 33.3 | 40.3 | 36.6 | 31.6 | 48.2 | 40.4 | 47.6 | 45.8 | 42.7 |
  | Claude-2 | 30.6 | 36.1 | 41.7 | 34.2 | 48.1 | 42.5 | 40.6 | 48.5 | 42.4 |
  | GPT-4 | 59.7 | 51.4 | 69.2 | 58.3 | 66.7 | 60.4 | 58.3 | 65.2 | 61.9 |
- | Open-source Models | | | | | | | | | |
+ | Open-source Models |
  | SteamSHP | 33.3 | 29.2 | 26.7 | 33.3 | 40.7 | 31.3 | 51.4 | 51.9 | 40.6 |
  | PandaLM | 29.2 | 33.3 | 31.7 | 23.3 | 43.5 | 32.9 | 44.8 | 48.9 | 38.9 |
  | LLaMA-2-Chat-13B | 20.8 | 27.8 | 19.2 | 20 | 31.5 | 27.5 | 35.8 | 31.8 | 29 |