yhyu13 committed
Commit 79ec3a4
1 Parent(s): 33bd671
Add alpaca eval
- README.md +21 -0
- alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/alpaca_eval_log.txt +0 -0
- alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/annotation_chatgpt_fn.json +0 -0
- alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/leaderboard.csv +15 -0
- alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/model_outputs.json +0 -0
- alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/reference_outputs.json +0 -0
README.md
CHANGED
@@ -14,6 +14,27 @@ https://huggingface.co/rishiraj/meow
who rank #1 and #2 among models <13B in the https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard by 2023/12/20.
# Alpaca Eval

I am thrilled to announce that ChatGPT (the `chatgpt_fn` annotator) has ranked LMCocktail 10.7B as the second-best model, next to GPT-4, on AlpacaEval in my local community run (win rate 73.45 vs. 73.79). You can also check the leaderboard at [./alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/](./alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/).

```
                            win_rate  standard_error  n_total  avg_length
gpt4                           73.79            1.54      805        1365
SOLAR-10.7B-LMCocktail(new)    73.45            1.56      804        1203
claude                         70.37            1.60      805        1082
chatgpt                        66.09            1.66      805         811
wizardlm-13b                   65.16            1.67      805         985
vicuna-13b                     64.10            1.69      805        1037
guanaco-65b                    62.36            1.71      805        1249
oasst-rlhf-llama-33b           62.05            1.71      805        1079
alpaca-farm-ppo-human          60.25            1.72      805         803
falcon-40b-instruct            56.52            1.74      805         662
text_davinci_003               50.00            0.00      805         307
alpaca-7b                      45.22            1.74      805         396
text_davinci_001               28.07            1.56      805         296
```
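
The table above looks like a rounded, four-column view of `leaderboard.csv` from the same directory. As a minimal sketch of how such a view could be regenerated with only the Python standard library: the helper name `format_leaderboard`, the column widths, and the two inlined rows (copied from the CSV in this commit) are illustrative assumptions, not part of the AlpacaEval tooling.

```python
import csv
import io

# Two rows copied from leaderboard.csv in this commit; the real file has more.
CSV_TEXT = """\
,win_rate,standard_error,n_wins,n_wins_base,n_draws,n_total,mode,avg_length
gpt4,73.7888198757764,1.5359801545073597,588,205,12,805,minimal,1365
SOLAR-10.7B-LMCocktail,73.44527363184079,1.5572150363643398,590,213,1,804,community,1203
"""


def format_leaderboard(csv_text: str) -> list[str]:
    """Return a header line plus one line per model, rounded to 2 decimals."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = [
        f"{'':<28}{'win_rate':>8}{'standard_error':>16}"
        f"{'n_total':>9}{'avg_length':>12}"
    ]
    for row in reader:
        name = row[""]  # the model name sits in the unnamed first column
        lines.append(
            f"{name:<28}{float(row['win_rate']):>8.2f}"
            f"{float(row['standard_error']):>16.2f}"
            f"{int(row['n_total']):>9}{int(row['avg_length']):>12}"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(format_leaderboard(CSV_TEXT)))
```

To run it on the full file, read `leaderboard.csv` into a string and pass it to `format_leaderboard`. Note that the CSV in this commit also contains a `phi-2-alpaca-gpt4-dpo` row that the README table does not list.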
# Code
alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/alpaca_eval_log.txt
ADDED
The diff for this file is too large to render.
alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/annotation_chatgpt_fn.json
ADDED
The diff for this file is too large to render.
alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/leaderboard.csv
ADDED
@@ -0,0 +1,15 @@
,win_rate,standard_error,n_wins,n_wins_base,n_draws,n_total,mode,avg_length
gpt4,73.7888198757764,1.5359801545073597,588,205,12,805,minimal,1365
SOLAR-10.7B-LMCocktail,73.44527363184079,1.5572150363643398,590,213,1,804,community,1203
claude,70.37267080745342,1.599519507147828,562,234,9,805,minimal,1082
chatgpt,66.08695652173913,1.6626479994330317,529,270,6,805,minimal,811
wizardlm-13b,65.15527950310559,1.670034107787565,520,276,9,805,minimal,985
vicuna-13b,64.09937888198758,1.6895185863153146,515,288,2,805,minimal,1037
guanaco-65b,62.36024844720497,1.7086348811605765,502,303,0,805,minimal,1249
oasst-rlhf-llama-33b,62.0496894409938,1.7080028976103514,498,304,3,805,minimal,1079
alpaca-farm-ppo-human,60.24844720496895,1.7169496733548772,481,316,8,805,minimal,803
falcon-40b-instruct,56.52173913043478,1.7438750520312944,453,348,4,805,minimal,662
phi-2-alpaca-gpt4-dpo,55.59701492537313,1.7533719245384989,447,357,0,804,community,4532
text_davinci_003,50.0,0.0,0,0,805,805,minimal,307
alpaca-7b,45.21739130434783,1.7375846781579476,356,433,16,805,minimal,396
text_davinci_001,28.07453416149068,1.5602183426587484,216,569,20,805,minimal,296
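
The `win_rate` and `standard_error` columns appear to follow from the count columns: the figures are consistent with scoring each comparison as 1 for a win, 0.5 for a draw and 0 for a loss, then taking the mean and the sample standard error of the mean, both in percent. The sketch below checks that reading against three rows. The scoring rule is inferred from the numbers in this file rather than taken from the AlpacaEval source, and `win_rate_and_se` is a hypothetical helper name.

```python
import math


def win_rate_and_se(n_wins: int, n_draws: int, n_total: int) -> tuple[float, float]:
    """Win rate and its standard error, both in percent.

    Assumption: each comparison scores 1 (win), 0.5 (draw) or 0 (loss).
    """
    n_losses = n_total - n_wins - n_draws
    scores = [1.0] * n_wins + [0.5] * n_draws + [0.0] * n_losses
    mean = sum(scores) / n_total
    # Sample variance (divide by n - 1), then standard error of the mean.
    variance = sum((s - mean) ** 2 for s in scores) / (n_total - 1)
    return 100 * mean, 100 * math.sqrt(variance / n_total)


if __name__ == "__main__":
    # (n_wins, n_draws, n_total) copied from leaderboard.csv in this commit.
    for name, counts in {
        "gpt4": (588, 12, 805),
        "SOLAR-10.7B-LMCocktail": (590, 1, 804),
        "text_davinci_003": (0, 805, 805),
    }.items():
        rate, se = win_rate_and_se(*counts)
        print(f"{name}: win_rate={rate:.2f} standard_error={se:.2f}")
```

Under this reading, the `text_davinci_003` row (all 805 comparisons are draws) comes out at exactly 50.0 with zero standard error, which matches its role as the baseline in the file.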
alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/model_outputs.json
ADDED
The diff for this file is too large to render.
alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/reference_outputs.json
ADDED
The diff for this file is too large to render.