name,zero_native,zero_self_con,zero_cot,zero_cot_self_con,few_native,few_self_con,few_cot,few_cot_self_con
Baichuan-13B-Chat,12.88,12.07,25.96,27.57,18.91,19.52,27.97,30.58
Chinese-Alpaca-2-13B,22.94,22.94,25.75,25.75,25.15,25.15,22.33,22.33
GPT-3.5-turbo,36.42,35.81,39.24,43.26,39.84,39.44,27.16,27.77
LLaMA-2-13B,23.94,24.35,29.58,31.99,24.55,26.76,21.13,20.72
Qwen-7B-Chat,18.51,17.71,27.36,28.37,29.78,29.58,33.6,31.79
ChatGLM2-6B,23.34,23.34,24.35,24.14,22.94,22.94,26.16,26.16
Chinese-LLaMA-2-13B,14.69,14.69,19.92,19.92,19.72,19.72,20.93,20.93
InternLM-7B,25.96,25.96,25.96,25.96,29.18,29.18,28.37,28.37
LLaMA-2-7B,20.72,20.72,27.16,27.97,21.53,18.51,18.31,17.91
Baichuan2-13B-Chat,25.7,25.5,20.1,21.3,27.7,26.7,22.7,24.7
GPT-4,,,59.38,65.17,,,44.06,48.09
AquilaChat2-34B,34.66,34.66,47.74,47.74,44.48,44.48,,
Mistral-7B,1.9,1.9,45.61,45.61,15,15,35.97,35.97
Yi-34B-Chat,49.9,49.3,52.72,53.72,56.34,56.34,51.31,54.33
DevOps-Model-14B-Chat,24.75,22.74,28.37,27.77,36.62,37.02,27.57,26.36
Qwen-72B-Chat,48.29,48.49,49.5,49.7,49.7,49.7,45.27,44.87
ERNIE-Bot-4.0,48.56,48.56,50.64,50.64,48,48,54,54
Mistral-7B,0.2,0.2,26.76,26.76,10.26,10.26,32.19,32.19
Qwen-14B-Chat,27.57,27.57,32.39,36.02,40.04,35.41,30.38,33.4
LLaMA-2-70B-Chat,15.29,15.29,34.81,34.81,26.76,26.76,33.8,33.8
ChatGLM3-6B,21.32796781,21.32796781,28.97384306,28.97384306,21.73038229,21.73038229,29.57746479,29.57746479
InternLM2-Chat-7B,28.57142857,28.57142857,31.79074447,31.79074447,30.78470825,30.78470825,31.18712274,31.18712274
gemma_2b,18.51107,18.51107,24.9497,24.9497,21.52918,21.52918,27.7666,27.7666
gemma_7b,19.3159,19.3159,53.94737,53.94737,18.51107,18.51107,5.204461,5.204461
qwen1.5-14b-base,20.92555,20.92555,35.61368,35.61368,41.44869,41.44869,30.78471,30.78471
qwen1.5-14b-chat,24.14487,23.34004,40.64386,41.04628,38.22938,38.02817,39.43662,40.04024
Claude-3-Opus,48.09,48.09,,,,,,