name,zero_native,zero_self_con,zero_cot,zero_cot_self_con,few_native,few_self_con,few_cot,few_cot_self_con
Baichuan-13B-Chat,12.47,11.67,16.5,19.52,24.55,22.54,26.36,28.77
Chinese-Alpaca-2-13B,23.14,23.14,28.97,28.97,16.3,16.3,14.29,14.29
GPT-3.5-turbo,38.63,38.83,40.04,42.05,36.62,37.63,42.66,43.86
LLaMA-2-13B,16.1,20.32,23.94,29.58,20.12,22.33,24.35,33.8
Qwen-7B-Chat,18.91,19.11,22.13,23.94,26.76,25.55,34.81,33.4
ChatGLM2-6B,20.72,20.52,19.92,19.72,20.12,20.12,22.94,22.74
Chinese-LLaMA-2-13B,13.88,13.88,20.52,20.52,16.9,16.9,23.34,23.34
InternLM-7B,26.36,26.36,25.55,25.55,25.55,25.55,27.97,27.97
LLaMA-2-7B,22.13,23.74,23.74,26.56,19.32,20.52,28.77,33.6
Baichuan2-13B-Chat,17.1,19.1,18.7,22.9,25.9,26.5,20.9,24.5
GPT-4,,,59.02,64.56,,,58.35,62.58
AquilaChat2-34B,36.63,36.63,44.83,44.83,46.65,46.65,,
Yi-34B-Chat,47.08,48.69,47.08,46.28,58.15,58.35,56.94,58.95
DevOps-Model-14B-Chat,25.15,26.96,35.41,38.83,33.2,34.81,27.36,27.36
Qwen-72B-Chat,47.28,47.48,48.09,48.09,49.7,49.7,43.46,43.66
ERNIE-Bot-4.0,43.8,43.8,47.14,47.14,46,46,54,54
Mistral-7B,17.1,17.1,26.76,26.76,31.19,31.19,27.97,27.97
Qwen-14B-Chat,24.95,28.37,33,36.62,27.97,28.37,27.97,24.14
LLaMA-2-70B-Chat,19.72,19.72,27.97,27.97,26.56,26.56,32.6,32.6
ChatGLM3-6B,20.93,20.93,25.15,25.15,24.75,24.75,29.18,29.18
InternLM2-Chat-20B,,,59.21,59.21,,,,
InternLM2-Chat-7B,27.16,27.16,28.17,28.17,29.98,29.98,30.18,30.18
gemma_2b,16.9,16.9,19.52,19.52,16.1,16.1,24.75,24.75
gemma_7b,14.29,14.29,30.99,30.99,2.6,2.6,43.86,43.86
qwen1.5-14b-base,29.18,29.18,33.6,33.6,36.82,36.82,27.77,27.77
qwen1.5-14b-chat,32.8,35.41,39.44,43.06,32.39,33.6,36.82,38.83
Claude-3-Opus,46.47,46.47,,,,,,