name,zero_native,zero_self_con,zero_cot,zero_cot_self_con,few_native,few_self_con,few_cot,few_cot_self_con
Baichuan-13B-Chat,11.6,14.31,14.68,18.46,14.56,15.68,16.21,16.82
Chinese-Alpaca-2-13B,20.86,20.86,23.08,23.08,29.75,29.75,32.83,32.83
GPT-3.5-turbo,35.04,34.82,38.46,43.5,39.29,39.19,41.01,42.58
LLaMA-2-13B,15.62,18.32,29.88,34.45,23.16,29.14,37.59,44.3
Qwen-7B-Chat,33.37,33.74,32.97,34.1,32.98,32.7,36.6,36.65
ChatGLM2-6B,15.94,16.06,19.83,19.91,26.27,26.22,28.25,28.37
Chinese-LLaMA-2-13B,10.02,10.02,19.51,19.51,34.51,34.51,33.34,33.34
InternLM-7B,20.48,20.48,23.85,23.85,23.69,23.69,26.06,26.06
LLaMA-2-7B,19.42,21.62,25.46,27.11,21.45,24.85,33.6,34.83
GPT-4,/,/,56.9,65.49,/,/,59.39,63.54
Yi-34B-Chat,38.24,37.04,48.24,52.1,61.33,61.19,53.53,53.39
DevOps-Model-14B-Chat,31.04,30.51,42.84,47.37,52.25,49.38,45.9,47.23
Qwen-72B-Chat,53.19,53.19,55.25,55.52,58.13,58.13,58.72,58.99
ERNIE-Bot-4.0,43.66,43.66,51.99,51.99,44,44,50,50
Mistral-7B,26.91,26.91,30.65,30.65,40.52,40.52,46.84,46.84
Qwen-14B-Chat,33.71,36.25,41.24,42.51,51.19,50.39,57.18,59.18
LLaMA-2-70B-Chat,23.64,23.64,39.31,39.31,38.98,39.12,47.9,47.9
ChatGLM3-6B,30.4,30.4,30.7,30.7,26.9,26.9,37.2,37.2
InternLM2-Chat-20B,39.1,39.1,37.7,37.7,47.7,47.7,33.5,33.5
InternLM2-Chat-7B,36.8,36.8,31.7,31.7,46.3,46.3,36.9,36.9
gemma_2b,20.1,20.1,24.2,24.2,31.2,31.2,35.5,35.5
gemma_7b,23.1,23.1,34.4,34.4,21.4,21.4,33.1,33.1
qwen1.5-14b-base,34,34,42.8,42.8,57.9,57.9,40.2,40.2
qwen1.5-14b-chat,34.5,35.6,41.7,41.1,33.2,34.7,46.2,47.4
Claude-3-Opus,62.4,62.4,,,,,,