name,few_native,few_self_con,few_cot,few_cot_self_con,zero_native,zero_self_con,zero_cot,zero_cot_self_con
GPT4,52.5,52.5,62.5,62.5,57.5,57.5,57.5,57.5
Yi-34B-Chat,50,50,52.5,55,55,55,60,67.5
DevOps-Model-14B-Chat,50,50,55,62.5,35,27.5,37.5,52.5
LLaMA-2-7B,45,45,45,45,32.5,32.5,45,45
Qwen-72B-Chat,45,45,60,60,50,50,47.5,47.5
GPT-3.5-turbo,40,40,50,55,50,47.5,55,55
ERNIE-Bot-4.0,52.5,52.5,57.5,57.5,57.5,57.5,60,60
Mistral-7B,20,20,50,50,0,0,37.5,37.5
LLaMA-2-13B,50,50,42.5,42.5,42.5,42.5,50,50
Baichuan2-13B-Chat,37.5,37.5,42.5,45,37.5,40,47.5,52.5
Qwen-14B-Chat,50,47.5,55,57.5,47.5,45,50,47.5
LLaMA-2-70B-Chat,25,25,45,45,0,0,57.5,57.5
ChatGLM3-6B,47.5,47.5,45,45,35,35,50,50
InternLM2-Chat-20B,47.5,47.5,,,47.5,47.5,,
InternLM2-Chat-7B,55,55,62.5,62.5,60,60,57.5,57.5
gemma_2b,32.5,32.5,40,40,37.5,37.5,40,40
gemma_7b,40,40,50,50,32.5,32.5,62.5,62.5
qwen1.5-14b-base,47.5,47.5,45,45,47.5,47.5,50,50
qwen1.5-14b-chat,52.5,55,60,60,45,47.5,60,72.5
Claude-3-Opus,,,,,67.5,67.5,,