update leaderboard 2024-09-11
a0e246d
name,zero_naive,zero_self_con,zero_cot,zero_cot_self_con,few_naive,few_self_con,few_cot,few_cot_self_con
Baichuan2-13B-Chat,37.5,40.0,47.5,52.5,37.5,37.5,42.5,45.0
ChatGLM3-6B,35.0,35.0,50.0,50.0,47.5,47.5,45.0,45.0
DevOps-Model-14B-Chat,35.0,27.5,37.5,52.5,50.0,50.0,55.0,62.5
ERNIE-Bot-4.0,57.5,57.5,60.0,60.0,52.5,52.5,57.5,57.5
GPT-3.5-turbo,50.0,47.5,55.0,55.0,40.0,40.0,50.0,55.0
Gpt4,57.5,57.5,57.5,57.5,52.5,52.5,62.5,62.5
InternLM2-Chat-20B,47.5,47.5,,,47.5,47.5,,
InternLM2-Chat-7B,60.0,60.0,57.5,57.5,55.0,55.0,62.5,62.5
LLaMA-2-13B,42.5,42.5,50.0,50.0,50.0,50.0,42.5,42.5
LLaMA-2-70B-Chat,0.0,0.0,57.5,57.5,25.0,25.0,45.0,45.0
LLaMA-2-7B,32.5,32.5,45.0,45.0,45.0,45.0,45.0,45.0
Mistral-7B,0.0,0.0,37.5,37.5,20.0,20.0,50.0,50.0
Qwen-14B-Chat,47.5,45.0,50.0,47.5,50.0,47.5,55.0,57.5
Qwen-72B-Chat,50.0,50.0,47.5,47.5,45.0,45.0,60.0,60.0
Yi-34B-Chat,55.0,55.0,60.0,67.5,50.0,50.0,52.5,55.0
Claude-3-Opus,72.85714285714286,72.85714285714286,,,,,,
Gemma_2B,37.5,37.5,40.0,40.0,32.5,32.5,40.0,40.0
Gemma_7B,32.5,32.5,62.5,62.5,40.0,40.0,50.0,50.0
Meta-Llama-3-8B-Instruct,52.85714285714286,52.85714285714286,47.14285714285714,47.14285714285714,52.85714285714286,52.85714285714286,30.0,30.0
Qwen1.5-14B-Base,47.5,47.5,50.0,50.0,47.5,47.5,45.0,45.0
Qwen1.5-14B-Chat,45.0,47.5,60.0,72.5,52.5,55.0,60.0,60.0