OpsEval / data_v2 /lenovo_zh_mc_gen.csv
update leaderboard 2025-02-27
name,zero_naive,zero_self_con,zero_cot,zero_cot_self_con,few_naive,few_self_con,few_cot,few_cot_self_con
Baichuan2-13B-Chat,65.0,60.0,72.5,67.5,62.5,60.0,70.0,67.5
ChatGLM3-6B,60.0,60.0,60.0,60.0,55.0,55.0,60.0,60.0
DevOps-Model-14B-Chat,60.0,67.5,65.0,57.5,67.5,70.0,62.5,70.0
ERNIE-Bot-4.0,75.0,75.0,77.5,77.5,75.0,75.0,82.5,82.5
GPT-3.5-turbo,60.0,62.5,65.0,70.0,57.5,57.5,62.5,62.5
GPT-4,77.5,77.5,82.5,82.5,77.5,77.5,82.5,82.5
LLaMA-2-13B,45.0,45.0,62.5,62.5,60.0,60.0,55.0,55.0
LLaMA-2-70B-Chat,22.5,22.5,75.0,75.0,20.0,20.0,57.5,57.5
LLaMA-2-7B,32.5,32.5,45.0,45.0,60.0,60.0,55.0,55.0
Mistral-7B,47.5,47.5,62.5,62.5,35.0,35.0,60.0,60.0
Qwen-14B-Chat,70.0,67.5,70.0,67.5,70.0,65.0,65.0,67.5
Qwen-72B-Chat,72.5,72.5,75.0,75.0,75.0,75.0,75.0,75.0
Yi-34B-Chat,75.0,75.0,87.5,82.5,62.5,57.5,52.5,52.5
Claude-3-Opus,71.42857142857143,71.42857142857143,,,,,,
Deepseek-R1-Distill-Llama-8B,47.142857142857146,47.142857142857146,60.0,60.0,60.0,60.0,40.0,40.0
Deepseek-R1-Distill-Qwen-1.5B,24.285714285714285,24.285714285714285,34.285714285714285,34.285714285714285,55.71428571428571,55.71428571428571,52.857142857142854,52.857142857142854
Deepseek-R1-Distill-Qwen-14B,68.57142857142858,68.57142857142858,,,68.57142857142858,68.57142857142858,,
Deepseek-R1-Distill-Qwen-32B,61.42857142857143,61.42857142857143,,,60.0,60.0,,
Deepseek-R1-Distill-Qwen-7B,40.0,40.0,30.0,30.0,42.85714285714286,42.85714285714286,57.142857142857146,57.142857142857146
Meta-Llama-3-8B-Instruct,47.14285714285714,47.14285714285714,44.285714285714285,44.285714285714285,45.714285714285715,45.714285714285715,32.857142857142854,32.857142857142854