OpsEval/data/gtja_zh_mc.csv
name,few_native,few_self_con,few_cot,few_cot_self_con,zero_native,zero_self_con,zero_cot,zero_cot_self_con
GPT4,70.33,70.33,71.43,71.43,68.13,68.13,67.03,67.03
Yi-34B-Chat,69.23,70.33,49.45,47.25,71.43,74.73,71.43,73.63
DevOps-Model-14B-Chat,61.54,59.34,52.75,63.74,41.76,38.46,45.05,49.45
LLaMA-2-7B,42.86,42.86,45.05,45.05,28.57,28.57,45.05,45.05
Qwen-72B-Chat,70.33,70.33,74.73,74.73,71.43,71.43,67.03,67.03
GPT-3.5-turbo,47.25,52.75,57.14,58.24,49.45,52.75,59.34,62.64
ERNIE-Bot-4.0,65.93,65.93,68.13,68.13,68.13,68.13,64.84,64.84
Mistral-7B,14.29,14.29,38.46,38.46,5.49,5.49,47.25,47.25
LLaMA-2-13B,47.25,47.25,42.86,42.86,30.77,30.77,47.25,47.25
Baichuan2-13B-Chat,38.46,38.46,49.45,51.65,41.76,41.76,53.85,60.44
Qwen-14B-Chat,54.95,54.95,59.34,61.54,47.25,47.25,53.85,54.95
LLaMA-2-70B-Chat,19.78,19.78,49.45,49.45,6.59,6.59,48.35,48.35
ChatGLM3-6B,43.96,43.96,47.25,47.25,43.96,43.96,53.85,53.85
InternLM2-Chat-20B,65.93,65.93,,,56.04,56.04,,
InternLM2-Chat-7B,54.95,54.95,51.65,51.65,56.04,56.04,59.34,59.34
gemma_2b,32.97,32.97,29.67,29.67,30.77,30.77,43.96,43.96
gemma_7b,34.07,34.07,50.55,50.55,29.67,29.67,56.04,56.04
qwen1.5-14b-base,68.13,68.13,42.86,42.86,53.85,53.85,63.74,63.74
qwen1.5-14b-chat,59.34,57.14,60.44,62.64,56.04,54.95,67.03,68.13
Claude-3-Opus,,,,,65.83,65.83,,