name,zero_naive,zero_self_con,zero_cot,zero_cot_self_con,few_naive,few_self_con,few_cot,few_cot_self_con
Baichuan2-13B-Chat,65.0,60.0,72.5,67.5,62.5,60.0,70.0,67.5
ChatGLM3-6B,60.0,60.0,60.0,60.0,55.0,55.0,60.0,60.0
DevOps-Model-14B-Chat,60.0,67.5,65.0,57.5,67.5,70.0,62.5,70.0
ERNIE-Bot-4.0,75.0,75.0,77.5,77.5,75.0,75.0,82.5,82.5
GPT-3.5-turbo,60.0,62.5,65.0,70.0,57.5,57.5,62.5,62.5
GPT-4,77.5,77.5,82.5,82.5,77.5,77.5,82.5,82.5
LLaMA-2-13B,45.0,45.0,62.5,62.5,60.0,60.0,55.0,55.0
LLaMA-2-70B-Chat,22.5,22.5,75.0,75.0,20.0,20.0,57.5,57.5
LLaMA-2-7B,32.5,32.5,45.0,45.0,60.0,60.0,55.0,55.0
Mistral-7B,47.5,47.5,62.5,62.5,35.0,35.0,60.0,60.0
Qwen-14B-Chat,70.0,67.5,70.0,67.5,70.0,65.0,65.0,67.5
Qwen-72B-Chat,72.5,72.5,75.0,75.0,75.0,75.0,75.0,75.0
Yi-34B-Chat,75.0,75.0,87.5,82.5,62.5,57.5,52.5,52.5
Claude-3-Opus,71.42857142857143,71.42857142857143,,,,,,
Deepseek-R1-Distill-Llama-8B,47.142857142857146,47.142857142857146,60.0,60.0,60.0,60.0,40.0,40.0
Deepseek-R1-Distill-Qwen-1.5B,24.285714285714285,24.285714285714285,34.285714285714285,34.285714285714285,55.71428571428571,55.71428571428571,52.857142857142854,52.857142857142854
Deepseek-R1-Distill-Qwen-14B,68.57142857142858,68.57142857142858,,,68.57142857142858,68.57142857142858,,
Deepseek-R1-Distill-Qwen-32B,61.42857142857143,61.42857142857143,,,60.0,60.0,,
Deepseek-R1-Distill-Qwen-7B,40.0,40.0,30.0,30.0,42.85714285714286,42.85714285714286,57.142857142857146,57.142857142857146
Meta-Llama-3-8B-Instruct,47.14285714285714,47.14285714285714,44.285714285714285,44.285714285714285,45.714285714285715,45.714285714285715,32.857142857142854,32.857142857142854