OpsEval / leaderboard /wired_network_en.csv
Junetheriver's picture
added wired network results
45c0614
raw
history blame
1.9 kB
-,-,Zero-shot,-,v,-,3-Shot,-,-,-,-
-,-,Naïve,SC,CoT,CoT+SC,Naïve,SC,CoT,CoT+SC,Best Score
1,✨ GPT-4,/,/,/,/,/,/,88.7,88.7,88.7
2,✨ Yi-34B-Chat,57.75,59.14,65.11,68.79,68.16,68.37,78.09,80.06,80.06
3,✨ Qwen-72B-Chat,70.41,70.5,72.38,72.56,70.32,70.32,70.13,70.22,72.56
4,✨ GPT-3.5-turbo,66.6,66.8,69.6,72,68.3,68.3,70.9,72.5,72.5
5,✨ ERNIE-Bot-4.0,61.15,61.15,70,70,60,60,70,70,70
6,✨ qwen1.5-14b-chat,54.9,56.44,64.09,67.1,52.23,53.52,59.54,64.18,67.1
7,✨ qwen1.5-14b-base,34.88,34.88,60.82,60.82,65.55,65.55,47.08,47.08,65.55
8,✨ DevOps-Model-14B-Chat,30.69,30.59,55.77,63.63,63.85,61.96,41.15,44.01,63.85
9,✨ Qwen-14B-Chat,43.78,47.81,56.58,59.4,62.09,59.7,49.06,55.88,62.09
10,✨ LLaMA-2-13B,41.8,46.5,53.1,58.7,53.3,53,56.8,61,61
11,✨ InternLM2-Chat-20B,56.36,56.36,26.18,26.18,60.48,60.48,45.1,45.1,60.48
12,✨ LLaMA-2-70B-Chat,25.29,25.29,57.97,58.06,52.97,52.97,58.55,58.55,58.55
13,✨ InternLM2-Chat-7B,49.74,49.74,56.19,56.19,48.2,48.2,49.74,49.74,56.19
14,✨ LLaMA-2-7B,39.5,40,45.4,49.5,48.2,46.8,52,55.2,55.2
15,✨ Qwen-7B-Chat,45.9,46,47.3,50.1,52.1,51,48.3,49.8,52.1
16,✨ gemma_7b,25.09,25.09,50.86,50.86,30.24,30.24,51.56,51.56,51.56
17,✨ InternLM-7B,38.7,38.7,43.9,43.9,45.2,45.2,51.4,51.4,51.4
18,✨ Chinese-Alpaca-2-13B,37.7,37.7,49.7,49.7,48.6,48.6,50.5,50.5,50.5
19,✨ Mistral-7B,29.27,29.27,46.3,46.3,47.22,47.22,45.58,45.58,47.22
20,✨ AquilaChat2-34B,36.63,36.63,44.83,44.83,46.65,46.65,NULL,NULL,46.65
21,✨ ChatGLM3-6B,43.38,43.38,44.59,44.59,42.1,42.1,43.47,43.47,44.59
22,✨ ChatGLM2-6B,24.8,24.7,36.6,36.5,37.6,37.6,40.5,40.5,40.5
23,✨ Chinese-LLaMA-2-13B,29.4,29.4,37.8,37.8,40.4,40.4,28.8,28.8,40.4
24,✨ gemma_2b,26.46,26.46,33.42,33.42,26.63,26.63,37.54,37.54,37.54
25,✨ Baichuan-13B-Chat,18.3,20.4,28.6,37,24.1,26.7,18.2,17.8,37
26,✨ Baichuan2-13B-Chat,14.1,15.3,24.1,25.8,32.3,33.1,25.6,27.7,33.1