Upload 4 files
- 专业学科能力榜单.csv +20 -0
- 安全与责任榜单.csv +17 -0
- 综合榜单.csv +17 -0
- 通用语言能力榜单.csv +17 -0
专业学科能力榜单.csv
ADDED
@@ -0,0 +1,20 @@
+Rank,Model,"Response retrieval
+method",Institution,"Secondary-school level
+accuracy","University level
+accuracy",Average accuracy
+1,GPT 4-Turbo,API,OpenAI,83.95%,72.14%,76.77%
+2,GPT 4,API,OpenAI,81.94%,73.22%,76.64%
+3,Gemini Pro,Web,Google,79.93%,60.69%,68.24%
+4,文心一言4.0,API,Baidu,69.23%,66.09%,67.32%
+5,Claude 2,Web,Anthropic,80.94%,55.29%,65.35%
+6,商汤日日新,API,SenseTime,73.24%,58.10%,64.04%
+7,GPT 3.5-Turbo,API,OpenAI,76.92%,54.43%,63.25%
+8,MiniMax,API,MiniMax,73.91%,53.78%,61.68%
+9,Llama 2,API,Meta,64.55%,58.53%,60.89%
+10,通义千问2.0,API,Alibaba,65.89%,49.46%,55.91%
+11,讯飞星火v3.0,API,iFLYTEK,67.22%,47.52%,55.25%
+12,百川大模型,API,Baichuan AI,59.87%,49.89%,53.81%
+13,360智脑,API,360,74.92%,36.07%,51.31%
+14,智谱清言,API,Tsinghua & Zhipu,54.85%,39.31%,45.41%
+15,BLOOMZ-7B,API,BigScience,32.11%,32.18%,32.15%
+16,悟道·天鹰,API,BAAI,29.10%,24.84%,26.51%
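The header record of this file spans four physical lines because three of its fields are quoted and contain a line break, and the accuracy columns are stored as strings with a trailing `%`. A minimal sketch of one way to load such a file with Python's standard `csv` module follows. The helper names (`parse_percent`, `load_rows`) and the dictionary keys are hypothetical, and the header wording in the sample is an English rendering; columns are read by position, so the exact header text does not matter.

```python
import csv
import io

# Sample rows copied from the diff above. The header text is an English
# rendering (an assumption); it is skipped, and columns are read by position.
SAMPLE = '''Rank,Model,"Response retrieval
method",Institution,"Secondary-school level
accuracy","University level
accuracy",Average accuracy
1,GPT 4-Turbo,API,OpenAI,83.95%,72.14%,76.77%
2,GPT 4,API,OpenAI,81.94%,73.22%,76.64%
7,GPT 3.5-Turbo,API,OpenAI,76.92%,54.43%,63.25%
'''


def parse_percent(text):
    """Convert a string such as '83.95%' to the float 83.95."""
    return float(text.strip().rstrip("%"))


def load_rows(handle):
    """Read leaderboard rows from an open text handle (hypothetical helper)."""
    reader = csv.reader(handle)
    next(reader)  # skip the header record, which spans four physical lines
    rows = []
    for rank, model, access, org, secondary, university, average in reader:
        rows.append({
            "rank": int(rank),
            "model": model,
            "access": access,
            "institution": org,
            "secondary": parse_percent(secondary),
            "university": parse_percent(university),
            "average": parse_percent(average),
        })
    return rows


# newline="" leaves line endings untouched so the csv module can handle
# the line breaks inside quoted fields itself.
rows = load_rows(io.StringIO(SAMPLE, newline=""))
for row in rows:
    print(row["rank"], row["model"], row["average"])
```

When reading the real file from disk, opening it with `open(path, newline="", encoding="utf-8")` should behave the same way; the encoding is an assumption about how the file was saved.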
安全与责任榜单.csv
ADDED
@@ -0,0 +1,17 @@
+Rank,Model,Institution,Response retrieval method,General attack,Instruction attack,Composite score
+1,Llama2 70B,Meta,API,89.39,76.57,85.12
+2,Gemini Pro,Google,Web,85.53,72.50,81.18
+3,GPT4-Turbo,OpenAI,API,85.13,63.86,78.04
+4,Claude2,Anthropic,Web,78.47,68.79,75.24
+5,通义千问2.0 (qwen-max),Alibaba,API,73.46,61.43,69.45
+6,商汤日日新 (SenseNova),SenseTime,API,77.21,53.21,69.21
+7,文心一言4(ERNIEBot-4),Baidu,API,70.37,62.86,67.86
+8,智谱清言(ChatGLM3),Tsinghua & Zhipu,API,71.25,55.36,65.95
+9,百川大模型(Baichuan2),Baichuan AI,API,65.76,53.29,61.60
+10,GPT3.5-Turbo,OpenAI,API,67.21,43.93,59.45
+11,悟道·天鹰 (AquilaChat-7B),BAAI,API,65.23,39.36,56.61
+12,讯飞星火v3.0,iFLYTEK,API,59.80,48.57,56.06
+13,GPT4,OpenAI,API,60.71,43.21,54.88
+14,360智脑 (360GPT_S2_V9),360,API,62.13,37.07,53.78
+15,MiniMax (abab5.5-chat),MiniMax,API,60.04,28.57,49.55
+16,BLOOMZ-7B,BigScience,API,49.95,42.07,47.32
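The file does not say how the composite score is derived from the two attack columns. As an observation from the numbers only, the rows above look consistent with a 2:1 weighting of the general-attack score over the instruction-attack score, to within rounding. The sketch below checks that guess on a few rows copied from the diff; the weight and the function names are assumptions, not a documented formula.

```python
# (model, general attack, instruction attack, composite) copied from the diff.
ROWS = [
    ("Llama2 70B", 89.39, 76.57, 85.12),
    ("Gemini Pro", 85.53, 72.50, 81.18),
    ("GPT4-Turbo", 85.13, 63.86, 78.04),
    ("Claude2", 78.47, 68.79, 75.24),
    ("GPT3.5-Turbo", 67.21, 43.93, 59.45),
    ("GPT4", 60.71, 43.21, 54.88),
    ("BLOOMZ-7B", 49.95, 42.07, 47.32),
]


def weighted_composite(general, instruction, w_general=2 / 3):
    """Weighted mean of the two scores; the 2/3 weight is a guess."""
    return w_general * general + (1 - w_general) * instruction


def max_deviation(rows):
    """Largest absolute gap between the guess and the published composite."""
    return max(
        abs(weighted_composite(general, instruction) - composite)
        for _, general, instruction, composite in rows
    )


for model, general, instruction, composite in ROWS:
    guess = weighted_composite(general, instruction)
    print(f"{model}: published {composite:.2f}, weighted guess {guess:.2f}")
```

A plain average of the two columns does not fit: for Llama2 70B it would give 82.98 rather than the published 85.12.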
综合榜单.csv
ADDED
@@ -0,0 +1,17 @@
+Overall rank,Model,Institution,Response retrieval method,General language ability,Professional subject ability,Safety and responsibility,Composite score
+1,GPT4-Turbo,OpenAI,API,91.01,76.77,78.04,82.89
+2,Gemini Pro,Google,Web,85.96,68.24,81.18,78.95
+3,Llama2 70B,Meta,API,80.09,60.89,85.12,75.27
+6,GPT4,OpenAI,API,83.99,76.64,54.88,73.70
+4,文心一言4 (ERNIEBot-4),Baidu,API,81.78,67.32,67.86,73.33
+5,Claude2,Anthropic,Web,77.88,65.35,75.24,73.13
+7,GPT3.5-Turbo,OpenAI,API,83.12,63.26,59.45,70.27
+8,商汤日日新 (SenseNova),SenseTime,API,74.11,64.04,69.21,69.53
+9,通义千问2.0 (qwen-max),Alibaba,API,76.39,55.91,69.45,67.90
+10,MiniMax (abab5.5-chat),MiniMax,API,70.81,61.68,49.55,62.08
+11,讯飞星火v3.0,iFLYTEK,API,70.24,55.25,56.06,61.55
+12,智谱清言 (ChatGLM3),Tsinghua & Zhipu,API,70.66,45.41,65.95,61.24
+13,百川大模型(Baichuan2),Baichuan AI,API,63.67,53.81,61.60,59.93
+14,360智脑 (360GPT_S2_V9),360,API,68.95,51.31,53.78,59.14
+15,悟道·天鹰 (AquilaChat-7B),BAAI,API,56.82,26.51,56.61,47.00
+16,BLOOMZ-7B,BigScience,API,51.44,32.15,47.32,44.10
通用语言能力榜单.csv
ADDED
@@ -0,0 +1,17 @@
+Overall rank,Model,Institution,Response retrieval method,Open-ended Q&A,Content summarization,Content creation,Instruction following,Logical reasoning,Multi-turn dialogue,Scenario simulation,Role play,Composite score
+1,GPT4-Turbo,OpenAI,API,81.93,93.32,96.57,99.14,89.83,97.57,86.14,83.57,91.01
+2,Gemini Pro,Google,Web,90.79,90.16,95.93,89.14,49.15,99.71,87.8,85.02,85.96
+3,GPT4,OpenAI,API,73.14,86.07,89.29,95.29,62.71,97.86,83.29,84.29,83.99
+4,GPT3.5-Turbo,OpenAI,API,77,87.27,89.93,97.96,57.63,93.29,82.29,79.57,83.12
+5,文心一言4 (ERNIEBot4),Baidu,API,66.57,78.23,84.07,92.57,67.8,96.57,84.57,83.86,81.78
+6,Llama2 70B,Meta,API,84.64,78.21,82.14,93.29,37.29,96.86,84.77,83.48,80.09
+7,Claude2,Anthropic,Web,78,73.68,78,98.71,32.2,98.14,77.26,87.05,77.88
+8,通义千问2.0 (qwen-max),Alibaba,API,70.36,68.64,73.43,97.43,59.32,93.29,79.86,68.77,76.39
+9,商汤日日新 (SenseNova),SenseTime,API,69.43,71.43,75.71,95.86,57.63,70.71,78.17,73.91,74.11
+10,MiniMax (abab5.5-chat),MiniMax,API,62.64,62.95,67.86,86.71,49.15,90.43,71.4,75.38,70.81
+11,智谱清言 (ChatGLM3),Tsinghua & Zhipu,API,67.64,61.34,66.79,84.29,37.29,97.86,74.71,75.38,70.66
+12,讯飞星火v3.0,iFLYTEK,API,58,62.54,68.14,85.71,55.93,76.86,79.11,75.61,70.24
+13,360智脑 (360GPT_S2_V9),360,API,54.5,55.55,62.64,84.86,67.8,89.29,66.51,70.46,68.95
+14,百川大模型 (Baichuan2),Baichuan AI,API,59,64.89,70.57,73.43,30.51,80.43,67.51,63.01,63.67
+15,悟道·天鹰(AquilaChat-7B),BAAI,API,56.71,56.73,62.07,58.57,30.51,70.71,56.26,62.98,56.82
+16,BLOOMZ-7B,BigScience,API,52.86,45.34,50.93,63.71,22.03,65.71,50.69,60.23,51.44
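In the rows checked here, the composite score in the last column matches the unweighted mean of the eight sub-scores to within rounding. The file itself does not state the formula, so the sketch below treats that as an observation to test rather than a documented rule; the function names are hypothetical.

```python
from statistics import mean

# (model, eight sub-scores in column order, composite) copied from the diff.
ROWS = [
    ("GPT4-Turbo", [81.93, 93.32, 96.57, 99.14, 89.83, 97.57, 86.14, 83.57], 91.01),
    ("Gemini Pro", [90.79, 90.16, 95.93, 89.14, 49.15, 99.71, 87.8, 85.02], 85.96),
    ("GPT4", [73.14, 86.07, 89.29, 95.29, 62.71, 97.86, 83.29, 84.29], 83.99),
    ("GPT3.5-Turbo", [77, 87.27, 89.93, 97.96, 57.63, 93.29, 82.29, 79.57], 83.12),
    ("Claude2", [78, 73.68, 78, 98.71, 32.2, 98.14, 77.26, 87.05], 77.88),
    ("BLOOMZ-7B", [52.86, 45.34, 50.93, 63.71, 22.03, 65.71, 50.69, 60.23], 51.44),
]


def composite_gap(scores, composite):
    """Absolute gap between the plain mean of the sub-scores and the composite."""
    return abs(mean(scores) - composite)


def all_match(rows, tolerance=0.01):
    """True if every row's composite is within tolerance of the plain mean."""
    return all(composite_gap(scores, composite) < tolerance
               for _, scores, composite in rows)


for model, scores, composite in ROWS:
    print(f"{model}: published {composite:.2f}, mean of sub-scores {mean(scores):.4f}")
```

The tolerance of 0.01 allows for the published figures being rounded to two decimals.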