CathieDaDa committed on
Commit 3476023 · verified · 1 Parent(s): 3237a42

Upload 4 files

专业学科能力榜单.csv ADDED
@@ -0,0 +1,20 @@
+ Rank,Model,"Response retrieval
+ method",Organization,"Secondary-school level
+ accuracy","University level
+ accuracy",Average accuracy
+ 1,GPT 4-Turbo,API,OpenAI,83.95%,72.14%,76.77%
+ 2,GPT 4,API,OpenAI,81.94%,73.22%,76.64%
+ 3,Gemini Pro,Web,Google,79.93%,60.69%,68.24%
+ 4,文心一言4.0,API,百度,69.23%,66.09%,67.32%
+ 5,Claude 2,Web,Anthropic,80.94%,55.29%,65.35%
+ 6,商汤日日新,API,商汤科技,73.24%,58.10%,64.04%
+ 7,GPT 3.5-Turbo,API,OpenAI,76.92%,54.43%,63.25%
+ 8,MiniMax,API,MiniMax,73.91%,53.78%,61.68%
+ 9,Llama 2,API,Meta,64.55%,58.53%,60.89%
+ 10,通义千问2.0,API,阿里巴巴,65.89%,49.46%,55.91%
+ 11,讯飞星火v3.0,API,科大讯飞,67.22%,47.52%,55.25%
+ 12,百川大模型,API,百川智能,59.87%,49.89%,53.81%
+ 13,360智脑,API,360,74.92%,36.07%,51.31%
+ 14,智谱清言,API,清华&智谱,54.85%,39.31%,45.41%
+ 15,BLOOMZ-7B,API,BigScience,32.11%,32.18%,32.15%
+ 16,悟道·天鹰,API,智源研究院,29.10%,24.84%,26.51%
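
A note for reviewers of this file: the last column is not the plain mean of the two accuracy columns (for GPT 4-Turbo the plain mean would be about 78.05%, while 76.77% is reported). The sketch below solves for the weight that each row implies for the secondary-school column. The weighting scheme itself is an assumption; the commit does not state how the average was computed, and a fixed weight would be consistent with, for example, pooling questions across the two levels.

```python
# (secondary-level accuracy, university-level accuracy, reported average), in percent,
# copied from five rows of the hunk above.
rows = {
    "GPT 4-Turbo": (83.95, 72.14, 76.77),
    "GPT 4": (81.94, 73.22, 76.64),
    "Gemini Pro": (79.93, 60.69, 68.24),
    "Claude 2": (80.94, 55.29, 65.35),
    "Llama 2": (64.55, 58.53, 60.89),
}


def implied_secondary_weight(secondary, university, average):
    """Solve average = w * secondary + (1 - w) * university for w."""
    return (average - university) / (secondary - university)


# Hypothetical helper output: one implied weight per model.
weights = {name: implied_secondary_weight(*vals) for name, vals in rows.items()}

for name, w in weights.items():
    print(f"{name}: implied secondary-level weight = {w:.3f}")
```

For these five rows the implied weight is close to 0.39 in every case, which suggests one fixed weighting was applied across the table.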
安全与责任榜单.csv ADDED
@@ -0,0 +1,17 @@
+ Rank,Model,Organization,Response retrieval method,General attacks,Instruction attacks,Composite score
+ 1,Llama2 70B,Meta,API,89.39,76.57,85.12
+ 2,Gemini Pro,Google,Web,85.53,72.50,81.18
+ 3,GPT4-Turbo,OpenAI,API,85.13,63.86,78.04
+ 4,Claude2,Anthropic,Web,78.47,68.79,75.24
+ 5,通义千问2.0 (qwen-max),阿里巴巴,API,73.46,61.43,69.45
+ 6,商汤日日新 (SenseNova),商汤科技,API,77.21,53.21,69.21
+ 7,文心一言4(ERNIEBot-4),百度,API,70.37,62.86,67.86
+ 8,智谱清言(ChatGLM3),清华&智谱,API,71.25,55.36,65.95
+ 9,百川大模型(Baichuan2),百川智能,API,65.76,53.29,61.60
+ 10,GPT3.5-Turbo,OpenAI,API,67.21,43.93,59.45
+ 11,悟道·天鹰 (AquilaChat-7B),智源研究院,API,65.23,39.36,56.61
+ 12,讯飞星火v3.0,科大讯飞,API,59.80,48.57,56.06
+ 13,GPT4,OpenAI,API,60.71,43.21,54.88
+ 14,360智脑 (360GPT_S2_V9),360,API,62.13,37.07,53.78
+ 15,MiniMax (abab5.5-chat),MiniMax,API,60.04,28.57,49.55
+ 16,BLOOMZ-7B,BigScience,API,49.95,42.07,47.32
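
The last column of this file looks consistent with a 2:1 weighting of the two attack scores (general attacks counted twice, instruction attacks once). This is an observation from the numbers, not something the commit states, so the `general_weight` default below is an assumption. A minimal check over five rows copied from the hunk above:

```python
# (general-attack score, instruction-attack score, reported composite),
# copied from five rows of the hunk above.
rows = {
    "Llama2 70B": (89.39, 76.57, 85.12),
    "Gemini Pro": (85.53, 72.50, 81.18),
    "GPT4-Turbo": (85.13, 63.86, 78.04),
    "Claude2": (78.47, 68.79, 75.24),
    "BLOOMZ-7B": (49.95, 42.07, 47.32),
}


def weighted_composite(general, instruction, general_weight=2 / 3):
    """Weighted mean of the two scores; the 2/3 weight is an assumed value."""
    return general_weight * general + (1 - general_weight) * instruction


# Largest absolute difference between the recomputed and the reported composite.
max_gap = max(
    abs(weighted_composite(general, instruction) - reported)
    for general, instruction, reported in rows.values()
)
print(f"largest gap: {max_gap:.4f}")
```

The gap stays below 0.01 for these rows, which is what rounding of the published two-decimal scores would allow.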
综合榜单.csv ADDED
@@ -0,0 +1,17 @@
+ Overall rank,Model,Organization,Response retrieval method,General language ability,Professional subject ability,Safety and responsibility,Composite score
+ 1,GPT4-Turbo,OpenAI,API,91.01,76.77,78.04,82.89
+ 2,Gemini Pro,Google,Web,85.96,68.24,81.18,78.95
+ 3,Llama2 70B,Meta,API,80.09,60.89,85.12,75.27
+ 6,GPT4,OpenAI,API,83.99,76.64,54.88,73.70
+ 4,文心一言4 (ERNIEBot-4),百度,API,81.78,67.32,67.86,73.33
+ 5,Claude2,Anthropic,Web,77.88,65.35,75.24,73.13
+ 7,GPT3.5-Turbo,OpenAI,API,83.12,63.26,59.45,70.27
+ 8,商汤日日新 (SenseNova),商汤科技,API,74.11,64.04,69.21,69.53
+ 9,通义千问2.0 (qwen-max),阿里巴巴,API,76.39,55.91,69.45,67.90
+ 10,MiniMax (abab5.5-chat),MiniMax,API,70.81,61.68,49.55,62.08
+ 11,讯飞星火v3.0,科大讯飞,API,70.24,55.25,56.06,61.55
+ 12,智谱清言 (ChatGLM3),清华&智谱,API,70.66,45.41,65.95,61.24
+ 13,百川大模型(Baichuan2),百川智能,API,63.67,53.81,61.60,59.93
+ 14,360智脑 (360GPT_S2_V9),360,API,68.95,51.31,53.78,59.14
+ 15,悟道·天鹰 (AquilaChat-7B),智源研究院,API,56.82,26.51,56.61,47.00
+ 16,BLOOMZ-7B,BigScience,API,51.44,32.15,47.32,44.10
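
In this file the rows are ordered by the last column, but the first column does not follow that order everywhere: GPT4 carries rank 6 while sitting in fourth position by score. Whether that is a data-entry slip or a deliberate choice is not stated in the commit. A minimal sketch that flags such rows, using the first seven rows copied from the hunk above:

```python
# (listed rank, model name, composite score), copied from the first seven rows.
rows = [
    (1, "GPT4-Turbo", 82.89),
    (2, "Gemini Pro", 78.95),
    (3, "Llama2 70B", 75.27),
    (6, "GPT4", 73.70),
    (4, "文心一言4 (ERNIEBot-4)", 73.33),
    (5, "Claude2", 73.13),
    (7, "GPT3.5-Turbo", 70.27),
]

# Rank the models purely by composite score, highest first.
by_score = sorted(rows, key=lambda row: row[2], reverse=True)

# Collect every model whose listed rank differs from its score-based position.
mismatches = [
    (name, listed, position)
    for position, (listed, name, _score) in enumerate(by_score, start=1)
    if listed != position
]

for name, listed, position in mismatches:
    print(f"{name}: listed rank {listed}, position by score {position}")
```

Run on these rows, the check reports GPT4, 文心一言4 (ERNIEBot-4), and Claude2 as the three entries whose listed rank and score order disagree.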
通用语言能力榜单.csv ADDED
@@ -0,0 +1,17 @@
+ Overall rank,Model,Organization,Response retrieval method,Open-ended Q&A,Summarization,Content creation,Instruction following,Logical reasoning,Multi-turn dialogue,Scenario simulation,Role play,Composite score
+ 1,GPT4-Turbo,OpenAI,API,81.93,93.32,96.57,99.14,89.83,97.57,86.14,83.57,91.01
+ 2,Gemini Pro,Google,Web,90.79,90.16,95.93,89.14,49.15,99.71,87.8,85.02,85.96
+ 3,GPT4,OpenAI,API,73.14,86.07,89.29,95.29,62.71,97.86,83.29,84.29,83.99
+ 4,GPT3.5-Turbo,OpenAI,API,77,87.27,89.93,97.96,57.63,93.29,82.29,79.57,83.12
+ 5,文心一言4 (ERNIEBot4),百度,API,66.57,78.23,84.07,92.57,67.8,96.57,84.57,83.86,81.78
+ 6,Llama2 70B,Meta,API,84.64,78.21,82.14,93.29,37.29,96.86,84.77,83.48,80.09
+ 7,Claude2,Anthropic,Web,78,73.68,78,98.71,32.2,98.14,77.26,87.05,77.88
+ 8,通义千问2.0 (qwen-max),阿里巴巴,API,70.36,68.64,73.43,97.43,59.32,93.29,79.86,68.77,76.39
+ 9,商汤日日新 (SenseNova),商汤科技,API,69.43,71.43,75.71,95.86,57.63,70.71,78.17,73.91,74.11
+ 10,MiniMax (abab5.5-chat),MiniMax,API,62.64,62.95,67.86,86.71,49.15,90.43,71.4,75.38,70.81
+ 11,智谱清言 (ChatGLM3),清华&智谱,API,67.64,61.34,66.79,84.29,37.29,97.86,74.71,75.38,70.66
+ 12,讯飞星火v3.0,科大讯飞,API,58,62.54,68.14,85.71,55.93,76.86,79.11,75.61,70.24
+ 13,360智脑 (360GPT_S2_V9),360,API,54.5,55.55,62.64,84.86,67.8,89.29,66.51,70.46,68.95
+ 14,百川大模型 (Baichuan2),百川智能,API,59,64.89,70.57,73.43,30.51,80.43,67.51,63.01,63.67
+ 15,悟道·天鹰(AquilaChat-7B),智源研究院,API,56.71,56.73,62.07,58.57,30.51,70.71,56.26,62.98,56.82
+ 16,BLOOMZ-7B,BigScience,API,52.86,45.34,50.93,63.71,22.03,65.71,50.69,60.23,51.44
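
For this file, the last column appears to be the unweighted mean of the eight task scores that precede it. The commit does not say so explicitly, so treat it as an assumption; the sketch below checks it on the first four rows, copied from the hunk above.

```python
from statistics import mean

# Eight task scores and the reported composite for the first four rows.
rows = {
    "GPT4-Turbo": ([81.93, 93.32, 96.57, 99.14, 89.83, 97.57, 86.14, 83.57], 91.01),
    "Gemini Pro": ([90.79, 90.16, 95.93, 89.14, 49.15, 99.71, 87.8, 85.02], 85.96),
    "GPT4": ([73.14, 86.07, 89.29, 95.29, 62.71, 97.86, 83.29, 84.29], 83.99),
    "GPT3.5-Turbo": ([77, 87.27, 89.93, 97.96, 57.63, 93.29, 82.29, 79.57], 83.12),
}

# Absolute difference between the plain mean and the reported composite.
gaps = {name: abs(mean(scores) - reported) for name, (scores, reported) in rows.items()}

for name, gap in gaps.items():
    print(f"{name}: |mean - reported| = {gap:.4f}")
```

Each gap is well under 0.01, so the plain mean reproduces the reported value up to two-decimal rounding for these rows.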