x54-729 committed on
Commit
ae3bb61
1 Parent(s): 8eec32b

update README

Files changed (1)
  1. README.md +27 -31
README.md CHANGED
@@ -20,18 +20,17 @@ license: other
 
  [![evaluation](https://github.com/InternLM/InternLM/assets/22529082/f80a2a58-5ddf-471a-8da4-32ab65c8fd3b)](https://github.com/internLM/OpenCompass/)
 
- [💻Github Repo](https://github.com/InternLM/InternLM) • [🤔Reporting Issues](https://github.com/InternLM/InternLM/issues/new)
 
  </div>
 
 
  ## Introduction
- The second generation of the InternLM model, InternLM2, includes models at two scales: 7B and 20B. For the convenience of users and researchers, we have open-sourced four versions of each scale of the model, which are:
 
- - internlm2-base: A high-quality and highly adaptable model base, serving as an excellent starting point for deep domain adaptation.
- - internlm2 (**recommended**): Built upon the internlm2-base, this version has further pretrained on domain-specific corpus. It shows outstanding performance in evaluations while maintaining robust general language abilities, making it our recommended choice for most applications.
- - internlm2-chat-sft: Based on the Base model, it undergoes supervised human alignment training.
- - internlm2-chat (**recommended**): Optimized for conversational interaction on top of the internlm2-chat-sft through RLHF, it excels in instruction adherence, empathetic chatting, and tool invocation.
 
  The base model of InternLM2 has the following technical features:
 
@@ -45,15 +44,15 @@ The base model of InternLM2 has the following technical features:
 
  We have evaluated InternLM2 on several important benchmarks using the open-source evaluation tool [OpenCompass](https://github.com/open-compass/opencompass). Some of the evaluation results are shown in the table below. You are welcome to visit the [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for more evaluation results.
 
- | Dataset\Models | InternLM2-7B | InternLM2-Chat-7B | InternLM2-20B | InternLM2-Chat-20B | ChatGPT | GPT-4 |
- | --- | --- | --- | --- | --- | --- | --- |
- | MMLU | 65.8 | 63.7 | 67.7 | 66.5 | 69.1 | 83.0 |
- | AGIEval | 49.9 | 47.2 | 53.0 | 50.3 | 39.9 | 55.1 |
- | BBH | 65.0 | 61.2 | 72.1 | 68.3 | 70.1 | 86.7 |
- | GSM8K | 70.8 | 70.7 | 76.1 | 79.6 | 78.2 | 91.4 |
- | MATH | 20.2 | 23.0 | 25.5 | 31.9 | 28.0 | 45.8 |
- | HumanEval | 43.3 | 59.8 | 48.8 | 67.1 | 73.2 | 74.4 |
- | MBPP(Sanitized) | 51.8 | 51.4 | 63.0 | 65.8 | 78.9 | 79.0 |
 
 
  - The evaluation results were obtained from [OpenCompass](https://github.com/open-compass/opencompass) , and evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/open-compass/opencompass).
@@ -92,34 +91,31 @@ print(output)
  The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow **free** commercial usage. To apply for a commercial license, please fill in the [application form (English)](https://wj.qq.com/s2/12727483/5dba/)/[申请表(中文)](https://wj.qq.com/s2/12725412/f7c1/). For other questions or collaborations, please contact <internlm@pjlab.org.cn>.
 
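  ## Introduction
- The second-generation InternLM model, InternLM2, comes in two sizes: 7B and 20B. For the convenience of users and researchers, we have open-sourced four versions of the model at each size:
 
- - internlm2-base: A high-quality, highly adaptable base model that serves as a strong starting point for deep domain adaptation;
- - internlm2 (**recommended**): Built on internlm2-base and further pretrained on domain-specific corpora. It scores well in evaluations while retaining strong general language ability, and is the base model we recommend for most applications;
- - internlm2-chat-sft: Built on the Base model with supervised human-alignment training;
- - internlm2-chat (**recommended**): Built on internlm2-chat-sft and optimized for conversational interaction through RLHF. It performs well at instruction following, empathetic chat, and tool invocation.
 
  The base model of InternLM2 has the following technical features
 
  - Effective support for ultra-long contexts of 200,000 characters: the model achieves near-perfect "needle in a haystack" retrieval over inputs of 200,000 characters, and its performance on long-text tasks such as LongBench and L-Eval is among the best of open-source models.
  - Comprehensive performance improvement: every capability dimension improves over the previous-generation model, with notable gains in reasoning, mathematics, and coding.
 
-
  ## InternLM2-1.8B
 
  ### Performance Evaluation
 
  We evaluated InternLM2 on several important benchmarks using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/). Some of the results are shown in the table below. Visit the [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for more evaluation results.
 
- | Dataset | InternLM2-7B | InternLM2-Chat-7B | InternLM2-20B | InternLM2-Chat-20B | ChatGPT | GPT-4 |
- | --- | --- | --- | --- | --- | --- | --- |
- | MMLU | 65.8 | 63.7 | 67.7 | 66.5 | 69.1 | 83.0 |
- | AGIEval | 49.9 | 47.2 | 53.0 | 50.3 | 39.9 | 55.1 |
- | BBH | 65.0 | 61.2 | 72.1 | 68.3 | 70.1 | 86.7 |
- | GSM8K | 70.8 | 70.7 | 76.1 | 79.6 | 78.2 | 91.4 |
- | MATH | 20.2 | 23.0 | 25.5 | 31.9 | 28.0 | 45.8 |
- | HumanEval | 43.3 | 59.8 | 48.8 | 67.1 | 73.2 | 74.4 |
- | MBPP(Sanitized) | 51.8 | 51.4 | 63.0 | 65.8 | 78.9 | 79.0 |
 
  - The evaluation results above were obtained with [OpenCompass](https://github.com/open-compass/opencompass) (entries marked with `*` are taken from the original papers). For test details, see the configuration files provided in [OpenCompass](https://github.com/open-compass/opencompass).
  - Evaluation numbers may differ across [OpenCompass](https://github.com/open-compass/opencompass) versions; please refer to the results from the latest version of [OpenCompass](https://github.com/open-compass/opencompass).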
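@@ -149,4 +145,4 @@ print(output)
 
  ## Open Source License
 
- The code in this repository is open-sourced under the Apache-2.0 license. The model weights are fully open for academic research, and a free commercial-use license is available on application ([application form](https://wj.qq.com/s2/12725412/f7c1/)). For other questions or collaborations, please contact <internlm@pjlab.org.cn>.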
 
 
  [![evaluation](https://github.com/InternLM/InternLM/assets/22529082/f80a2a58-5ddf-471a-8da4-32ab65c8fd3b)](https://github.com/internLM/OpenCompass/)
 
+ [💻Github Repo](https://github.com/InternLM/InternLM) • [🤔Reporting Issues](https://github.com/InternLM/InternLM/issues/new)
 
  </div>
 
 
  ## Introduction
+ InternLM2-1.8B is the 1.8 billion parameter version of the second generation InternLM series. In order to facilitate user use and research, InternLM2-1.8B has two versions of open-source models. They are:
 
+
+ - internlm2: Built upon the internlm2-base, this version has further pretrained on domain-specific corpus. It shows outstanding performance in evaluations while maintaining robust general language abilities, making it our recommended choice for most applications.
+ - internlm2-chat: Optimized for conversational interaction on top of the internlm2-chat-sft through RLHF, it excels in instruction adherence, empathetic chatting, and tool invocation.
 
  The base model of InternLM2 has the following technical features:
 
 
 
  We have evaluated InternLM2 on several important benchmarks using the open-source evaluation tool [OpenCompass](https://github.com/open-compass/opencompass). Some of the evaluation results are shown in the table below. You are welcome to visit the [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for more evaluation results.
 
+ | Dataset\Models | InternLM2-1.8B | InternLM2-Chat-1.8B | InternLM2-7B | InternLM2-Chat-7B |
+ | --- | --- | --- | --- | --- |
+ | MMLU | 46.9 | 47.1 | 65.8 | 63.7 |
+ | AGIEval | 33.4 | 38.8 | 49.9 | 47.2 |
+ | BBH | 37.5 | 35.2 | 65.0 | 61.2 |
+ | GSM8K | 31.2 | 39.7 | 70.8 | 70.7 |
+ | MATH | 5.6 | 11.8 | 20.2 | 23.0 |
+ | HumanEval | 25.0 | 32.9 | 43.3 | 59.8 |
+ | MBPP(Sanitized) | 22.2 | 23.2 | 51.8 | 51.4 |
 
 
  - The evaluation results were obtained from [OpenCompass](https://github.com/open-compass/opencompass) , and evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/open-compass/opencompass).
 
  The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow **free** commercial usage. To apply for a commercial license, please fill in the [application form (English)](https://wj.qq.com/s2/12727483/5dba/)/[申请表(中文)](https://wj.qq.com/s2/12725412/f7c1/). For other questions or collaborations, please contact <internlm@pjlab.org.cn>.
 
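  ## Introduction
+ InternLM2-1.8B (书生·浦语-1.8B) is the 1.8-billion-parameter version of the second-generation InternLM series. For the convenience of users and researchers, InternLM2-1.8B is open-sourced in two versions:
 
+ - internlm2: Built on internlm2-base and further pretrained on domain-specific corpora. It scores well in evaluations while retaining strong general language ability, and is the base model we recommend for most applications;
+ - internlm2-chat: Built on internlm2-chat-sft and optimized for conversational interaction through RLHF. It performs well at instruction following, empathetic chat, and tool invocation.
 
  The base model of InternLM2 has the following technical features
 
  - Effective support for ultra-long contexts of 200,000 characters: the model achieves near-perfect "needle in a haystack" retrieval over inputs of 200,000 characters, and its performance on long-text tasks such as LongBench and L-Eval is among the best of open-source models.
  - Comprehensive performance improvement: every capability dimension improves over the previous-generation model, with notable gains in reasoning, mathematics, and coding.
 
  ## InternLM2-1.8B
 
  ### Performance Evaluation
 
  We evaluated InternLM2 on several important benchmarks using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/). Some of the results are shown in the table below. Visit the [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for more evaluation results.
 
+ | Dataset | InternLM2-1.8B | InternLM2-Chat-1.8B | InternLM2-7B | InternLM2-Chat-7B |
+ | --- | --- | --- | --- | --- |
+ | MMLU | 46.9 | 47.1 | 65.8 | 63.7 |
+ | AGIEval | 33.4 | 38.8 | 49.9 | 47.2 |
+ | BBH | 37.5 | 35.2 | 65.0 | 61.2 |
+ | GSM8K | 31.2 | 39.7 | 70.8 | 70.7 |
+ | MATH | 5.6 | 11.8 | 20.2 | 23.0 |
+ | HumanEval | 25.0 | 32.9 | 43.3 | 59.8 |
+ | MBPP(Sanitized) | 22.2 | 23.2 | 51.8 | 51.4 |
 
  - The evaluation results above were obtained with [OpenCompass](https://github.com/open-compass/opencompass) (entries marked with `*` are taken from the original papers). For test details, see the configuration files provided in [OpenCompass](https://github.com/open-compass/opencompass).
  - Evaluation numbers may differ across [OpenCompass](https://github.com/open-compass/opencompass) versions; please refer to the results from the latest version of [OpenCompass](https://github.com/open-compass/opencompass).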
 
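 
  ## Open Source License
 
+ The code in this repository is open-sourced under the Apache-2.0 license. The model weights are fully open for academic research, and a free commercial-use license is available on application ([application form](https://wj.qq.com/s2/12725412/f7c1/)). For other questions or collaborations, please contact <internlm@pjlab.org.cn>.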