ZwwWayne committed on
Commit
9dc8536
1 Parent(s): 6a4dae2

Update README.md

Files changed (1)
  1. README.md +16 -16
README.md CHANGED
@@ -46,14 +46,14 @@ InternLM2.5 has open-sourced a 7 billion parameter base model and a chat model t

  We conducted a comprehensive evaluation of InternLM using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/). The evaluation covered five dimensions of capabilities: disciplinary competence, language competence, knowledge competence, inference competence, and comprehension competence. Here are some of the evaluation results, and you can visit the [OpenCompass leaderboard](https://rank.opencompass.org.cn) for more evaluation results.

- | Dataset\Models |Qwen2-7B-Instruct | Yi-1.5-9B-Chat | GLM-4-9B-Chat | Llama-3-8B-Instruct | Gemma2-9B-IT | InternLM2.5-7B-Chat | Llama-3-70B-Instruct
- | --- | --- | --- | --- | --- | --- | --- | --- |
- |MMLU | 70.8 | 71.0 | 71.4 | 68.4 | 70.9 | 72.8 | 80.5
- |CMMLU | 80.9 | 74.5 | 74.5 | 53.3 | 60.3 | 78.0 | 70.1
- |BBH |65 |69.6 |69.6 |65.4 |68.2 |71.6 |80.5
- |MATH | 48.6 | 51.1 | 51.1 | 27.9 | 46.9 | 60.7 | 47.1
- | GSM8K | 82.9 | 80.1 | 85.3 | 72.9 | 88.9 | 86.0 | 92.8
- |GPQA | 38.4 | 37.9 | 36.9 | 26.3 | 33.8 | 38.4 | 38.9


  - The evaluation results were obtained from [OpenCompass](https://github.com/internLM/OpenCompass/) (some data marked with *, which means come from the original papers), and evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/).
@@ -197,14 +197,14 @@ InternLM2.5, the 2.5 generation of the 书生·浦语 (InternLM) large model, has open-sourced, for practical scen

  We used the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/) to evaluate InternLM comprehensively across five capability dimensions: disciplinary competence, language competence, knowledge competence, reasoning competence, and comprehension competence. Some of the results are shown in the table below; visit the [OpenCompass leaderboard](https://rank.opencompass.org.cn) for more evaluation results.

- | Benchmark\Model |Qwen2-7B-Instruct | Yi-1.5-9B-Chat | GLM-4-9B-Chat | Llama-3-8B-Instruct | Gemma2-9B-IT | InternLM2.5-7B-Chat | Llama-3-70B-Instruct
- | --- | --- | --- | --- | --- | --- | --- | --- |
- |MMLU | 70.8 | 71.0 | 71.4 | 68.4 | 70.9 | 72.0 | 80.5
- |CMMLU | 80.9 | 74.5 | 74.5 | 53.3 | 60.3 | 78.0 | 70.1
- |BBH |65 |69.6 |69.6 |65.4 |68.2 |69.2 |80.5
- |MATH | 48.6 | 51.1 | 51.1 | 27.9 | 46.9 | 60.1 | 47.1
- | GSM8K | 82.9 | 80.1 | 85.3 | 72.9 | 88.9 | 86.0 | 86.6
- |GPQA | 38.4 | 37.9 | 36.9 | 26.3 | 33.8 | 38.4 | 38.9

  - The evaluation results above were obtained with [OpenCompass](https://github.com/internLM/OpenCompass/) (data marked with `*` comes from the original papers); for testing details, see the configuration files provided in [OpenCompass](https://github.com/internLM/OpenCompass/).
  - Evaluation figures may differ across [OpenCompass](https://github.com/internLM/OpenCompass/) versions; please treat the results from the latest version of [OpenCompass](https://github.com/internLM/OpenCompass/) as authoritative.
 

  We conducted a comprehensive evaluation of InternLM using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/). The evaluation covered five dimensions of capabilities: disciplinary competence, language competence, knowledge competence, inference competence, and comprehension competence. Here are some of the evaluation results, and you can visit the [OpenCompass leaderboard](https://rank.opencompass.org.cn) for more evaluation results.

+ | Benchmark | InternLM2.5-7B-Chat | Llama3-8B-Instruct | Gemma2-9B-IT | Yi-1.5-9B-Chat | GLM-4-9B-Chat | Qwen2-7B-Instruct |
+ | ------------------ | ------------------- | ------------------ | ------------ | -------------- | ------------- | ----------------- |
+ | MMLU (5-shot) | **72.8** | 68.4 | 70.9 | 71.0 | 71.4 | 70.8 |
+ | CMMLU (5-shot) | 78.0 | 53.3 | 60.3 | 74.5 | 74.5 | 80.9 |
+ | BBH (3-shot CoT) | **71.6** | 54.4 | 68.2\* | 69.6 | 69.6 | 65.0 |
+ | MATH (0-shot CoT) | **60.1** | 27.9 | 46.9 | 51.1 | 51.1 | 48.6 |
+ | GSM8K (0-shot CoT) | 86.0 | 72.9 | 88.9 | 80.1 | 85.3 | 82.9 |
+ | GPQA (0-shot) | **38.4** | 26.1 | 33.8 | 37.9 | 36.9 | 38.4 |


  - The evaluation results were obtained from [OpenCompass](https://github.com/internLM/OpenCompass/) (some data marked with *, which means come from the original papers), and evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/).
 

  We used the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/) to evaluate InternLM comprehensively across five capability dimensions: disciplinary competence, language competence, knowledge competence, reasoning competence, and comprehension competence. Some of the results are shown in the table below; visit the [OpenCompass leaderboard](https://rank.opencompass.org.cn) for more evaluation results.

+ | Benchmark\Model | InternLM2.5-7B-Chat | Llama3-8B-Instruct | Gemma2-9B-IT | Yi-1.5-9B-Chat | GLM-4-9B-Chat | Qwen2-7B-Instruct |
+ | ------------------ | ------------------- | ------------------ | ------------ | -------------- | ------------- | ----------------- |
+ | MMLU (5-shot) | **72.8** | 68.4 | 70.9 | 71.0 | 71.4 | 70.8 |
+ | CMMLU (5-shot) | 78.0 | 53.3 | 60.3 | 74.5 | 74.5 | 80.9 |
+ | BBH (3-shot CoT) | **71.6** | 54.4 | 68.2\* | 69.6 | 69.6 | 65.0 |
+ | MATH (0-shot CoT) | **60.1** | 27.9 | 46.9 | 51.1 | 51.1 | 48.6 |
+ | GSM8K (0-shot CoT) | 86.0 | 72.9 | 88.9 | 80.1 | 85.3 | 82.9 |
+ | GPQA (0-shot) | **38.4** | 26.1 | 33.8 | 37.9 | 36.9 | 38.4 |

  - The evaluation results above were obtained with [OpenCompass](https://github.com/internLM/OpenCompass/) (data marked with `*` comes from the original papers); for testing details, see the configuration files provided in [OpenCompass](https://github.com/internLM/OpenCompass/).
  - Evaluation figures may differ across [OpenCompass](https://github.com/internLM/OpenCompass/) versions; please treat the results from the latest version of [OpenCompass](https://github.com/internLM/OpenCompass/) as authoritative.