@@ -46,14 +46,14 @@ InternLM2.5 has open-sourced a 7 billion parameter base model and a chat model t
 We evaluated InternLM comprehensively using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/). The evaluation covered five capability dimensions: disciplinary competence, language competence, knowledge competence, reasoning competence, and comprehension competence. Some of the results are shown below; visit the [OpenCompass leaderboard](https://rank.opencompass.org.cn) for more.
-| Dataset\Models |Qwen2-7B-Instruct | Yi-1.5-9B-Chat | GLM-4-9B-Chat | Llama-3-8B-Instruct | Gemma2-9B-IT | InternLM2.5-7B-Chat | Llama-3-70B-Instruct
-| --- | --- | --- | --- | --- | --- | --- | --- |
-|MMLU | 70.8 | 71.0 | 71.4 | 68.4 | 70.9 | 72.8 | 80.5
-|CMMLU | 80.9 | 74.5 | 74.5 | 53.3 | 60.3 | 78.0 | 70.1
-|BBH |65 |69.6 |69.6 |65.4 |68.2 |71.6 |80.5
-|MATH | 48.6 | 51.1 | 51.1 | 27.9 | 46.9 | 60.7 | 47.1
-| GSM8K | 82.9 | 80.1 | 85.3 | 72.9 | 88.9 | 86.0 | 92.8
-|GPQA | 38.4 | 37.9 | 36.9 | 26.3 | 33.8 | 38.4 | 38.9
 - The evaluation results were obtained with [OpenCompass](https://github.com/internLM/OpenCompass/) (data marked with * come from the original papers); the evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/).
@@ -197,14 +197,14 @@ InternLM2.5 ，即书生·浦语大模型第 2.5 代，开源了面向实用场
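 We used the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/) to evaluate InternLM comprehensively across five capability dimensions: disciplinary competence, language, knowledge, reasoning, and comprehension. Some of the results are shown in the table below; visit the [OpenCompass leaderboard](https://rank.opencompass.org.cn) for more.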
-| Dataset\Models |Qwen2-7B-Instruct | Yi-1.5-9B-Chat | GLM-4-9B-Chat | Llama-3-8B-Instruct | Gemma2-9B-IT | InternLM2.5-7B-Chat | Llama-3-70B-Instruct
-| --- | --- | --- | --- | --- | --- | --- | --- |
-|MMLU | 70.8 | 71.0 | 71.4 | 68.4 | 70.9 | 72.0 | 80.5
-|CMMLU | 80.9 | 74.5 | 74.5 | 53.3 | 60.3 | 78.0 | 70.1
-|BBH |65 |69.6 |69.6 |65.4 |68.2 |69.2 |80.5
-|MATH | 48.6 | 51.1 | 51.1 | 27.9 | 46.9 | 60.1 | 47.1
-| GSM8K | 82.9 | 80.1 | 85.3 | 72.9 | 88.9 | 86.0 | 86.6
-|GPQA | 38.4 | 37.9 | 36.9 | 26.3 | 33.8 | 38.4 | 38.9
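 - The evaluation results above were obtained with [OpenCompass](https://github.com/internLM/OpenCompass/) (data marked with `*` come from the original papers); see the configuration files provided in [OpenCompass](https://github.com/internLM/OpenCompass/) for test details.
 - Evaluation numbers may differ across [OpenCompass](https://github.com/internLM/OpenCompass/) versions; please rely on the results from the latest version of [OpenCompass](https://github.com/internLM/OpenCompass/).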

 We evaluated InternLM comprehensively using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/). The evaluation covered five capability dimensions: disciplinary competence, language competence, knowledge competence, reasoning competence, and comprehension competence. Some of the results are shown below; visit the [OpenCompass leaderboard](https://rank.opencompass.org.cn) for more.
+| Benchmark          | InternLM2.5-7B-Chat | Llama3-8B-Instruct | Gemma2-9B-IT | Yi-1.5-9B-Chat | GLM-4-9B-Chat | Qwen2-7B-Instruct |
+| ------------------ | ------------------- | ------------------ | ------------ | -------------- | ------------- | ----------------- |
+| MMLU (5-shot)      | **72.8**            | 68.4               | 70.9         | 71.0           | 71.4          | 70.8              |
+| CMMLU (5-shot)     | 78.0                | 53.3               | 60.3         | 74.5           | 74.5          | 80.9              |
+| BBH (3-shot CoT)   | **71.6**            | 54.4               | 68.2\*       | 69.6           | 69.6          | 65.0              |
+| MATH (0-shot CoT)  | **60.1**            | 27.9               | 46.9         | 51.1           | 51.1          | 48.6              |
+| GSM8K (0-shot CoT) | 86.0                | 72.9               | 88.9         | 80.1           | 85.3          | 82.9              |
+| GPQA (0-shot)      | **38.4**            | 26.1               | 33.8         | 37.9           | 36.9          | 38.4              |
 - The evaluation results were obtained with [OpenCompass](https://github.com/internLM/OpenCompass/) (data marked with * come from the original papers); the evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/).
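 We used the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/) to evaluate InternLM comprehensively across five capability dimensions: disciplinary competence, language, knowledge, reasoning, and comprehension. Some of the results are shown in the table below; visit the [OpenCompass leaderboard](https://rank.opencompass.org.cn) for more.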
+| Benchmark\Model    | InternLM2.5-7B-Chat | Llama3-8B-Instruct | Gemma2-9B-IT | Yi-1.5-9B-Chat | GLM-4-9B-Chat | Qwen2-7B-Instruct |
+| ------------------ | ------------------- | ------------------ | ------------ | -------------- | ------------- | ----------------- |
+| MMLU (5-shot)      | **72.8**            | 68.4               | 70.9         | 71.0           | 71.4          | 70.8              |
+| CMMLU (5-shot)     | 78.0                | 53.3               | 60.3         | 74.5           | 74.5          | 80.9              |
+| BBH (3-shot CoT)   | **71.6**            | 54.4               | 68.2\*       | 69.6           | 69.6          | 65.0              |
+| MATH (0-shot CoT)  | **60.1**            | 27.9               | 46.9         | 51.1           | 51.1          | 48.6              |
+| GSM8K (0-shot CoT) | 86.0                | 72.9               | 88.9         | 80.1           | 85.3          | 82.9              |
+| GPQA (0-shot)      | **38.4**            | 26.1               | 33.8         | 37.9           | 36.9          | 38.4              |
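 - The evaluation results above were obtained with [OpenCompass](https://github.com/internLM/OpenCompass/) (data marked with `*` come from the original papers); see the configuration files provided in [OpenCompass](https://github.com/internLM/OpenCompass/) for test details.
 - Evaluation numbers may differ across [OpenCompass](https://github.com/internLM/OpenCompass/) versions; please rely on the results from the latest version of [OpenCompass](https://github.com/internLM/OpenCompass/).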