template-model

InternLM-7BInternLM-7BInternLM-7BInternLM-7BInternLM-7BInternLM-7BInternLM-7BInternLM-7BInternLM-7BInternLM-7BInternLM-7BInternLM-7B

Performance Evaluation

Introduction

Performance Evaluation

Introduction

Performance Evaluation

Introduction

Performance Evaluation

Introduction

Performance Evaluation

Introduction

Performance Evaluation

Introduction

InternLM has open-sourced a 7 billion parameter base model and a chat model tailored for practical scenarios. The model has the following characteristics:

It leverages trillions of high-quality tokens for training to establish a powerful knowledge base.
It supports an 8k context window length, enabling longer input sequences and stronger reasoning capabilities.
It provides a versatile toolset for users to flexibly build their own workflows.

沙发打水地方擦撒大声地

未收到卡我弄到

We conducted a comprehensive evaluation of InternLM using the open-source evaluation tool OpenCompass. The evaluation covered five dimensions of capabilities: disciplinary competence, language competence, knowledge competence, inference competence, and comprehension competence. Here are some of the evaluation results, and you can visit the OpenCompass leaderboard for more evaluation results.

Datasets\Models	InternLM-Chat-7B	InternLM-7B	LLaMA-7B	Baichuan-7B	ChatGLM2-6B	Alpaca-7B	Vicuna-7B
C-Eval(Val)	53.2	53.4	24.2	42.7	50.9	28.9	31.2
MMLU	50.8	51.0	35.2*	41.5	46.0	39.7	47.3
AGIEval	42.5	37.6	20.8	24.6	39.0	24.1	26.4
CommonSenseQA	75.2	59.5	65.0	58.8	60.0	68.7	66.7
BUSTM	74.3	50.6	48.5	51.3	55.0	48.8	62.5
CLUEWSC	78.6	59.1	50.3	52.8	59.8	50.3	52.2
MATH	6.4	7.1	2.8	3.0	6.6	2.2	2.8
GSM8K	34.5	31.2	10.1	9.7	29.2	6.0	15.3
HumanEval	14.0	10.4	14.0	9.2	9.2	9.2	11.0
RACE(High)	76.3	57.4	46.9*	28.1	66.3	40.7	54.0

The evaluation results were obtained from OpenCompass 20230706 (some data marked with *, which means come from the original papers), and evaluation configuration can be found in the configuration files provided by OpenCompass.
The evaluation data may have numerical differences due to the version iteration of OpenCompass, so please refer to the latest evaluation results of OpenCompass.

Limitations: Although we have made efforts to ensure the safety of the model during the training process and to encourage the model to generate text that complies with ethical and legal requirements, the model may still produce unexpected outputs due to its size and probabilistic generation paradigm. For example, the generated responses may contain biases, discrimination, or other harmful content. Please do not propagate such content. We are not responsible for any consequences resulting from the dissemination of harmful information.

Import from Transformers

To load the InternLM 7B Chat model using Transformers, use the following code:

from modelscope import snapshot_download, AutoTokenizer, AutoModelForCausalLM
import torch
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-chat-7b-v1_1', revision='v1.0.0')
tokenizer = AutoTokenizer.from_pretrained(model_dir, device_map="auto", trust_remote_code=True,torch_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(model_dir,device_map="auto",  trust_remote_code=True,torch_dtype=torch.float16)
model = model.eval()
response, history = model.chat(tokenizer, "hello", history=[])
print(response)
response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history)

print(response)

Dialogue

You can interact with the InternLM Chat 7B model through a frontend interface by running the following code:

pip install streamlit==1.24.0
pip install transformers==4.30.2
streamlit run web_demo.py

The effect is as follows

Open Source License

The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English)/申请表（中文）. For other questions or collaborations, please contact internlm@pjlab.org.cn.

简介

InternLM ，即书生·浦语大模型，包含面向实用场景的70亿参数基础模型与对话模型（InternLM-7B）。模型具有以下特点：

使用上万亿高质量预料，建立模型超强知识体系；
支持8k语境窗口长度，实现更长输入与更强推理体验；
通用工具调用能力，支持用户灵活自助搭建流程；

InternLM-7B

性能评测

我们使用开源评测工具 OpenCompass 从学科综合能力、语言能力、知识能力、推理能力、理解能力五大能力维度对InternLM开展全面评测，部分评测结果如下表所示，欢迎访问 OpenCompass 榜单获取更多的评测结果。

数据集\模型	InternLM-Chat-7B	InternLM-7B	LLaMA-7B	Baichuan-7B	ChatGLM2-6B	Alpaca-7B	Vicuna-7B
C-Eval(Val)	53.2	53.4	24.2	42.7	50.9	28.9	31.2
MMLU	50.8	51.0	35.2*	41.5	46.0	39.7	47.3
AGIEval	42.5	37.6	20.8	24.6	39.0	24.1	26.4
CommonSenseQA	75.2	59.5	65.0	58.8	60.0	68.7	66.7
BUSTM	74.3	50.6	48.5	51.3	55.0	48.8	62.5
CLUEWSC	78.6	59.1	50.3	52.8	59.8	50.3	52.2
MATH	6.4	7.1	2.8	3.0	6.6	2.2	2.8
GSM8K	34.5	31.2	10.1	9.7	29.2	6.0	15.3
HumanEval	14.0	10.4	14.0	9.2	9.2	9.2	11.0
RACE(High)	76.3	57.4	46.9*	28.1	66.3	40.7	54.0

以上评测结果基于 OpenCompass 20230706 获得（部分数据标注*代表数据来自原始论文），具体测试细节可参见 OpenCompass 中提供的配置文件。
评测数据会因 OpenCompass 的版本迭代而存在数值差异，请以 OpenCompass 最新版的评测结果为主。

局限性： 尽管在训练过程中我们非常注重模型的安全性，尽力促使模型输出符合伦理和法律要求的文本，但受限于模型大小以及概率生成范式，模型可能会产生各种不符合预期的输出，例如回复内容包含偏见、歧视等有害内容，请勿传播这些内容。由于传播不良信息导致的任何后果，本项目不承担责任。

通过 Transformers 加载

通过以下的代码加载 InternLM 7B Chat 模型

from modelscope import snapshot_download, AutoTokenizer, AutoModelForCausalLM
import torch
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-chat-7b-v1_1', revision='v1.0.0')
tokenizer = AutoTokenizer.from_pretrained(model_dir, device_map="auto", trust_remote_code=True,torch_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(model_dir,device_map="auto",  trust_remote_code=True,torch_dtype=torch.float16)
model = model.eval()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
response, history = model.chat(tokenizer, "请提供三个管理时间的建议。", history=history)
print(response)

通过前端网页对话

可以通过以下代码启动一个前端的界面来与 InternLM Chat 7B 模型进行交互

pip install streamlit==1.24.0
pip install transformers==4.30.2
streamlit run web_demo.py

通过前端网页对话

效果如下

通过前端网页对话

开源许可证

本仓库的代码依照 Apache-2.0 协议开源。模型权重对学术研究完全开放，也可申请免费的商业使用授权（申请表）。其他问题与合作请联系 internlm@pjlab.org.cn。