pipeline_tag: text-generation

InternLM

Introduction

InternLM has open-sourced a 7 billion parameter base model and a chat model tailored for practical scenarios. The model has the following characteristics:

  • It leverages trillions of high-quality tokens for training to establish a powerful knowledge base.
  • It supports an 8k context window length, enabling longer input sequences and stronger reasoning capabilities.
  • It provides a versatile toolset for users to flexibly build their own workflows.
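
One practical consequence of the 8k context window listed above is that a long multi-turn conversation eventually has to be trimmed. The following is a minimal sketch of one way an application might do that, not something InternLM ships: `trim_history`, the `count_tokens` callback, the 8192-token budget, and the 1024-token reserve for the reply are all hypothetical, and history is assumed to be a list of (query, response) pairs.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (user query, model response); assumed history format


def trim_history(
    history: List[Turn],
    query: str,
    count_tokens: Callable[[str], int],
    budget: int = 8192,   # assumed size of the "8k" window, in tokens
    reserve: int = 1024,  # assumed headroom left for the model's reply
) -> List[Turn]:
    """Keep the most recent turns that fit, together with the new query,
    inside budget - reserve tokens. Older turns are dropped first."""
    limit = budget - reserve
    used = count_tokens(query)
    kept: List[Turn] = []
    # Walk from the newest turn backwards and stop at the first one that no longer fits.
    for user, bot in reversed(history):
        cost = count_tokens(user) + count_tokens(bot)
        if used + cost > limit:
            break
        kept.append((user, bot))
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# Stand-in counter for illustration: one token per whitespace-separated word.
def word_count(text: str) -> int:
    return len(text.split())


example = [("a b c", "d e"), ("f", "g h")]
trimmed = trim_history(example, "q", word_count, budget=8, reserve=2)
```

With a real tokenizer, `count_tokens` could be something like `lambda s: len(tokenizer.encode(s))`; the count ignores any special tokens the chat template adds, so the reserve should be generous.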

InternLM-7B

Performance Evaluation

We evaluated InternLM with the open-source evaluation tool OpenCompass across five capability dimensions: disciplinary competence, language competence, knowledge competence, reasoning competence, and comprehension competence. Selected results are shown below; see the OpenCompass leaderboard for more.

| Datasets\Models | InternLM-Chat-7B | InternLM-7B | LLaMA-7B | Baichuan-7B | ChatGLM2-6B | Alpaca-7B | Vicuna-7B |
| --- | --- | --- | --- | --- | --- | --- | --- |
| C-Eval(Val) | 53.2 | 53.4 | 24.2 | 42.7 | 50.9 | 28.9 | 31.2 |
| MMLU | 50.8 | 51.0 | 35.2* | 41.5 | 46.0 | 39.7 | 47.3 |
| AGIEval | 42.5 | 37.6 | 20.8 | 24.6 | 39.0 | 24.1 | 26.4 |
| CommonSenseQA | 75.2 | 59.5 | 65.0 | 58.8 | 60.0 | 68.7 | 66.7 |
| BUSTM | 74.3 | 50.6 | 48.5 | 51.3 | 55.0 | 48.8 | 62.5 |
| CLUEWSC | 78.6 | 59.1 | 50.3 | 52.8 | 59.8 | 50.3 | 52.2 |
| MATH | 6.4 | 7.1 | 2.8 | 3.0 | 6.6 | 2.2 | 2.8 |
| GSM8K | 34.5 | 31.2 | 10.1 | 9.7 | 29.2 | 6.0 | 15.3 |
| HumanEval | 14.0 | 10.4 | 14.0 | 9.2 | 9.2 | 9.2 | 11.0 |
| RACE(High) | 76.3 | 57.4 | 46.9* | 28.1 | 66.3 | 40.7 | 54.0 |

  • The evaluation results were obtained with OpenCompass 20230706 (values marked with * are taken from the original papers). The evaluation configuration can be found in the configuration files provided by OpenCompass.
  • Numbers may differ across OpenCompass versions, so please refer to the latest OpenCompass results.
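
For readers who want to work with the table programmatically, here is a small sketch that transcribes the scores above and derives two summaries: which model(s) lead each benchmark, and how far the chat model moves relative to the base model. The helper names (`best_models`, `chat_minus_base`, `count_wins`) are illustrative only, and the two starred values are entered as plain numbers.

```python
MODELS = [
    "InternLM-Chat-7B", "InternLM-7B", "LLaMA-7B", "Baichuan-7B",
    "ChatGLM2-6B", "Alpaca-7B", "Vicuna-7B",
]

# Scores copied from the table above, in the same column order as MODELS.
# MMLU and RACE(High) for LLaMA-7B are the starred, paper-reported values.
SCORES = {
    "C-Eval(Val)":   [53.2, 53.4, 24.2, 42.7, 50.9, 28.9, 31.2],
    "MMLU":          [50.8, 51.0, 35.2, 41.5, 46.0, 39.7, 47.3],
    "AGIEval":       [42.5, 37.6, 20.8, 24.6, 39.0, 24.1, 26.4],
    "CommonSenseQA": [75.2, 59.5, 65.0, 58.8, 60.0, 68.7, 66.7],
    "BUSTM":         [74.3, 50.6, 48.5, 51.3, 55.0, 48.8, 62.5],
    "CLUEWSC":       [78.6, 59.1, 50.3, 52.8, 59.8, 50.3, 52.2],
    "MATH":          [6.4, 7.1, 2.8, 3.0, 6.6, 2.2, 2.8],
    "GSM8K":         [34.5, 31.2, 10.1, 9.7, 29.2, 6.0, 15.3],
    "HumanEval":     [14.0, 10.4, 14.0, 9.2, 9.2, 9.2, 11.0],
    "RACE(High)":    [76.3, 57.4, 46.9, 28.1, 66.3, 40.7, 54.0],
}


def best_models(dataset):
    """Return every model that reaches the top score on a dataset (ties included)."""
    row = SCORES[dataset]
    top = max(row)
    return [model for model, score in zip(MODELS, row) if score == top]


def chat_minus_base(dataset):
    """Score difference InternLM-Chat-7B minus InternLM-7B, rounded to one decimal."""
    row = dict(zip(MODELS, SCORES[dataset]))
    return round(row["InternLM-Chat-7B"] - row["InternLM-7B"], 1)


def count_wins():
    """Count, per model, the datasets on which it holds (or shares) the top score."""
    return {m: sum(m in best_models(d) for d in SCORES) for m in MODELS}


for name in SCORES:
    print(f"{name:14s} best={best_models(name)} chat-base={chat_minus_base(name):+.1f}")
```

On these numbers the chat model holds or shares the top score on seven of the ten benchmarks, while the base model leads on C-Eval, MMLU, and MATH by margins under one point.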

Limitations: Although we have made efforts to ensure the safety of the model during the training process and to encourage the model to generate text that complies with ethical and legal requirements, the model may still produce unexpected outputs due to its size and probabilistic generation paradigm. For example, the generated responses may contain biases, discrimination, or other harmful content. Please do not propagate such content. We are not responsible for any consequences resulting from the dissemination of harmful information.

Import from Transformers

To load the InternLM 7B Chat model using Transformers, use the following code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm-chat-7b", trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load the model in float16; otherwise it is loaded in float32, which may cause an out-of-memory (OOM) error.
model = AutoModelForCausalLM.from_pretrained("internlm/internlm-chat-7b", torch_dtype=torch.float16, trust_remote_code=True).cuda()
model = model.eval()
response, history = model.chat(tokenizer, "hello", history=[])
print(response)
# Hello! How can I help you today?
response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history)
print(response)
# Sure, here are three tips for effective time management:
#
# 1. Prioritize tasks based on importance and urgency: Make a list of all your tasks and categorize them into "important and urgent," "important but not urgent," and "not important but urgent." Focus on completing the tasks in the first category before moving on to the others.
# 2. Use a calendar or planner: Write down deadlines and appointments in a calendar or planner so you don't forget them. This will also help you schedule your time more effectively and avoid overbooking yourself.
# 3. Minimize distractions: Try to eliminate any potential distractions when working on important tasks. Turn off notifications on your phone, close unnecessary tabs on your computer, and find a quiet place to work if possible.
# 
# Remember, good time management skills take practice and patience. Start with small steps and gradually incorporate these habits into your daily routine.
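
The example above threads `history` from one `chat` call into the next by hand. A minimal sketch of the same pattern as a reusable loop follows. `run_conversation` and `echo_chat` are hypothetical names; the only thing taken from the example is that a chat call accepts a query plus history and returns `(response, history)`. The stub's history format, a list of (query, response) pairs, is an assumption.

```python
def run_conversation(chat_fn, prompts):
    """Send prompts one by one, feeding each call the history returned by the previous one.

    chat_fn(query, history) must return (response, updated_history).
    """
    history = []
    replies = []
    for prompt in prompts:
        response, history = chat_fn(prompt, history)
        replies.append(response)
    return replies, history


# Stand-in for a real model so the sketch runs without downloading weights.
def echo_chat(query, history):
    response = f"echo: {query}"
    return response, history + [(query, response)]


replies, final_history = run_conversation(echo_chat, ["hello", "how are you?"])
```

With the model loaded as in the example, the adapter would be `chat_fn = lambda q, h: model.chat(tokenizer, q, history=h)`.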

The responses can be streamed using stream_chat:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "internlm/internlm-chat-7b"
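# Move the model to the GPU, as in the loading example above; float16 inference on CPU may be slow or unsupported.
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()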
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model = model.eval()
length = 0
for response, history in model.stream_chat(tokenizer, "Hello", history=[]):
    print(response[length:], flush=True, end="")
    length = len(response)
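
The loop above prints `response[length:]` because each iteration appears to yield the full response so far rather than only the new text. Here is a minimal sketch of that slicing logic, isolated from the model. `iter_deltas` and `fake_stream` are hypothetical helpers, and the cumulative-yield behaviour is inferred from the example rather than from the model's documentation.

```python
def iter_deltas(stream):
    """Turn a stream of (cumulative_response, history) pairs into newly added text chunks."""
    seen = 0
    for response, _history in stream:
        yield response[seen:]   # only the characters not emitted yet
        seen = len(response)


def fake_stream(text, step=3):
    """Stand-in for a streaming chat call: yields a growing prefix of text each time."""
    for end in range(step, len(text) + step, step):
        partial = text[:end]
        yield partial, [("Hello", partial)]


chunks = list(iter_deltas(fake_stream("Hello there")))
print("".join(chunks))
```

With the real model, the same helper could wrap `model.stream_chat(tokenizer, "Hello", history=[])`.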

Open Source License

The code is licensed under Apache-2.0. The model weights are fully open for academic research, and free commercial use is also permitted on application. To apply for a commercial license, fill in the application form (English) or the application form (Chinese). For other questions or collaborations, contact internlm@pjlab.org.cn.

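Introduction

InternLM, the Shusheng·Puyu (书生·浦语) large language model, includes a 7-billion-parameter base model and a chat model (InternLM-7B) aimed at practical scenarios. The model has the following characteristics:

  • It is trained on more than a trillion high-quality tokens to build a strong knowledge base.
  • It supports an 8k context window, allowing longer inputs and stronger reasoning.
  • It offers general tool-calling capability, so users can flexibly build their own workflows.

InternLM-7B

Performance Evaluation

We used the open-source evaluation tool OpenCompass to evaluate InternLM across five capability dimensions: disciplinary competence, language competence, knowledge competence, reasoning competence, and comprehension competence. Selected results are shown in the table below; visit the OpenCompass leaderboard for more results.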

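| Datasets\Models | InternLM-Chat-7B | InternLM-7B | LLaMA-7B | Baichuan-7B | ChatGLM2-6B | Alpaca-7B | Vicuna-7B |
| --- | --- | --- | --- | --- | --- | --- | --- |
| C-Eval(Val) | 53.2 | 53.4 | 24.2 | 42.7 | 50.9 | 28.9 | 31.2 |
| MMLU | 50.8 | 51.0 | 35.2* | 41.5 | 46.0 | 39.7 | 47.3 |
| AGIEval | 42.5 | 37.6 | 20.8 | 24.6 | 39.0 | 24.1 | 26.4 |
| CommonSenseQA | 75.2 | 59.5 | 65.0 | 58.8 | 60.0 | 68.7 | 66.7 |
| BUSTM | 74.3 | 50.6 | 48.5 | 51.3 | 55.0 | 48.8 | 62.5 |
| CLUEWSC | 78.6 | 59.1 | 50.3 | 52.8 | 59.8 | 50.3 | 52.2 |
| MATH | 6.4 | 7.1 | 2.8 | 3.0 | 6.6 | 2.2 | 2.8 |
| GSM8K | 34.5 | 31.2 | 10.1 | 9.7 | 29.2 | 6.0 | 15.3 |
| HumanEval | 14.0 | 10.4 | 14.0 | 9.2 | 9.2 | 9.2 | 11.0 |
| RACE(High) | 76.3 | 57.4 | 46.9* | 28.1 | 66.3 | 40.7 | 54.0 |

  • The results above were obtained with OpenCompass 20230706 (values marked with * are taken from the original papers); test details can be found in the configuration files provided by OpenCompass.
  • Evaluation numbers may differ across OpenCompass versions, so please treat the latest OpenCompass results as authoritative.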

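Limitations: Although we paid close attention to model safety during training and did our best to steer the model toward text that complies with ethical and legal requirements, the model may still produce unexpected outputs because of its size and probabilistic generation paradigm. For example, responses may contain bias, discrimination, or other harmful content. Please do not spread such content. This project accepts no responsibility for any consequences arising from the dissemination of harmful information.

Loading with Transformers

Load the InternLM 7B Chat model with the following code: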

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm-chat-7b", trust_remote_code=True)
# `torch_dtype=torch.float16` loads the model in float16 precision; otherwise transformers loads it as float32, which can exhaust GPU memory.
model = AutoModelForCausalLM.from_pretrained("internlm/internlm-chat-7b", torch_dtype=torch.float16, trust_remote_code=True).cuda()
model = model.eval()
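response, history = model.chat(tokenizer, "你好", history=[])  # prompt: "Hello"
print(response)
# Model replies are in Chinese; they are shown here in English translation.
# Hello! Is there anything I can help you with?
# Second prompt: "Please provide three suggestions for managing time."
response, history = model.chat(tokenizer, "请提供三个管理时间的建议。", history=history)
print(response)
# Of course! Here are three suggestions for managing time:
# 1. Make a plan: Draw up a detailed plan that includes the tasks and activities to complete each day. This helps you organize your time better and ensures you finish tasks on time.
# 2. Prioritize: Rank tasks by priority and complete the most important ones first. This ensures the most important tasks get done in the shortest time, which saves time.
# 3. Stay focused: Avoid distractions and concentrate on the task. Turn off social media and email notifications and focus on the task; this helps you finish faster and reduces the chance of errors.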
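
The float16 comment in the loading code comes down to simple arithmetic on weight storage. A back-of-the-envelope sketch follows, assuming exactly 7 billion parameters (the card only says "7 billion") and counting weights only; activations, the KV cache, and framework overhead come on top. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB (1 GiB = 1024**3 bytes)."""
    return num_params * bytes_per_param / 1024**3


PARAMS = 7_000_000_000  # assumed round figure for a "7B" model

fp32_gib = weight_memory_gib(PARAMS, 4)  # float32: 4 bytes per parameter
fp16_gib = weight_memory_gib(PARAMS, 2)  # float16: 2 bytes per parameter

print(f"float32 weights: {fp32_gib:.1f} GiB")
print(f"float16 weights: {fp16_gib:.1f} GiB")
```

Under these assumptions the float32 weights alone need roughly 26 GiB, more than a 24 GB GPU can hold, while float16 halves that to roughly 13 GiB.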

For streaming generation, use the stream_chat interface:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "internlm/internlm-chat-7b"
# Fixed the `torch_dype` typo; `.cuda()` mirrors the loading example above.
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model = model.eval()
length = 0
for response, history in model.stream_chat(tokenizer, "你好", history=[]):
    print(response[length:], flush=True, end="")
    length = len(response)

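Open Source License

The code in this repository is open-sourced under the Apache-2.0 license. The model weights are fully open for academic research, and a free commercial-use license is available on application (application form). For other questions or collaborations, please contact internlm@pjlab.org.cn.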