Converted using Charles Goddard's script
- README.md +169 -0
- config.json +25 -0
- model-00001-of-00004.safetensors +3 -0
- model-00002-of-00004.safetensors +3 -0
- model-00003-of-00004.safetensors +3 -0
- model-00004-of-00004.safetensors +3 -0
- model.safetensors.index.json +1 -0
- original_repo_url.txt +1 -0
- special_tokens_map.json +30 -0
- tokenizer.model +3 -0
- tokenizer_config.json +48 -0
README.md
ADDED
@@ -0,0 +1,169 @@
---
pipeline_tag: text-generation
---
# InternLM

<div align="center">

<img src="https://github.com/InternLM/InternLM/assets/22529082/b9788105-8892-4398-8b47-b513a292378e" width="200"/>
  <div> </div>
  <div align="center">
    <b><font size="5">InternLM</font></b>
    <sup>
      <a href="https://internlm.intern-ai.org.cn/">
        <i><font size="4">HOT</font></i>
      </a>
    </sup>
    <div> </div>
  </div>

[OpenCompass](https://github.com/internLM/OpenCompass/)

[💻Github Repo](https://github.com/InternLM/InternLM) • [🤔Reporting Issues](https://github.com/InternLM/InternLM/issues/new)

</div>


## Introduction

InternLM2 open-sources a 20-billion-parameter base model and a chat model tailored for practical scenarios. The model has the following characteristics:

- **200K Context window**: Nearly perfect at finding needles in the haystack with 200K-long context, with leading performance on long-context tasks like LongBench and L-Eval. Try it with [LMDeploy](https://github.com/InternLM/lmdeploy) for 200K-context inference.

- **Outstanding comprehensive performance**: Significantly better than the last generation in all dimensions, especially in reasoning, math, code, chat experience, instruction following, and creative writing, with leading performance among open-source models of similar size. In some evaluations, InternLM2-Chat-20B may match or even surpass ChatGPT (GPT-3.5).

- **Code interpreter & Data analysis**: With a code interpreter, InternLM2-Chat-20B achieves performance comparable to GPT-4 on GSM8K and MATH. InternLM2-Chat also provides data analysis capability.

- **Stronger tool use**: Building on stronger instruction following, tool selection, and reflection, InternLM2 can support more kinds of agents and multi-step tool calling for complex tasks. See [examples](https://github.com/InternLM/lagent).


## InternLM2-Chat-20B

### Performance Evaluation

We conducted a comprehensive evaluation of InternLM using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/). The evaluation covered five dimensions of capability: disciplinary competence, language competence, knowledge competence, reasoning competence, and comprehension competence. Some of the results are shown below; visit the [OpenCompass leaderboard](https://opencompass.org.cn/rank) for more.

| Dataset\Models | InternLM2-7B | InternLM2-Chat-7B | InternLM2-20B | InternLM2-Chat-20B | ChatGPT | GPT-4 |
| --- | --- | --- | --- | --- | --- | --- |
| MMLU | 65.8 | 63.7 | 67.7 | 66.5 | 69.1 | 83.0 |
| AGIEval | 49.9 | 47.2 | 53.0 | 50.3 | 39.9 | 55.1 |
| BBH | 65.0 | 61.2 | 72.1 | 68.3 | 70.1 | 86.7 |
| GSM8K | 70.8 | 70.7 | 76.1 | 79.6 | 78.2 | 91.4 |
| MATH | 20.2 | 23.0 | 25.5 | 31.9 | 28.0 | 45.8 |
| HumanEval | 43.3 | 59.8 | 48.8 | 67.1 | 73.2 | 74.4 |
| MBPP(Sanitized) | 51.8 | 51.4 | 63.0 | 65.8 | 78.9 | 79.0 |

- The evaluation results were obtained with [OpenCompass](https://github.com/internLM/OpenCompass/) (figures marked with `*` are taken from the original papers). The evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/).
- The numbers may differ between versions of [OpenCompass](https://github.com/internLM/OpenCompass/), so please refer to the latest evaluation results of [OpenCompass](https://github.com/internLM/OpenCompass/).


**Limitations:** Although we have made efforts to ensure the safety of the model during training and to encourage the model to generate text that complies with ethical and legal requirements, the model may still produce unexpected outputs due to its size and probabilistic generation paradigm. For example, the generated responses may contain biases, discrimination, or other harmful content. Please do not propagate such content. We are not responsible for any consequences resulting from the dissemination of harmful information.

### Import from Transformers

To load the InternLM2 20B Chat model using Transformers, use the following code:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-chat-20b", trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load the model in float16; otherwise it is loaded as float32 and may cause an OOM error.
model = AutoModelForCausalLM.from_pretrained("internlm/internlm2-chat-20b", torch_dtype=torch.float16, trust_remote_code=True).cuda()
model = model.eval()
response, history = model.chat(tokenizer, "hello", history=[])
print(response)
# Hello! How can I help you today?
response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history)
print(response)
```
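
The `torch_dtype` comment above comes down to bytes per parameter. A rough sketch, assuming the nominal 20 billion parameter count quoted in the introduction and counting the weights only (activations and the KV cache are ignored); the helper name `weight_memory_gib` is illustrative and not part of Transformers:

```python
# Rough weight-memory estimate per dtype; activations and KV cache are ignored.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}

NOMINAL_PARAMS = 20_000_000_000  # assumption: "20 billion" from the introduction


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the approximate size of the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weight_memory_gib(NOMINAL_PARAMS, dtype):.1f} GiB")
```

Under this assumption float32 needs about 75 GiB for the weights and float16 about 37 GiB, which is why the float32 default may not fit on a single GPU.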

The responses can be streamed using `stream_chat`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "internlm/internlm2-chat-20b"
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model = model.eval()
length = 0
for response, history in model.stream_chat(tokenizer, "Hello", history=[]):
    # Print only the part of the response that is new since the last iteration.
    print(response[length:], flush=True, end="")
    length = len(response)
```
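
The slicing in the loop above suggests that `stream_chat` yields the cumulative response on every step, so only the suffix after `length` is new. The pattern can be tried without a GPU using a stand-in generator; `fake_stream_chat` and `collect_deltas` below are hypothetical helpers written for this sketch, not part of the model's API:

```python
from typing import Iterator, List, Tuple


def fake_stream_chat(text: str, step: int = 3) -> Iterator[Tuple[str, List]]:
    """Hypothetical stand-in for model.stream_chat: yields the cumulative response so far."""
    for end in range(step, len(text) + step, step):
        yield text[:end], []


def collect_deltas(stream: Iterator[Tuple[str, List]]) -> List[str]:
    """Return only the newly generated suffix of each cumulative response."""
    deltas, length = [], 0
    for response, _history in stream:
        deltas.append(response[length:])
        length = len(response)
    return deltas


deltas = collect_deltas(fake_stream_chat("Hello! How can I help you today?"))
print(deltas)
```

Joining the collected deltas reproduces the full response, which is what printing them with `end=""` does on the terminal.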

## Open Source License

The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow **free** commercial usage. To apply for a commercial license, please fill in the [application form (English)](https://wj.qq.com/s2/12727483/5dba/)/[申请表(中文)](https://wj.qq.com/s2/12725412/f7c1/). For other questions or collaborations, please contact <internlm@pjlab.org.cn>.

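## Introduction

InternLM2, the second generation of the InternLM (书生·浦语) large language model, open-sources a 20-billion-parameter base model and a chat model (InternLM2-Chat-20B) aimed at practical scenarios. The model has the following characteristics:

- Effective support for 200K-character ultra-long context: the model finds the "needle in a haystack" almost perfectly in 200K-character inputs, and its performance on long-text tasks such as LongBench and L-Eval is also among the best of open-source models. You can try 200K-character long-context inference with [LMDeploy](https://github.com/InternLM/lmdeploy).
- Comprehensive performance improvement: every capability dimension improves over the previous generation, with especially notable gains in reasoning, math, code, chat experience, instruction following, and creative writing. Overall performance leads open-source models of the same scale, and on key capability evaluations InternLM2-Chat-20B can match or even surpass ChatGPT (GPT-3.5).
- Code interpreter and data analysis: when paired with a code interpreter, InternLM2-Chat-20B reaches a level comparable to GPT-4 on GSM8K and MATH. Building on strong foundational abilities in math and tool use, InternLM2-Chat provides practical data analysis capability.
- Overall upgrade in tool calling: with stronger and more generalizable instruction understanding, tool selection, and result reflection, the new model can more reliably support building complex agents and can make effective multi-turn tool calls to complete fairly complex tasks. See more [examples](https://github.com/InternLM/lagent).

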
## InternLM2-Chat-20B
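
### Performance Evaluation

We used the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/) to evaluate InternLM comprehensively across five capability dimensions: disciplinary competence, language competence, knowledge competence, reasoning competence, and comprehension competence. Some of the results are shown in the table below; visit the [OpenCompass leaderboard](https://opencompass.org.cn/rank) for more evaluation results.
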
| Dataset | InternLM2-7B | InternLM2-Chat-7B | InternLM2-20B | InternLM2-Chat-20B | ChatGPT | GPT-4 |
| --- | --- | --- | --- | --- | --- | --- |
| MMLU | 65.8 | 63.7 | 67.7 | 66.5 | 69.1 | 83.0 |
| AGIEval | 49.9 | 47.2 | 53.0 | 50.3 | 39.9 | 55.1 |
| BBH | 65.0 | 61.2 | 72.1 | 68.3 | 70.1 | 86.7 |
| GSM8K | 70.8 | 70.7 | 76.1 | 79.6 | 78.2 | 91.4 |
| MATH | 20.2 | 23.0 | 25.5 | 31.9 | 28.0 | 45.8 |
| HumanEval | 43.3 | 59.8 | 48.8 | 67.1 | 73.2 | 74.4 |
| MBPP(Sanitized) | 51.8 | 51.4 | 63.0 | 65.8 | 78.9 | 79.0 |

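- The evaluation results above were obtained with [OpenCompass](https://github.com/internLM/OpenCompass/) (figures marked with `*` are taken from the original papers). For test details, see the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/).
- The numbers may differ between versions of [OpenCompass](https://github.com/internLM/OpenCompass/); please treat the results from the latest version of [OpenCompass](https://github.com/internLM/OpenCompass/) as authoritative.

**Limitations:** Although we paid close attention to the safety of the model during training and did our best to make it produce text that complies with ethical and legal requirements, the model may still produce unexpected outputs because of its size and probabilistic generation paradigm. For example, responses may contain bias, discrimination, or other harmful content. Please do not spread such content. This project accepts no responsibility for any consequences resulting from the dissemination of harmful information.
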
### Import from Transformers

Load the InternLM2 20B Chat model with the following code:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-chat-20b", trust_remote_code=True)
# `torch_dtype=torch.float16` loads the model in float16; otherwise transformers loads it as float32, which can exhaust GPU memory.
model = AutoModelForCausalLM.from_pretrained("internlm/internlm2-chat-20b", torch_dtype=torch.float16, trust_remote_code=True).cuda()
model = model.eval()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
# 你好!有什么我可以帮助你的吗?
response, history = model.chat(tokenizer, "请提供三个管理时间的建议。", history=history)
print(response)
```

To generate output in a streaming fashion, use the `stream_chat` interface:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "internlm/internlm2-chat-20b"
# Note the keyword is `torch_dtype`; a misspelled keyword would not select float16.
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model = model.eval()
length = 0
for response, history in model.stream_chat(tokenizer, "你好", history=[]):
    # Print only the part of the response that is new since the last iteration.
    print(response[length:], flush=True, end="")
    length = len(response)
```
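
## Open Source License

The code in this repository is open-sourced under the Apache-2.0 license. The model weights are fully open for academic research, and a free commercial-use license can also be applied for ([application form](https://wj.qq.com/s2/12725412/f7c1/)). For other questions or collaboration, please contact <internlm@pjlab.org.cn>.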
config.json
ADDED
@@ -0,0 +1,25 @@
{
  "_name_or_path": "/models/internlm2-chat-20b",
  "architectures": "LlamaForCausalLM",
  "attn_implementation": "eager",
  "bias": false,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 6144,
  "initializer_range": 0.02,
  "intermediate_size": 16384,
  "max_position_embeddings": 32768,
  "model_type": "llama",
  "num_attention_heads": 48,
  "num_hidden_layers": 48,
  "num_key_value_heads": 8,
  "pad_token_id": 2,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.36.2",
  "use_cache": true,
  "vocab_size": 92544
}
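
The attention settings in this config imply grouped-query attention: 48 query heads share 8 key/value heads. A small sketch of what that means for head size and KV-cache memory, assuming float16 cache entries and batch size 1; the function name `kv_cache_bytes` is illustrative:

```python
# Values copied from the config.json above.
config = {
    "hidden_size": 6144,
    "num_attention_heads": 48,
    "num_key_value_heads": 8,
    "num_hidden_layers": 48,
    "max_position_embeddings": 32768,
}

head_dim = config["hidden_size"] // config["num_attention_heads"]
queries_per_kv_head = config["num_attention_heads"] // config["num_key_value_heads"]


def kv_cache_bytes(num_tokens: int, bytes_per_value: int = 2) -> int:
    """Keys plus values for every layer, assuming float16 (2 bytes) and batch size 1."""
    per_token = (
        2  # one key vector and one value vector
        * config["num_hidden_layers"]
        * config["num_key_value_heads"]
        * head_dim
        * bytes_per_value
    )
    return num_tokens * per_token


print(head_dim, queries_per_kv_head)
print(kv_cache_bytes(config["max_position_embeddings"]) / 2**30, "GiB")
```

With these values each head is 128-dimensional, six query heads share each key/value head, and a full 32768-token context needs 6 GiB of cache under the stated assumptions.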
model-00001-of-00004.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1242a1569d2ece134d1d6a83aa67867e01dbcf6404ae8ce75423ef5562c7b14b
size 9940820976
model-00002-of-00004.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:17a2def607ce03892bd9908aa9f30665bfcf5a82a45fe2c9080778b678e36a20
size 9940833400
model-00003-of-00004.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:14b651329f4c697457de9352f348771477602a3b4df40900603d977cdbecbbf6
size 9940833416
model-00004-of-00004.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:244d8af0a1556eacf58dabdbe1dee9927a28f1dc41d73fde007743486b79cee2
size 9899861896
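
As a sanity check on the conversion, the shapes in config.json can be turned into a parameter count and compared with the shard sizes in the LFS pointers above. This is a sketch that assumes the standard Llama layout suggested by the weight names in model.safetensors.index.json (q/k/v/o projections without bias, gate/up/down MLP, two norms per layer), plus an untied embedding and output head and one final norm; the variable names are illustrative:

```python
# Shapes copied from config.json.
hidden, inter, layers = 6144, 16384, 48
heads, kv_heads, vocab = 48, 8, 92544
head_dim = hidden // heads

# Per-layer weights (assumed Llama layout, no biases).
attn = 2 * hidden * hidden + 2 * hidden * kv_heads * head_dim  # q, o and k, v
mlp = 3 * hidden * inter                                       # gate, up, down
norms = 2 * hidden                                             # input and post-attention norms
per_layer = attn + mlp + norms

embeddings = 2 * vocab * hidden  # input embedding and output head (untied)
total_params = layers * per_layer + embeddings + hidden  # plus the final norm

# Sizes copied from the four LFS pointer files.
shard_sizes = [9940820976, 9940833400, 9940833416, 9899861896]
total_bytes = sum(shard_sizes)
overhead = total_bytes - 2 * total_params  # float16 stores 2 bytes per parameter

print(f"parameters: {total_params:,}")
print(f"shard bytes: {total_bytes:,}")
print(f"bytes not explained by float16 weights: {overhead:,}")
```

Under these assumptions the count comes to about 19.86 billion parameters, and the float16 weights account for all but roughly 50 KB of the four shards, which would be consistent with per-file safetensors headers.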
model.safetensors.index.json
ADDED
@@ -0,0 +1 @@
{"metadata": {"mergekit_version": "0.0.3.2"}, "weight_map": {"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.10.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.10.mlp.down_proj.weight": 
"model-00001-of-00004.safetensors", "model.layers.10.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.11.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.11.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.11.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.12.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.12.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.12.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.13.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.13.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.13.input_layernorm.weight": 
"model-00001-of-00004.safetensors", "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.13.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.13.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.14.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.14.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.14.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.14.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.14.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.14.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.14.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.14.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.15.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.15.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.15.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.15.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.15.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.15.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.15.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.15.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.15.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.16.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.16.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.16.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", 
"model.layers.16.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.16.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.16.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.16.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.16.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.17.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.17.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.17.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.17.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.17.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.17.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.17.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.17.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.17.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.18.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.18.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.18.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.18.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.18.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.18.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.18.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.18.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.18.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.19.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", 
"model.layers.19.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.19.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.19.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.19.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.19.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.19.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.19.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.19.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.2.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.2.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.20.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.20.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.20.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.20.post_attention_layernorm.weight": 
"model-00002-of-00004.safetensors", "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.21.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.21.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.21.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.22.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.22.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.22.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.22.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.22.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.23.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.23.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.23.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.23.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.23.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.23.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.23.mlp.down_proj.weight": "model-00002-of-00004.safetensors", 
"model.layers.23.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.23.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.24.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.24.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.24.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.24.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.24.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.24.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.24.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.24.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.24.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.25.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.25.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.25.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.25.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.25.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.25.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.25.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.25.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.25.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.26.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.26.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.26.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.26.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.26.input_layernorm.weight": "model-00002-of-00004.safetensors", 
"model.layers.26.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.26.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.26.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.26.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.27.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.27.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.27.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.27.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.27.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.27.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.27.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.27.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.27.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.28.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.28.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.28.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.28.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.28.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.28.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.28.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.28.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.28.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.29.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.29.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.29.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", 
"model.layers.29.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.29.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.29.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.29.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.29.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.29.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.3.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.3.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.3.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.3.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.3.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.3.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.30.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.30.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.30.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.30.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.30.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.30.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.30.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.30.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.30.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.31.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.31.self_attn.q_proj.weight": 
"model-00002-of-00004.safetensors", "model.layers.31.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.31.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.31.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.31.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.31.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.31.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.32.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.32.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.32.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.32.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.32.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.32.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.32.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.32.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.32.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.33.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.33.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.33.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.33.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.33.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.33.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.33.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.33.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.33.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", 
"model.layers.34.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.34.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.34.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.34.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.34.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.34.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.34.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.34.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.34.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.35.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.35.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.35.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.35.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.35.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.35.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.35.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.35.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.35.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.36.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.36.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.36.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.36.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.36.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.36.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.36.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.36.mlp.up_proj.weight": 
"model-00003-of-00004.safetensors", "model.layers.36.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.37.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.37.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.37.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.37.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.37.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.37.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.37.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.37.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.37.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.38.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.38.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.38.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.38.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.38.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.38.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.38.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.38.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.38.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.39.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.39.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.39.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.39.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.39.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.39.mlp.gate_proj.weight": 
"model-00003-of-00004.safetensors", "model.layers.39.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.39.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.39.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.4.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.4.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.4.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.4.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.4.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.4.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.4.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.4.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.4.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.40.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.40.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.40.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.40.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.40.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.40.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.40.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.40.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.40.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.41.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.41.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.41.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.41.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", 
"model.layers.41.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.41.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.41.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.41.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.41.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.42.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.42.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.42.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.42.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.42.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.42.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.42.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.42.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.42.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.43.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.43.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.43.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.43.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.43.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.43.mlp.gate_proj.weight": "model-00004-of-00004.safetensors", "model.layers.43.mlp.down_proj.weight": "model-00004-of-00004.safetensors", "model.layers.43.mlp.up_proj.weight": "model-00004-of-00004.safetensors", "model.layers.43.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.44.self_attn.o_proj.weight": "model-00004-of-00004.safetensors", "model.layers.44.self_attn.q_proj.weight": "model-00004-of-00004.safetensors", 
"model.layers.44.self_attn.k_proj.weight": "model-00004-of-00004.safetensors", "model.layers.44.self_attn.v_proj.weight": "model-00004-of-00004.safetensors", "model.layers.44.input_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.44.mlp.gate_proj.weight": "model-00004-of-00004.safetensors", "model.layers.44.mlp.down_proj.weight": "model-00004-of-00004.safetensors", "model.layers.44.mlp.up_proj.weight": "model-00004-of-00004.safetensors", "model.layers.44.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.45.self_attn.o_proj.weight": "model-00004-of-00004.safetensors", "model.layers.45.self_attn.q_proj.weight": "model-00004-of-00004.safetensors", "model.layers.45.self_attn.k_proj.weight": "model-00004-of-00004.safetensors", "model.layers.45.self_attn.v_proj.weight": "model-00004-of-00004.safetensors", "model.layers.45.input_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.45.mlp.gate_proj.weight": "model-00004-of-00004.safetensors", "model.layers.45.mlp.down_proj.weight": "model-00004-of-00004.safetensors", "model.layers.45.mlp.up_proj.weight": "model-00004-of-00004.safetensors", "model.layers.45.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.46.self_attn.o_proj.weight": "model-00004-of-00004.safetensors", "model.layers.46.self_attn.q_proj.weight": "model-00004-of-00004.safetensors", "model.layers.46.self_attn.k_proj.weight": "model-00004-of-00004.safetensors", "model.layers.46.self_attn.v_proj.weight": "model-00004-of-00004.safetensors", "model.layers.46.input_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.46.mlp.gate_proj.weight": "model-00004-of-00004.safetensors", "model.layers.46.mlp.down_proj.weight": "model-00004-of-00004.safetensors", "model.layers.46.mlp.up_proj.weight": "model-00004-of-00004.safetensors", "model.layers.46.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", 
"model.layers.47.self_attn.o_proj.weight": "model-00004-of-00004.safetensors", "model.layers.47.self_attn.q_proj.weight": "model-00004-of-00004.safetensors", "model.layers.47.self_attn.k_proj.weight": "model-00004-of-00004.safetensors", "model.layers.47.self_attn.v_proj.weight": "model-00004-of-00004.safetensors", "model.layers.47.input_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.47.mlp.gate_proj.weight": "model-00004-of-00004.safetensors", "model.layers.47.mlp.down_proj.weight": "model-00004-of-00004.safetensors", "model.layers.47.mlp.up_proj.weight": "model-00004-of-00004.safetensors", "model.layers.47.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.5.self_attn.o_proj.weight": "model-00004-of-00004.safetensors", "model.layers.5.self_attn.q_proj.weight": "model-00004-of-00004.safetensors", "model.layers.5.self_attn.k_proj.weight": "model-00004-of-00004.safetensors", "model.layers.5.self_attn.v_proj.weight": "model-00004-of-00004.safetensors", "model.layers.5.input_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.5.mlp.gate_proj.weight": "model-00004-of-00004.safetensors", "model.layers.5.mlp.down_proj.weight": "model-00004-of-00004.safetensors", "model.layers.5.mlp.up_proj.weight": "model-00004-of-00004.safetensors", "model.layers.5.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.6.self_attn.o_proj.weight": "model-00004-of-00004.safetensors", "model.layers.6.self_attn.q_proj.weight": "model-00004-of-00004.safetensors", "model.layers.6.self_attn.k_proj.weight": "model-00004-of-00004.safetensors", "model.layers.6.self_attn.v_proj.weight": "model-00004-of-00004.safetensors", "model.layers.6.input_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.6.mlp.gate_proj.weight": "model-00004-of-00004.safetensors", "model.layers.6.mlp.down_proj.weight": "model-00004-of-00004.safetensors", "model.layers.6.mlp.up_proj.weight": 
"model-00004-of-00004.safetensors", "model.layers.6.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.7.self_attn.o_proj.weight": "model-00004-of-00004.safetensors", "model.layers.7.self_attn.q_proj.weight": "model-00004-of-00004.safetensors", "model.layers.7.self_attn.k_proj.weight": "model-00004-of-00004.safetensors", "model.layers.7.self_attn.v_proj.weight": "model-00004-of-00004.safetensors", "model.layers.7.input_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.7.mlp.gate_proj.weight": "model-00004-of-00004.safetensors", "model.layers.7.mlp.down_proj.weight": "model-00004-of-00004.safetensors", "model.layers.7.mlp.up_proj.weight": "model-00004-of-00004.safetensors", "model.layers.7.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.8.self_attn.o_proj.weight": "model-00004-of-00004.safetensors", "model.layers.8.self_attn.q_proj.weight": "model-00004-of-00004.safetensors", "model.layers.8.self_attn.k_proj.weight": "model-00004-of-00004.safetensors", "model.layers.8.self_attn.v_proj.weight": "model-00004-of-00004.safetensors", "model.layers.8.input_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.8.mlp.gate_proj.weight": "model-00004-of-00004.safetensors", "model.layers.8.mlp.down_proj.weight": "model-00004-of-00004.safetensors", "model.layers.8.mlp.up_proj.weight": "model-00004-of-00004.safetensors", "model.layers.8.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.9.self_attn.o_proj.weight": "model-00004-of-00004.safetensors", "model.layers.9.self_attn.q_proj.weight": "model-00004-of-00004.safetensors", "model.layers.9.self_attn.k_proj.weight": "model-00004-of-00004.safetensors", "model.layers.9.self_attn.v_proj.weight": "model-00004-of-00004.safetensors", "model.layers.9.input_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.9.mlp.gate_proj.weight": "model-00004-of-00004.safetensors", 
"model.layers.9.mlp.down_proj.weight": "model-00004-of-00004.safetensors", "model.layers.9.mlp.up_proj.weight": "model-00004-of-00004.safetensors", "model.layers.9.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", "model.norm.weight": "model-00004-of-00004.safetensors", "model.embed_tokens.weight": "model-00004-of-00004.safetensors", "lm_head.weight": "model-00004-of-00004.safetensors"}}
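The map above assigns each tensor name to the shard file that stores it. As a minimal sketch (not part of the commit), the snippet below groups tensor names by shard. The helper name `tensors_by_shard` and the four-entry `sample` dict are illustrative assumptions; the entries themselves are copied from the map above.

```python
from collections import defaultdict


def tensors_by_shard(weight_map):
    """Group tensor names by the shard file that stores them (hypothetical helper)."""
    shards = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        shards[shard_file].append(tensor_name)
    return dict(shards)


# A few entries copied from the weight map above; the real map is far larger.
sample = {
    "model.layers.43.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.43.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
    "model.norm.weight": "model-00004-of-00004.safetensors",
    "lm_head.weight": "model-00004-of-00004.safetensors",
}

grouped = tensors_by_shard(sample)
for shard_file, names in sorted(grouped.items()):
    print(shard_file, len(names))
```

Note that a single layer can straddle two shards, as layer 43 does here, so a loader has to resolve each tensor individually rather than per layer.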
original_repo_url.txt
ADDED
@@ -0,0 +1 @@
+https://huggingface.co/internlm/internlm2-chat-20b
special_tokens_map.json
ADDED
@@ -0,0 +1,30 @@
+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
tokenizer.model
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f868398fc4e05ee1e8aeba95ddf18ddcc45b8bce55d5093bead5bbf80429b48b
+size 1477754
tokenizer_config.json
ADDED
@@ -0,0 +1,48 @@
+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "auto_map": {
+    "AutoTokenizer": [
+      "tokenization_internlm.InternLMTokenizer",
+      null
+    ]
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "</s>",
+  "legacy": true,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "</s>",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "LlamaTokenizer",
+  "trust_remote_code": false,
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
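Two values in the tokenizer config above are easy to misread. The `model_max_length` value equals `int(1e30)`, which reads as a "no limit configured" sentinel rather than a real context length, and `pad_token` reuses the `eos_token` string. The sketch below checks both against a fragment copied from the config; the names `NO_LIMIT_SENTINEL`, `has_length_limit`, and `pad_is_eos` are illustrative, and the sentinel interpretation is an assumption about how the loading library treats this value.

```python
import json

# Fragment copied from the tokenizer_config.json shown above.
fragment = """
{
  "eos_token": "</s>",
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "</s>"
}
"""
config = json.loads(fragment)

# Assumption: int(1e30) is used as a "no maximum length set" sentinel.
NO_LIMIT_SENTINEL = int(1e30)

has_length_limit = config["model_max_length"] != NO_LIMIT_SENTINEL
pad_is_eos = config["pad_token"] == config["eos_token"]

print("explicit length limit:", has_length_limit)
print("padding reuses EOS:", pad_is_eos)
```

If the sentinel reading holds, callers that truncate inputs would need to pass an explicit maximum length themselves instead of relying on this field.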