Quant for 3.5
- README.md +119 -29
- config.json +25 -0
- model.safetensors.index.json +1 -0
- original_repo_url.txt +1 -0
- output-00001-of-00002.safetensors +3 -0
- output-00002-of-00002.safetensors +3 -0
- special_tokens_map.json +30 -0
- tokenizer.model +3 -0
- tokenizer_config.json +48 -0
README.md
CHANGED
@@ -1,57 +1,147 @@
|
|
1 |
---
|
2 |
pipeline_tag: text-generation
|
3 |
license: other
|
4 |
-
quantized_by: bartowski
|
5 |
---
|
|
|
6 |
|
7 |
-
|
8 |
|
9 |
-
|
10 |
|
11 |
-
|
12 |
|
13 |
-
|
14 |
|
15 |
-
|
16 |
|
17 |
-
Default arguments used except when the bits per weight is above 6.0, at that point the lm_head layer is quantized at 8 bits per weight instead of the default 6.
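The rule in the sentence above can be written as a small helper. This is only a sketch of the stated rule: the function name `lm_head_bits` is hypothetical and is not part of any quantization tool's API.

```python
def lm_head_bits(bits_per_weight: float) -> int:
    """Return the lm_head bit width described above.

    Hypothetical helper: 8 bits when the target is above 6.0 bits per
    weight, otherwise the default of 6 bits.
    """
    return 8 if bits_per_weight > 6.0 else 6


for bpw in (3.5, 4.5, 6.0, 6.5, 8.0):
    print(f"{bpw} bpw -> lm_head at {lm_head_bits(bpw)} bits")
```

Note that a target of exactly 6.0 keeps the default, since the rule only applies above 6.0.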
|
18 |
|
19 |
-
|
|
|
20 |
|
|
|
|
|
|
|
|
|
21 |
|
|
|
22 |
|
23 |
-
|
|
|
24 |
|
25 |
-
<a href="https://huggingface.co/bartowski/internlm2-20b-llama-exl2/tree/4_5">4.5 bits per weight</a>
|
26 |
|
27 |
-
|
28 |
|
29 |
-
|
30 |
|
31 |
-
|
32 |
|
33 |
-
|
34 |
|
35 |
-
```shell
|
36 |
-
git clone --single-branch --branch 4_0 https://huggingface.co/bartowski/internlm2-20b-llama-exl2
|
37 |
-
```
|
38 |
|
39 |
-
|
|
|
|
|
40 |
|
41 |
-
```shell
|
42 |
-
pip3 install huggingface-hub
|
43 |
-
```
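With `huggingface-hub` installed, a single branch can also be fetched from Python. The sketch below assumes the `4_0` branch named in the `git clone` command; the target directory name is hypothetical, and the download itself needs network access.

```python
def branch_download_kwargs(branch: str, local_dir: str) -> dict:
    """Build keyword arguments for huggingface_hub.snapshot_download."""
    return {
        "repo_id": "bartowski/internlm2-20b-llama-exl2",
        "revision": branch,      # each quantization lives on its own branch
        "local_dir": local_dir,  # hypothetical target directory
    }


def download_branch(branch: str, local_dir: str) -> str:
    """Download one branch and return the local path (needs network access)."""
    from huggingface_hub import snapshot_download  # imported lazily

    return snapshot_download(**branch_download_kwargs(branch, local_dir))


if __name__ == "__main__":
    # Only prints the arguments; call download_branch(...) to fetch the files.
    print(branch_download_kwargs("4_0", "internlm2-20b-llama-exl2-4_0"))
```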
|
44 |
|
45 |
-
|
46 |
|
47 |
-
|
48 |
-
|
49 |
-
|
50 |
```
|
51 |
|
52 |
-
|
53 |
|
54 |
-
|
55 |
-
|
56 |
-
|
57 |
```
|
1 |
---
|
2 |
pipeline_tag: text-generation
|
3 |
license: other
|
|
|
4 |
---
|
5 |
+
# InternLM
|
6 |
|
7 |
+
<div align="center">
|
8 |
|
9 |
+
<img src="https://github.com/InternLM/InternLM/assets/22529082/b9788105-8892-4398-8b47-b513a292378e" width="200"/>
|
10 |
+
<div> </div>
|
11 |
+
<div align="center">
|
12 |
+
<b><font size="5">InternLM</font></b>
|
13 |
+
<sup>
|
14 |
+
<a href="https://internlm.intern-ai.org.cn/">
|
15 |
+
<i><font size="4">HOT</font></i>
|
16 |
+
</a>
|
17 |
+
</sup>
|
18 |
+
<div> </div>
|
19 |
+
</div>
|
20 |
|
21 |
+
[![evaluation](https://github.com/InternLM/InternLM/assets/22529082/f80a2a58-5ddf-471a-8da4-32ab65c8fd3b)](https://github.com/internLM/OpenCompass/)
|
22 |
|
23 |
+
[💻Github Repo](https://github.com/InternLM/InternLM) • [🤔Reporting Issues](https://github.com/InternLM/InternLM/issues/new)
|
24 |
|
25 |
+
</div>
|
26 |
|
|
|
27 |
|
28 |
+
## Introduction
|
29 |
+
The second generation of the InternLM model, InternLM2, includes models at two scales: 7B and 20B. For the convenience of users and researchers, we have open-sourced four versions of each scale of the model, which are:
|
30 |
|
31 |
+
- internlm2-base: A high-quality and highly adaptable model base, serving as an excellent starting point for deep domain adaptation.
|
32 |
+
- internlm2 (**recommended**): Built upon internlm2-base, this version is further pretrained on domain-specific corpora. It shows outstanding performance in evaluations while maintaining robust general language abilities, making it our recommended choice for most applications.
|
33 |
+
- internlm2-chat-sft: Based on the Base model, it undergoes supervised human alignment training.
|
34 |
+
- internlm2-chat (**recommended**): Optimized for conversational interaction on top of internlm2-chat-sft through RLHF, it excels in instruction adherence, empathetic chatting, and tool invocation.
|
35 |
|
36 |
+
The base model of InternLM2 has the following technical features:
|
37 |
|
38 |
+
- Effective support for ultra-long contexts of up to 200,000 characters: The model nearly perfectly achieves "finding a needle in a haystack" in long inputs of 200,000 characters. It also leads among open-source models in performance on long-text tasks such as LongBench and L-Eval.
|
39 |
+
- Comprehensive performance enhancement: Compared to the previous generation model, it shows significant improvements in various capabilities, including reasoning, mathematics, and coding.
|
40 |
|
|
|
41 |
|
42 |
+
## InternLM2-20B
|
43 |
|
44 |
+
### Performance Evaluation
|
45 |
|
46 |
+
We have evaluated InternLM2 on several important benchmarks using the open-source evaluation tool [OpenCompass](https://github.com/open-compass/opencompass). Some of the evaluation results are shown in the table below. You are welcome to visit the [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for more evaluation results.
|
47 |
|
48 |
+
| Dataset\Models | InternLM2-7B | InternLM2-Chat-7B | InternLM2-20B | InternLM2-Chat-20B | ChatGPT | GPT-4 |
|
49 |
+
| --- | --- | --- | --- | --- | --- | --- |
|
50 |
+
| MMLU | 65.8 | 63.7 | 67.7 | 66.5 | 69.1 | 83.0 |
|
51 |
+
| AGIEval | 49.9 | 47.2 | 53.0 | 50.3 | 39.9 | 55.1 |
|
52 |
+
| BBH | 65.0 | 61.2 | 72.1 | 68.3 | 70.1 | 86.7 |
|
53 |
+
| GSM8K | 70.8 | 70.7 | 76.1 | 79.6 | 78.2 | 91.4 |
|
54 |
+
| MATH | 20.2 | 23.0 | 25.5 | 31.9 | 28.0 | 45.8 |
|
55 |
+
| HumanEval | 43.3 | 59.8 | 48.8 | 67.1 | 73.2 | 74.4 |
|
56 |
+
| MBPP(Sanitized) | 51.8 | 51.4 | 63.0 | 65.8 | 78.9 | 79.0 |
|
57 |
|
|
|
|
|
|
|
58 |
|
59 |
+
- The evaluation results were obtained from [OpenCompass](https://github.com/open-compass/opencompass), and the evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/open-compass/opencompass).
|
60 |
+
- Scores may differ between versions of [OpenCompass](https://github.com/open-compass/opencompass), so please refer to the latest evaluation results from [OpenCompass](https://github.com/open-compass/opencompass).
|
61 |
+
|
62 |
|
|
|
|
|
|
|
63 |
|
64 |
+
**Limitations:** Although we have made efforts to ensure the safety of the model during the training process and to encourage the model to generate text that complies with ethical and legal requirements, the model may still produce unexpected outputs due to its size and probabilistic generation paradigm. For example, the generated responses may contain biases, discrimination, or other harmful content. Please do not propagate such content. We are not responsible for any consequences resulting from the dissemination of harmful information.
|
65 |
|
66 |
+
### Import from Transformers
|
67 |
+
To load the InternLM2-20B model using Transformers, use the following code:
|
68 |
+
```python
|
69 |
+
import torch
|
70 |
+
from transformers import AutoTokenizer, AutoModelForCausalLM
|
71 |
+
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-20b", trust_remote_code=True)
|
72 |
+
# Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and might cause OOM Error.
|
73 |
+
model = AutoModelForCausalLM.from_pretrained("internlm/internlm2-20b", torch_dtype=torch.float16, trust_remote_code=True).cuda()
|
74 |
+
model = model.eval()
|
75 |
+
inputs = tokenizer(["A beautiful flower"], return_tensors="pt")
|
76 |
+
for k,v in inputs.items():
|
77 |
+
    inputs[k] = v.cuda()
|
78 |
+
gen_kwargs = {"max_length": 128, "top_p": 0.8, "temperature": 0.8, "do_sample": True, "repetition_penalty": 1.0}
|
79 |
+
output = model.generate(**inputs, **gen_kwargs)
|
80 |
+
output = tokenizer.decode(output[0].tolist(), skip_special_tokens=True)
|
81 |
+
print(output)
|
82 |
+
# A beautiful flower with a long history of use in Ayurveda and traditional Chinese medicine. Known for its ability to help the body adapt to stress, it is a calming and soothing herb. It is used for its ability to help promote healthy sleep patterns, calm the nervous system and to help the body adapt to stress. It is also used for its ability to help the body deal with the symptoms of anxiety and depression. It is also used for its ability to help the body adapt to stress. It is also used for its ability to help the body adapt to stress. It is also used for its ability to help the
|
83 |
```
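The float16 comment in the code above comes down to simple arithmetic on weight storage. The sketch below assumes a nominal 20 billion parameters (taken from the "20B" name, not an exact count) and considers the weights only, ignoring activations and the KV cache.

```python
def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30


N_PARAMS = 20e9  # assumption: nominal parameter count implied by "20B"

print(f"float32: {weight_memory_gib(N_PARAMS, 4):.1f} GiB")
print(f"float16: {weight_memory_gib(N_PARAMS, 2):.1f} GiB")
```

Loading in float16 halves the weight footprint relative to float32, which is why the default float32 load is more likely to run out of GPU memory.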
|
84 |
|
85 |
+
## Open Source License
|
86 |
+
|
87 |
+
The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow **free** commercial usage. To apply for a commercial license, please fill in the [application form (English)](https://wj.qq.com/s2/12727483/5dba/)/[申请表(中文)](https://wj.qq.com/s2/12725412/f7c1/). For other questions or collaborations, please contact <internlm@pjlab.org.cn>.
|
88 |
+
|
89 |
+
## Introduction
|
90 |
+
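InternLM2, the second generation of the InternLM (Puyu) model, includes models at two scales: 7B and 20B. For the convenience of users and researchers, we have open-sourced four versions of the model at each scale: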
|
91 |
+
|
92 |
+
- internlm2-base: A high-quality, highly adaptable model base that serves as a strong starting point for deep domain adaptation;
|
93 |
+
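- internlm2 (**recommended**): Built on internlm2-base and further pretrained on domain-specific corpora. It performs very well in evaluations while retaining strong general language abilities, and is the base model we recommend for most applications;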
|
94 |
+
- internlm2-chat-sft: Built on the Base model with supervised human alignment training;
|
95 |
+
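- internlm2-chat (**recommended**): Built on internlm2-chat-sft and optimized for conversational interaction through RLHF, with strong instruction following, empathetic chatting, and tool invocation.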
|
96 |
+
|
97 |
+
The base model of InternLM2 has the following technical features:
|
98 |
+
|
99 |
+
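- Effective support for ultra-long contexts of 200,000 characters: The model almost perfectly finds the "needle in a haystack" in inputs of 200,000 characters, and its performance on long-text tasks such as LongBench and L-Eval is among the best of open-source models.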
|
100 |
+
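- Comprehensive performance improvement: Every capability dimension improves over the previous generation, with significant gains in reasoning, mathematics, and coding.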
|
101 |
+
|
102 |
|
103 |
+
## InternLM2-20B
|
104 |
+
|
105 |
+
### Performance Evaluation
|
106 |
+
|
107 |
+
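We evaluated InternLM2 on several important benchmarks using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/). Some of the results are shown in the table below. You are welcome to visit the [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for more evaluation results.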
|
108 |
+
|
109 |
+
| Dataset | InternLM2-7B | InternLM2-Chat-7B | InternLM2-20B | InternLM2-Chat-20B | ChatGPT | GPT-4 |
|
110 |
+
| --- | --- | --- | --- | --- | --- | --- |
|
111 |
+
| MMLU | 65.8 | 63.7 | 67.7 | 66.5 | 69.1 | 83.0 |
|
112 |
+
| AGIEval | 49.9 | 47.2 | 53.0 | 50.3 | 39.9 | 55.1 |
|
113 |
+
| BBH | 65.0 | 61.2 | 72.1 | 68.3 | 70.1 | 86.7 |
|
114 |
+
| GSM8K | 70.8 | 70.7 | 76.1 | 79.6 | 78.2 | 91.4 |
|
115 |
+
| MATH | 20.2 | 23.0 | 25.5 | 31.9 | 28.0 | 45.8 |
|
116 |
+
| HumanEval | 43.3 | 59.8 | 48.8 | 67.1 | 73.2 | 74.4 |
|
117 |
+
| MBPP(Sanitized) | 51.8 | 51.4 | 63.0 | 65.8 | 78.9 | 79.0 |
|
118 |
+
|
119 |
+
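- The evaluation results above were obtained with [OpenCompass](https://github.com/open-compass/opencompass) (entries marked with `*` are taken from the original papers); testing details can be found in the configuration files provided by [OpenCompass](https://github.com/open-compass/opencompass).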
|
120 |
+
- Evaluation numbers may differ between versions of [OpenCompass](https://github.com/open-compass/opencompass), so please rely on the latest evaluation results from [OpenCompass](https://github.com/open-compass/opencompass).
|
121 |
+
|
122 |
+
|
123 |
+
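**Limitations:** Although we paid close attention to model safety during training and did our best to encourage the model to generate text that complies with ethical and legal requirements, the model may still produce unexpected outputs because of its size and probabilistic generation paradigm. For example, responses may contain biases, discrimination, or other harmful content. Please do not propagate such content. This project is not responsible for any consequences resulting from the dissemination of harmful information.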
|
124 |
+
|
125 |
+
### Import from Transformers
|
126 |
+
Use the following code to load the InternLM2-20B model for text continuation:
|
127 |
+
```python
|
128 |
+
import torch
|
129 |
+
from transformers import AutoTokenizer, AutoModelForCausalLM
|
130 |
+
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-20b", trust_remote_code=True)
|
131 |
+
# `torch_dtype=torch.float16` loads the model in float16 precision; otherwise transformers loads it as float32, which may cause an out-of-memory error on the GPU
|
132 |
+
model = AutoModelForCausalLM.from_pretrained("internlm/internlm2-20b", torch_dtype=torch.float16, trust_remote_code=True).cuda()
|
133 |
+
model = model.eval()
|
134 |
+
inputs = tokenizer(["来到美丽的大自然"], return_tensors="pt")
|
135 |
+
for k,v in inputs.items():
|
136 |
+
    inputs[k] = v.cuda()
|
137 |
+
gen_kwargs = {"max_length": 128, "top_p": 0.8, "temperature": 0.8, "do_sample": True, "repetition_penalty": 1.0}
|
138 |
+
output = model.generate(**inputs, **gen_kwargs)
|
139 |
+
output = tokenizer.decode(output[0].tolist(), skip_special_tokens=True)
|
140 |
+
print(output)
|
141 |
+
# 来到美丽的大自然,我们欣赏着大自然的美丽风景,感受着大自然的气息。
|
142 |
+
# 今天,我来到了美丽的龙湾公园,这里风景秀丽,山清水秀,鸟语花香。一走进公园,我就被眼前的景象惊呆了:绿油油的草坪上,五颜六色的花朵竞相开放,散发出阵阵清香。微风吹来,花儿随风摆动,好像在向我们点头微笑。远处,巍峨的大山连绵起伏,好像一条巨龙在空中飞舞。山下,一条清澈的小河静静地流淌着,河里的鱼儿自由自在地
|
143 |
```
|
144 |
+
|
145 |
+
## Open Source License
|
146 |
+
|
147 |
+
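The code in this repository is open-sourced under the Apache-2.0 license. The model weights are fully open for academic research, and a free commercial license can also be applied for ([application form](https://wj.qq.com/s2/12725412/f7c1/)). For other questions or collaborations, please contact <internlm@pjlab.org.cn>.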
|
config.json
ADDED
@@ -0,0 +1,25 @@
|
1 |
+
{
|
2 |
+
"_name_or_path": "/models/internlm2-20b",
|
3 |
+
"architectures": "LlamaForCausalLM",
|
4 |
+
"attn_implementation": "eager",
|
5 |
+
"bias": false,
|
6 |
+
"bos_token_id": 1,
|
7 |
+
"eos_token_id": 2,
|
8 |
+
"hidden_act": "silu",
|
9 |
+
"hidden_size": 6144,
|
10 |
+
"initializer_range": 0.02,
|
11 |
+
"intermediate_size": 16384,
|
12 |
+
"max_position_embeddings": 32768,
|
13 |
+
"model_type": "llama",
|
14 |
+
"num_attention_heads": 48,
|
15 |
+
"num_hidden_layers": 48,
|
16 |
+
"num_key_value_heads": 8,
|
17 |
+
"pad_token_id": 2,
|
18 |
+
"rms_norm_eps": 1e-05,
|
19 |
+
"rope_theta": 1000000,
|
20 |
+
"tie_word_embeddings": false,
|
21 |
+
"torch_dtype": "bfloat16",
|
22 |
+
"transformers_version": "4.36.2",
|
23 |
+
"use_cache": true,
|
24 |
+
"vocab_size": 92544
|
25 |
+
}
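The config values above are enough to derive the attention shapes and a rough parameter count. The sketch below assumes a standard Llama layout with no bias terms (the config sets `"bias": false`), separate input-embedding and `lm_head` matrices (`"tie_word_embeddings": false`), and one final norm; the helper names are hypothetical.

```python
# Values copied from the config.json shown above.
config = {
    "hidden_size": 6144,
    "intermediate_size": 16384,
    "num_attention_heads": 48,
    "num_hidden_layers": 48,
    "num_key_value_heads": 8,
    "vocab_size": 92544,
}


def derived_shapes(cfg: dict) -> dict:
    """Per-head size and grouped-query attention layout."""
    head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]
    return {
        "head_dim": head_dim,
        "kv_dim": head_dim * cfg["num_key_value_heads"],
        "queries_per_kv_head": cfg["num_attention_heads"] // cfg["num_key_value_heads"],
    }


def count_parameters(cfg: dict) -> int:
    """Approximate parameter count under the assumptions stated above."""
    h, i = cfg["hidden_size"], cfg["intermediate_size"]
    kv = derived_shapes(cfg)["kv_dim"]
    attention = 2 * h * h + 2 * h * kv  # q_proj and o_proj, k_proj and v_proj
    mlp = 3 * h * i                     # gate_proj, up_proj, down_proj
    norms = 2 * h                       # input and post-attention layernorm
    per_layer = attention + mlp + norms
    embeddings = 2 * cfg["vocab_size"] * h  # untied input embedding and lm_head
    return cfg["num_hidden_layers"] * per_layer + embeddings + h  # + final norm


print(derived_shapes(config))
print(f"{count_parameters(config) / 1e9:.2f}B parameters")
```

Under these assumptions the count comes out just under 20 billion, which is consistent with the model's name.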
|
model.safetensors.index.json
ADDED
@@ -0,0 +1 @@
|
|
|
|
|
1 |
+
{"metadata": {"mergekit_version": "0.0.3.2"}, "weight_map": {"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.10.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.10.mlp.down_proj.weight": 
"model-00001-of-00004.safetensors", "model.layers.10.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.11.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.11.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.11.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.12.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.12.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.12.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.13.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.13.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.13.input_layernorm.weight": 
"model-00001-of-00004.safetensors", "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.13.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.13.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.14.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.14.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.14.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.14.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.14.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.14.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.14.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.14.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.15.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.15.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.15.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.15.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.15.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.15.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.15.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.15.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.15.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.16.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.16.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.16.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", 
"model.layers.16.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.16.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.16.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.16.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.16.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.17.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.17.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.17.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.17.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.17.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.17.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.17.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.17.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.17.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.18.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.18.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.18.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.18.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.18.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.18.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.18.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.18.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.18.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.19.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", 
"model.layers.19.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.19.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.19.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.19.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.19.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.19.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.19.mlp.up_proj.weight": "model-00001-of-00004.safetensors", "model.layers.19.post_attention_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors", "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors", "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors", "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors", "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors", "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors", "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors", "model.layers.2.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.2.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.20.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.20.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.20.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.20.post_attention_layernorm.weight": 
"model-00002-of-00004.safetensors", "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.21.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.21.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.21.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.22.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.22.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.22.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.22.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.22.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.23.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.23.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.23.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.23.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.23.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.23.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.23.mlp.down_proj.weight": "model-00002-of-00004.safetensors", 
"model.layers.23.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.23.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.24.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.24.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.24.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.24.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.24.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.24.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.24.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.24.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.24.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.25.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.25.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.25.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.25.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.25.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.25.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.25.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.25.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.25.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.26.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.26.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.26.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.26.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.26.input_layernorm.weight": "model-00002-of-00004.safetensors", 
"model.layers.26.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.26.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.26.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.26.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.27.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.27.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.27.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.27.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.27.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.27.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.27.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.27.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.27.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.28.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.28.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.28.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.28.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.28.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.28.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.28.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.28.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.28.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.29.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.29.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.29.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", 
"model.layers.29.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.29.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.29.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.29.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.29.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.29.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.3.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.3.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.3.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.3.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.3.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.3.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.30.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.30.self_attn.q_proj.weight": "model-00002-of-00004.safetensors", "model.layers.30.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.30.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.30.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.30.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.30.mlp.down_proj.weight": "model-00002-of-00004.safetensors", "model.layers.30.mlp.up_proj.weight": "model-00002-of-00004.safetensors", "model.layers.30.post_attention_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.31.self_attn.o_proj.weight": "model-00002-of-00004.safetensors", "model.layers.31.self_attn.q_proj.weight": 
"model-00002-of-00004.safetensors", "model.layers.31.self_attn.k_proj.weight": "model-00002-of-00004.safetensors", "model.layers.31.self_attn.v_proj.weight": "model-00002-of-00004.safetensors", "model.layers.31.input_layernorm.weight": "model-00002-of-00004.safetensors", "model.layers.31.mlp.gate_proj.weight": "model-00002-of-00004.safetensors", "model.layers.31.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.31.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.32.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.32.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.32.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.32.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.32.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.32.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.32.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.32.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.32.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.33.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.33.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.33.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.33.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.33.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.33.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.33.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.33.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.33.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", 
"model.layers.34.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.34.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.34.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.34.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.34.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.34.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.34.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.34.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.34.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.35.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.35.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.35.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.35.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.35.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.35.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.35.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.35.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.35.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.36.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.36.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.36.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.36.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.36.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.36.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.36.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.36.mlp.up_proj.weight": 
"model-00003-of-00004.safetensors", "model.layers.36.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.37.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.37.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.37.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.37.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.37.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.37.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.37.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.37.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.37.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.38.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.38.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.38.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.38.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.38.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.38.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.38.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.38.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.38.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.39.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.39.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.39.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.39.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.39.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.39.mlp.gate_proj.weight": 
"model-00003-of-00004.safetensors", "model.layers.39.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.39.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.39.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.4.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.4.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.4.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.4.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.4.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.4.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.4.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.4.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.4.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.40.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.40.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.40.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.40.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.40.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.40.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.40.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.40.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.40.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.41.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.41.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.41.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.41.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", 
"model.layers.41.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.41.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.41.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.41.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.41.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.42.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.42.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.42.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.42.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.42.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.42.mlp.gate_proj.weight": "model-00003-of-00004.safetensors", "model.layers.42.mlp.down_proj.weight": "model-00003-of-00004.safetensors", "model.layers.42.mlp.up_proj.weight": "model-00003-of-00004.safetensors", "model.layers.42.post_attention_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.43.self_attn.o_proj.weight": "model-00003-of-00004.safetensors", "model.layers.43.self_attn.q_proj.weight": "model-00003-of-00004.safetensors", "model.layers.43.self_attn.k_proj.weight": "model-00003-of-00004.safetensors", "model.layers.43.self_attn.v_proj.weight": "model-00003-of-00004.safetensors", "model.layers.43.input_layernorm.weight": "model-00003-of-00004.safetensors", "model.layers.43.mlp.gate_proj.weight": "model-00004-of-00004.safetensors", "model.layers.43.mlp.down_proj.weight": "model-00004-of-00004.safetensors", "model.layers.43.mlp.up_proj.weight": "model-00004-of-00004.safetensors", "model.layers.43.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.44.self_attn.o_proj.weight": "model-00004-of-00004.safetensors", "model.layers.44.self_attn.q_proj.weight": "model-00004-of-00004.safetensors", 
"model.layers.44.self_attn.k_proj.weight": "model-00004-of-00004.safetensors", "model.layers.44.self_attn.v_proj.weight": "model-00004-of-00004.safetensors", "model.layers.44.input_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.44.mlp.gate_proj.weight": "model-00004-of-00004.safetensors", "model.layers.44.mlp.down_proj.weight": "model-00004-of-00004.safetensors", "model.layers.44.mlp.up_proj.weight": "model-00004-of-00004.safetensors", "model.layers.44.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.45.self_attn.o_proj.weight": "model-00004-of-00004.safetensors", "model.layers.45.self_attn.q_proj.weight": "model-00004-of-00004.safetensors", "model.layers.45.self_attn.k_proj.weight": "model-00004-of-00004.safetensors", "model.layers.45.self_attn.v_proj.weight": "model-00004-of-00004.safetensors", "model.layers.45.input_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.45.mlp.gate_proj.weight": "model-00004-of-00004.safetensors", "model.layers.45.mlp.down_proj.weight": "model-00004-of-00004.safetensors", "model.layers.45.mlp.up_proj.weight": "model-00004-of-00004.safetensors", "model.layers.45.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.46.self_attn.o_proj.weight": "model-00004-of-00004.safetensors", "model.layers.46.self_attn.q_proj.weight": "model-00004-of-00004.safetensors", "model.layers.46.self_attn.k_proj.weight": "model-00004-of-00004.safetensors", "model.layers.46.self_attn.v_proj.weight": "model-00004-of-00004.safetensors", "model.layers.46.input_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.46.mlp.gate_proj.weight": "model-00004-of-00004.safetensors", "model.layers.46.mlp.down_proj.weight": "model-00004-of-00004.safetensors", "model.layers.46.mlp.up_proj.weight": "model-00004-of-00004.safetensors", "model.layers.46.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", 
"model.layers.47.self_attn.o_proj.weight": "model-00004-of-00004.safetensors", "model.layers.47.self_attn.q_proj.weight": "model-00004-of-00004.safetensors", "model.layers.47.self_attn.k_proj.weight": "model-00004-of-00004.safetensors", "model.layers.47.self_attn.v_proj.weight": "model-00004-of-00004.safetensors", "model.layers.47.input_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.47.mlp.gate_proj.weight": "model-00004-of-00004.safetensors", "model.layers.47.mlp.down_proj.weight": "model-00004-of-00004.safetensors", "model.layers.47.mlp.up_proj.weight": "model-00004-of-00004.safetensors", "model.layers.47.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.5.self_attn.o_proj.weight": "model-00004-of-00004.safetensors", "model.layers.5.self_attn.q_proj.weight": "model-00004-of-00004.safetensors", "model.layers.5.self_attn.k_proj.weight": "model-00004-of-00004.safetensors", "model.layers.5.self_attn.v_proj.weight": "model-00004-of-00004.safetensors", "model.layers.5.input_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.5.mlp.gate_proj.weight": "model-00004-of-00004.safetensors", "model.layers.5.mlp.down_proj.weight": "model-00004-of-00004.safetensors", "model.layers.5.mlp.up_proj.weight": "model-00004-of-00004.safetensors", "model.layers.5.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.6.self_attn.o_proj.weight": "model-00004-of-00004.safetensors", "model.layers.6.self_attn.q_proj.weight": "model-00004-of-00004.safetensors", "model.layers.6.self_attn.k_proj.weight": "model-00004-of-00004.safetensors", "model.layers.6.self_attn.v_proj.weight": "model-00004-of-00004.safetensors", "model.layers.6.input_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.6.mlp.gate_proj.weight": "model-00004-of-00004.safetensors", "model.layers.6.mlp.down_proj.weight": "model-00004-of-00004.safetensors", "model.layers.6.mlp.up_proj.weight": 
"model-00004-of-00004.safetensors", "model.layers.6.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.7.self_attn.o_proj.weight": "model-00004-of-00004.safetensors", "model.layers.7.self_attn.q_proj.weight": "model-00004-of-00004.safetensors", "model.layers.7.self_attn.k_proj.weight": "model-00004-of-00004.safetensors", "model.layers.7.self_attn.v_proj.weight": "model-00004-of-00004.safetensors", "model.layers.7.input_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.7.mlp.gate_proj.weight": "model-00004-of-00004.safetensors", "model.layers.7.mlp.down_proj.weight": "model-00004-of-00004.safetensors", "model.layers.7.mlp.up_proj.weight": "model-00004-of-00004.safetensors", "model.layers.7.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.8.self_attn.o_proj.weight": "model-00004-of-00004.safetensors", "model.layers.8.self_attn.q_proj.weight": "model-00004-of-00004.safetensors", "model.layers.8.self_attn.k_proj.weight": "model-00004-of-00004.safetensors", "model.layers.8.self_attn.v_proj.weight": "model-00004-of-00004.safetensors", "model.layers.8.input_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.8.mlp.gate_proj.weight": "model-00004-of-00004.safetensors", "model.layers.8.mlp.down_proj.weight": "model-00004-of-00004.safetensors", "model.layers.8.mlp.up_proj.weight": "model-00004-of-00004.safetensors", "model.layers.8.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.9.self_attn.o_proj.weight": "model-00004-of-00004.safetensors", "model.layers.9.self_attn.q_proj.weight": "model-00004-of-00004.safetensors", "model.layers.9.self_attn.k_proj.weight": "model-00004-of-00004.safetensors", "model.layers.9.self_attn.v_proj.weight": "model-00004-of-00004.safetensors", "model.layers.9.input_layernorm.weight": "model-00004-of-00004.safetensors", "model.layers.9.mlp.gate_proj.weight": "model-00004-of-00004.safetensors", 
"model.layers.9.mlp.down_proj.weight": "model-00004-of-00004.safetensors", "model.layers.9.mlp.up_proj.weight": "model-00004-of-00004.safetensors", "model.layers.9.post_attention_layernorm.weight": "model-00004-of-00004.safetensors", "model.norm.weight": "model-00004-of-00004.safetensors", "model.embed_tokens.weight": "model-00004-of-00004.safetensors", "lm_head.weight": "model-00004-of-00004.safetensors"}}
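A note on reading the index above: each entry maps one tensor name to the shard file that stores it, and a single layer can straddle two shards (layer 43 keeps its attention weights in shard 3 and its MLP weights in shard 4). The sketch below counts tensors per shard. It assumes the usual Hugging Face layout in which the map sits under a `weight_map` key, which is not visible in this excerpt; the four sample entries are copied from the map above.

```python
from collections import Counter

def tensors_per_shard(index: dict) -> Counter:
    """Count how many tensors each shard file holds.

    Assumes the map lives under "weight_map" (the usual layout for
    *.safetensors.index.json files; not confirmed by this excerpt).
    """
    return Counter(index["weight_map"].values())

# Four entries copied from the end of the index shown above.
sample_index = {
    "weight_map": {
        "model.layers.43.input_layernorm.weight": "model-00003-of-00004.safetensors",
        "model.layers.43.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
        "model.norm.weight": "model-00004-of-00004.safetensors",
        "lm_head.weight": "model-00004-of-00004.safetensors",
    }
}

counts = tensors_per_shard(sample_index)
for shard, n in sorted(counts.items()):
    print(f"{shard}: {n} tensors")
```

For the real file, load it with `json.load` and pass the resulting dict to `tensors_per_shard`.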
original_repo_url.txt
ADDED
@@ -0,0 +1 @@
+https://huggingface.co/internlm/internlm2-20b
output-00001-of-00002.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9428055bda6e59b3c375dee9faa789ea32a31a0561d536c5940a47ecb47e3e59
+size 8551363348
output-00002-of-00002.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ef330af808f206dc1f3ab96ec52e28657abf043b494b3428c0d60f3aeb04fbf7
+size 1240802504
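The two `output-*.safetensors` entries above are Git LFS pointer files rather than the weights themselves: three `key value` lines giving the spec version, a SHA-256 object ID, and the real file size in bytes. A minimal sketch of reading such pointers and totalling the download size follows; `parse_lfs_pointer` is a hypothetical helper written for this note, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its fields (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

# Pointer contents copied from the two diffs above.
shard_1 = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:9428055bda6e59b3c375dee9faa789ea32a31a0561d536c5940a47ecb47e3e59\n"
    "size 8551363348\n"
)
shard_2 = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:ef330af808f206dc1f3ab96ec52e28657abf043b494b3428c0d60f3aeb04fbf7\n"
    "size 1240802504\n"
)

total_bytes = shard_1["size"] + shard_2["size"]
print(f"total weight size: {total_bytes / 1e9:.2f} GB")
```

The digest can be compared against `hashlib.sha256` of a downloaded file to confirm the transfer completed intact.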
special_tokens_map.json
ADDED
@@ -0,0 +1,30 @@
+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
tokenizer.model
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f868398fc4e05ee1e8aeba95ddf18ddcc45b8bce55d5093bead5bbf80429b48b
+size 1477754
tokenizer_config.json
ADDED
@@ -0,0 +1,48 @@
+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "auto_map": {
+    "AutoTokenizer": [
+      "tokenization_internlm.InternLMTokenizer",
+      null
+    ]
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "</s>",
+  "legacy": true,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "</s>",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "LlamaTokenizer",
+  "trust_remote_code": false,
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
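Two details of the tokenizer config above are easy to miss: `pad_token` reuses the `</s>` end-of-sequence token, and `model_max_length` is not a real context length. The value `1000000000000000019884624838656` equals `int(1e30)` in Python, which, as far as I know, is the placeholder the `transformers` library writes when no maximum length has been set. The sketch below checks both points on a subset of the config; treating `int(1e30)` as an "unset" sentinel is an assumption of this note, not something the commit states.

```python
import json

# Subset of the tokenizer_config.json shown above.
config = json.loads("""
{
  "bos_token": "<s>",
  "eos_token": "</s>",
  "pad_token": "</s>",
  "unk_token": "<unk>",
  "model_max_length": 1000000000000000019884624838656
}
""")

# Assumption: int(1e30) marks "no maximum length configured".
UNSET_MAX_LENGTH = int(1e30)

has_real_limit = config["model_max_length"] != UNSET_MAX_LENGTH
pad_is_eos = config["pad_token"] == config["eos_token"]

print("explicit max length set:", has_real_limit)
print("padding reuses EOS token:", pad_is_eos)
```

When padding reuses the EOS token, padded positions are normally excluded through the attention mask rather than by token ID, so downstream code should not infer sequence ends from `</s>` alone.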