Commit 24ab84d (verified) by bartowski
Parent(s): cc78b24

Quant for 5.0
MODEL_LICENSE ADDED
@@ -0,0 +1,33 @@
+ The aiXcoder Model License
+
+ 1. Definitions
+
+ “Licensor” means the aiXcoder Model Team that distributes its Software.
+
+ “Software” means the aiXcoder model parameters made available under this license.
+
+ 2. License Grant
+
+ Subject to the terms and conditions of this License, the Licensor hereby grants to you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty-free copyright license to use the Software solely for your non-commercial research purposes.
+
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+ 3. Restriction
+
+ You will not use, copy, modify, merge, publish, distribute, reproduce, or create derivative works of the Software, in whole or in part, for any commercial, military, or illegal purposes.
+
+ You will not use the Software for any act that may undermine China's national security and national unity, harm the public interest of society, or infringe upon the rights and interests of human beings.
+
+ 4. Disclaimer
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+ 5. Limitation of Liability
+
+ EXCEPT TO THE EXTENT PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER BASED IN TORT, NEGLIGENCE, CONTRACT, LIABILITY, OR OTHERWISE WILL ANY LICENSOR BE LIABLE TO YOU FOR ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES, OR ANY OTHER COMMERCIAL LOSSES, EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
+
+ 6. Dispute Resolution
+
+ This license shall be governed and construed in accordance with the laws of the People’s Republic of China. Any dispute arising from or in connection with this License shall be submitted to Haidian District People's Court in Beijing.
+
+ Note that the license is subject to update to a more comprehensive version. For any questions related to the license and copyright, please contact us at license@aixcoder.com.
README.md CHANGED
@@ -1,80 +1,242 @@
- ---
- quantized_by: bartowski
- pipeline_tag: text-generation
- ---

- ## Exllama v2 Quantizations of aixcoder-7b-base

- Using <a href="https://github.com/turboderp/exllamav2/releases/tag/v0.0.18">turboderp's ExLlamaV2 v0.0.18</a> for quantization.

- <b>The "main" branch contains only the measurement.json; download one of the other branches for the model (see below).</b>

- Each branch contains an individual bits per weight, with the main branch containing only the measurement.json for further conversions.

- Original model: https://huggingface.co/aiXcoder/aixcoder-7b-base

- ## Prompt format

- No chat template is specified, so the default is used. This may be incorrect; check the original model card for details.

  ```
- <|im_start|>system
- {system_prompt}<|im_end|>
- <|im_start|>user
- {prompt}<|im_end|>
- <|im_start|>assistant
- <|im_end|>
- <|im_start|>assistant
  ```

- ## Available sizes

- | Branch | Bits | lm_head bits | VRAM (4k) | VRAM (16k) | VRAM (32k) | Description |
- | ----- | ---- | ------- | ------ | ------ | ------ | ------------ |
- | [8_0](https://huggingface.co/bartowski/aixcoder-7b-base-exl2/tree/8_0) | 8.0 | 8.0 | 8.4 GB | 9.8 GB | 11.8 GB | Maximum quality that ExLlamaV2 can produce, near unquantized performance. |
- | [6_5](https://huggingface.co/bartowski/aixcoder-7b-base-exl2/tree/6_5) | 6.5 | 8.0 | 7.2 GB | 8.6 GB | 10.6 GB | Very similar to 8.0, good tradeoff of size vs. performance, **recommended**. |
- | [5_0](https://huggingface.co/bartowski/aixcoder-7b-base-exl2/tree/5_0) | 5.0 | 6.0 | 6.0 GB | 7.4 GB | 9.4 GB | Slightly lower quality vs. 6.5, but usable on 8 GB cards. |
- | [4_25](https://huggingface.co/bartowski/aixcoder-7b-base-exl2/tree/4_25) | 4.25 | 6.0 | 5.3 GB | 6.7 GB | 8.7 GB | GPTQ-equivalent bits per weight, slightly higher quality. |
- | [3_5](https://huggingface.co/bartowski/aixcoder-7b-base-exl2/tree/3_5) | 3.5 | 6.0 | 4.7 GB | 6.1 GB | 8.1 GB | Lower quality, only use if you have to. |

- ## Download instructions

- With git:

- ```shell
- git clone --single-branch --branch 6_5 https://huggingface.co/bartowski/aixcoder-7b-base-exl2 aixcoder-7b-base-exl2-6_5
  ```

- With huggingface hub (credit to TheBloke for instructions):

- ```shell
- pip3 install huggingface-hub
  ```

- To download the `main` branch (only useful if you only care about the measurement.json) to a folder called `aixcoder-7b-base-exl2`:

- ```shell
- mkdir aixcoder-7b-base-exl2
- huggingface-cli download bartowski/aixcoder-7b-base-exl2 --local-dir aixcoder-7b-base-exl2 --local-dir-use-symlinks False
  ```

- To download from a different branch, add the `--revision` parameter:

- Linux:

- ```shell
- mkdir aixcoder-7b-base-exl2-6_5
- huggingface-cli download bartowski/aixcoder-7b-base-exl2 --revision 6_5 --local-dir aixcoder-7b-base-exl2-6_5 --local-dir-use-symlinks False
  ```

- Windows (which apparently doesn't always like `_` in folder names):

- ```shell
- mkdir aixcoder-7b-base-exl2-6.5
- huggingface-cli download bartowski/aixcoder-7b-base-exl2 --revision 6_5 --local-dir aixcoder-7b-base-exl2-6.5 --local-dir-use-symlinks False
  ```
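
Once a branch is downloaded, a minimal loading-and-generation sketch with the exllamav2 Python API (based on the library's own examples around v0.0.18; the folder name, sampler settings, and prompt below are placeholders, not part of this repo) could look like:

```python
# Minimal sketch, assuming exllamav2 (~v0.0.18) is installed and the 6_5
# branch was downloaded to ./aixcoder-7b-base-exl2-6_5 (placeholder path).
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "aixcoder-7b-base-exl2-6_5"
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocate the cache as layers load
model.load_autosplit(cache)               # split weights across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.6

# aixcoder-7b-base is a base code model, so plain completion from a prefix:
print(generator.generate_simple("def quick_sort(arr):", settings, 128))
```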

- Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski

+ # aiXcoder-7B Code Large Language Model
+
+ <p align="center">
+ 🏠 <a href="https://www.aixcoder.com/" target="_blank">Official website</a> | 🛠 <a href="https://marketplace.visualstudio.com/items?itemName=aixcoder-plugin.aixcoder" target="_blank">VS Code Plugin</a> | 🛠 <a href="https://plugins.jetbrains.com/plugin/13574-aixcoder-code-completer" target="_blank">JetBrains Plugin</a> | <a href="https://github.com/aixcoder-plugin/aiXcoder-7B" target="_blank">GitHub Project</a>
+ </p>
+
+ Welcome to the official repository of the aiXcoder-7B Code Large Language Model. This model is designed to understand and generate code across multiple programming languages, offering state-of-the-art performance in code completion, comprehension, generation, and other programming-language tasks.
+
+ Table of Contents
+
+ 1. [Model Introduction](#model-introduction)
+ 2. [Quickstart](#quickstart)
+    - [Environment Requirements](#environment-requirements)
+    - [Model Weights](#model-weights)
+    - [Inference Example](#inference-example)
+ 3. [License](#license)
+ 4. [Acknowledgments](#acknowledgments)
+
+ ## Model Introduction
+
+ As the capabilities of large code models are gradually being unearthed, aiXcoder has consistently considered how to make these models more useful in real development scenarios. To this end, we have open-sourced aiXcoder 7B Base, which has undergone extensive training on 1.2T unique tokens, with pre-training tasks and contextual information uniquely designed for real-world code generation contexts.
+
+ aiXcoder 7B Base stands out as the most effective model in code completion scenarios among all models of similar parameter size, and it also surpasses mainstream models like CodeLlama 34B and StarCoder2 15B in average performance on the multilingual nl2code benchmark.
+
+ In our ongoing exploration of applying large code models, the release of aiXcoder 7B Base represents a significant milestone. The current version of aiXcoder 7B Base is a foundational model focused on improving the efficiency and accuracy of code completion and code generation tasks, aiming to provide robust support for developers in these scenarios. It is important to note that this version has not undergone instruct-tuning, which means it might not yet offer optimal performance for specialized higher-level tasks such as test case generation and code debugging.
+
+ However, further development of the aiXcoder model series is already in motion. In the near future, we aim to release new versions of the model that have been meticulously instruct-tuned for a wider range of programming tasks, including but not limited to test case generation and code debugging. Through these instruct-tuned models, we anticipate offering developers more comprehensive and deeper programming support, helping them to maximize efficiency at every stage of software development.
+
+ ## Quickstart
+
+ ### Environment Requirements
+
+ #### Option 1: Build Env
+
+ To run the model inference code, you'll need the following environment setup:
+
+ - Python 3.8 or higher
+ - PyTorch 2.1.0 or higher
+ - sentencepiece 0.2.0 or higher
+ - transformers 4.34.1 or higher (if running inference via the transformers library)
+
+ Please ensure all dependencies are installed using the following commands:
+
+ ```bash
+ conda create -n aixcoder-7b python=3.11
+ conda activate aixcoder-7b
+ git clone git@github.com:aixcoder-plugin/aiXcoder-7b.git
+ cd aiXcoder-7b
+ pip install -r requirements.txt
  ```
+
+ `requirements.txt` lists all necessary libraries and their versions.
+
+ To achieve faster inference speeds, especially for large models, we recommend installing `flash attention`. `Flash attention` is an optimized attention mechanism that significantly reduces computation time for transformer-based models without sacrificing accuracy.
+
+ Before proceeding, ensure your environment meets the CUDA requirements, as `flash attention` leverages GPU acceleration. Follow these steps to install `flash attention`:
+
+ ```bash
+ git clone git@github.com:Dao-AILab/flash-attention.git
+ cd flash-attention
+ MAX_JOBS=8 python setup.py install
  ```
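
Note that building `flash attention` only makes it available; whether it is used depends on how the model is loaded. With the transformers path, for example, it can be requested at load time. A sketch (not from the upstream README; the exact flag depends on your transformers version):

```python
# Sketch: requesting flash attention when loading through transformers.
# On transformers >= 4.36 use attn_implementation="flash_attention_2";
# on ~4.34 the equivalent flag was use_flash_attention_2=True.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "aiXcoder/aixcoder-7b-base",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn build above
)
```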
+ #### Option 2: Docker
+
+ For a consistent and isolated environment, we recommend running the model inference code using Docker. Here's how to set up and use Docker for our model:
+
+ 1. Install Docker: If you haven't already, install Docker on your machine.
+
+ 2. Pull the Docker Image: Pull the Docker image from Docker Hub.
+
+ ```bash
+ docker pull pytorch/pytorch:2.1.0-cuda11.8-cudnn8-devel
+ ```
+
+ 3. Run the Container: Once the image is pulled, you can run the model inside a Docker container.
+
+ ```bash
+ docker run --gpus all -it -v /dev/shm:/dev/shm --name aix_instance pytorch/pytorch:2.1.0-cuda11.8-cudnn8-devel /bin/bash
+ pip install sentencepiece
+ git clone git@github.com:aixcoder-plugin/aiXcoder-7b.git
+ cd aiXcoder-7b
  ```
+
+ This command starts a container named `aix_instance` from the PyTorch image. You can interact with the model inside this container.
+
+ To achieve faster inference speeds, especially for large models, we recommend installing `flash attention`.
+
+ ```bash
+ git clone git@github.com:Dao-AILab/flash-attention.git
+ cd flash-attention
+ MAX_JOBS=8 python setup.py install
  ```
+
+ 4. Model Inference: Within the Docker container, you can run the model inference code as described in the Inference Example section.
+
+ Using Docker provides a clean, controlled environment that minimizes issues related to software versions and dependencies.
+
+ ### Model Weights
+
+ You can download the model weights from the following links:
+
+ - [aiXcoder Base Download](https://huggingface.co/aiXcoder/aixcoder-7b-base)
+ - aiXcoder Instruct Download (coming soon...)
+
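
If you prefer a scripted download over the browser link, the weights can also be fetched with the `huggingface_hub` library (a small sketch; the `local_dir` value is a placeholder of our choosing):

```python
# Sketch: programmatic download of the base model weights.
from huggingface_hub import snapshot_download

path = snapshot_download(
    repo_id="aiXcoder/aixcoder-7b-base",
    local_dir="aixcoder-7b-base",  # placeholder target directory
)
print(f"weights downloaded to {path}")
```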
+ ### Inference Example
+
+ #### Command Line Execution
+
+ For a quick start, you can run the model inference directly from the command line:
+
+ ```bash
+ torchrun --nproc_per_node 1 sess_megatron.py --model_dir "path/to/model_weights_dir"
+ ```
+
+ Replace "path/to/model_weights_dir" with the actual path to your downloaded model weights.
+
+ Or run inference with Hugging Face's transformers:
+
+ ```bash
+ python sess_huggingface.py
  ```
+
+ #### Python Script Execution
+
+ Alternatively, you can invoke the model programmatically within your Python scripts. This method provides more flexibility for integrating the model into your applications or workflows. Here's a simple example of how to do it:
+
+ ```python
+ from sess_megatron import TestInference
+
+ infer = TestInference()
+ res = infer.run_infer(
+     # for FIM-style input, code_string stands for the prefix context
+     code_string="""# 快速排序算法""",  # "quick sort algorithm"
+     # for FIM-style input, later_code stands for the suffix context
+     later_code="\n",
+     # file_path should be a path from the project root to the file
+     file_path="test.py",
+     # max number of generated tokens
+     max_new_tokens=256,
+ )
+ print(res)
+
+ """output:
+
+ def quick_sort(arr):
+     if len(arr) <= 1:
+         return arr
+     pivot = arr[0]
+     less = [i for i in arr[1:] if i <= pivot]
+     greater = [i for i in arr[1:] if i > pivot]
+     return quick_sort(less) + [pivot] + quick_sort(greater)
+
+
+ # 测试
+ arr = [3, 2, 1, 4, 5]
+ print(quick_sort(arr))  # [1, 2, 3, 4, 5]
+ """
  ```
+
+ ```python
+ import torch
+ import sys
+ from hf_mini.utils import input_wrapper
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ device = "cuda"  # the device to load the model onto
+
+ tokenizer = AutoTokenizer.from_pretrained("aiXcoder/aixcoder-7b-base")
+ model = AutoModelForCausalLM.from_pretrained("aiXcoder/aixcoder-7b-base", torch_dtype=torch.bfloat16)
+
+ text = input_wrapper(
+     # for FIM-style input, code_string stands for the prefix context
+     code_string="# 快速排序算法",  # "quick sort algorithm"
+     # for FIM-style input, later_code stands for the suffix context
+     later_code="\n# 测试\narr = [3, 2, 1, 4, 5]\nprint(quick_sort(arr)) # [1, 2, 3, 4, 5]",
+     # path should be a path from the project root to the file
+     path="test.py"
+ )
+
+ if len(text) == 0:
+     sys.exit()
+
+ inputs = tokenizer(text, return_tensors="pt", return_token_type_ids=False)
+
+ inputs = inputs.to(device)
+ model.to(device)
+
+ outputs = model.generate(**inputs, max_new_tokens=256)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=False))
+
+ """output:
+ def quick_sort(arr):
+     # 如果数组长度小于等于1,直接返回
+     if len(arr) <= 1:
+         return arr
+     # 选择数组的第一个元素作为基准
+     pivot = arr[0]
+     # 初始化左右指针
+     left, right = 1, len(arr) - 1
+     # 循环直到左指针小于右指针
+     while left < right:
+         # 从右到左找到第一个小于基准的元素,与左指针元素交换
+         if arr[right] < pivot:
+             arr[left], arr[right] = arr[right], arr[left]
+             left += 1
+         # 从左到右找到第一个大于等于基准的元素,与右指针元素交换
+         if arr[left] >= pivot:
+             right -= 1
+     # 将基准元素与左指针元素交换
+     arr[left], arr[0] = arr[0], arr[left]
+     # 对左半部分进行递归排序
+     quick_sort(arr[:left])
+     # 对右半部分进行递归排序
+     quick_sort(arr[left + 1:])
+     return arr</s>
+ """
  ```
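
The decoded string above includes the wrapped prompt and special tokens (note the trailing `</s>`). If you only want the newly generated completion, a standard transformers pattern (not specific to this repo) is to slice off the prompt tokens before decoding:

```python
# Sketch: decode only the completion, reusing `inputs`/`outputs` from above.
prompt_len = inputs["input_ids"].shape[1]  # number of prompt tokens
completion = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(completion)
```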
+
+ ## License
+
+ The model weights are licensed under the [Model License](./MODEL_LICENSE) for academic research use; for commercial use, please apply by sending an email to support@aiXcoder.com.
+
+ ## Acknowledgments
+
+ We would like to thank all contributors to the open-source projects and datasets that made this work possible.
+
+ Thank you for your interest in our Code Large Language Model. We look forward to your contributions and feedback!
aix3-7b-base.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2e92a1733e2b780c18640406c2e097862339e1c725d1d5d87fb0b032ef7598f3
+ size 14865219217
config.json ADDED
@@ -0,0 +1,35 @@
+ {
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 14464,
+   "max_position_embeddings": 32768,
+   "model_type": "llama",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "num_key_value_heads": 8,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-06,
+   "rope_theta": 256000.0,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.34.1",
+   "use_cache": true,
+   "vocab_size": 49152,
+   "quantization_config": {
+     "quant_method": "exl2",
+     "version": "0.0.18",
+     "bits": 5.0,
+     "head_bits": 6,
+     "calibration": {
+       "rows": 100,
+       "length": 2048,
+       "dataset": "(default)"
+     }
+   }
+ }
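
The `quantization_config` block above is what marks this branch as an exl2 quant (5.0 bits per weight, 6-bit head). A trivial sketch for checking which quant a downloaded folder contains (the path is a placeholder):

```python
# Sketch: inspect the exl2 parameters recorded in a downloaded config.json.
import json

with open("aixcoder-7b-base-exl2-5_0/config.json") as f:  # placeholder path
    quant = json.load(f)["quantization_config"]

print(quant["quant_method"], quant["bits"], quant["head_bits"])  # exl2 5.0 6
```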
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "transformers_version": "4.34.1"
+ }
output.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:95fbab0c77882db85fbf84c3439af6c6a10d937bc0b16772df2e2ce13d0b99a3
+ size 4955091392
pytorch_model.bin.index.json ADDED
@@ -0,0 +1,299 @@
+ {
+   "metadata": {
+     "total_size": 7432572928
+   },
+   "weight_map": {
+     "lm_head.weight": "pytorch_model-00001-of-00001.bin",
+     "model.embed_tokens.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.10.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.10.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.10.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.11.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.11.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.11.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.12.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.12.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.12.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.13.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.13.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.13.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.14.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.14.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.14.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.15.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.15.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.15.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.16.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.16.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.16.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.17.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.17.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.17.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.18.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.18.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.18.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.19.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.19.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.19.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.20.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.20.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.20.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.21.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.21.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.21.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.22.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.22.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.22.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.23.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.23.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.23.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.24.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.24.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.24.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.25.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.25.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.25.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.26.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.26.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.26.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.27.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.27.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.27.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.28.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.28.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.28.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.29.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.29.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.29.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.30.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.30.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.30.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.31.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.31.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.31.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.7.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.7.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.8.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.8.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.8.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.9.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.9.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.9.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
+     "model.norm.weight": "pytorch_model-00001-of-00001.bin"
+   }
+ }
+
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dc5db6cf9475a51b095e672224050d079ea6266110602ca66983e6a178722acd
+ size 871150
tokenizer_config.json ADDED
@@ -0,0 +1,40 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [],
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "legacy": true,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": null,
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "PreTrainedTokenizerFast",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": false
+ }