Viren csris committed on
Commit
7222796
0 Parent(s):

Duplicate from togethercomputer/GPT-NeoXT-Chat-Base-20B


Co-authored-by: Charles Srisuwananukorn <csris@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,34 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tflite filter=lfs diff=lfs merge=lfs -text
29
+ *.tgz filter=lfs diff=lfs merge=lfs -text
30
+ *.wasm filter=lfs diff=lfs merge=lfs -text
31
+ *.xz filter=lfs diff=lfs merge=lfs -text
32
+ *.zip filter=lfs diff=lfs merge=lfs -text
33
+ *.zst filter=lfs diff=lfs merge=lfs -text
34
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
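The patterns above decide which files are stored through Git LFS rather than as regular Git blobs. A rough way to check whether a given file name would be caught is sketched below. It uses Python's `fnmatch` as an approximation, which is an assumption: real gitattributes matching has its own rules (for example for `**` and directory patterns), so treat this as illustrative only. `LFS_PATTERNS` and `is_lfs_tracked` are hypothetical names, and the list is only a subset of the file above.

```python
from fnmatch import fnmatch

# A subset of the patterns from the .gitattributes above (not the full list).
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.pt", "*.h5", "*.tar.*", "*tfevents*"]


def is_lfs_tracked(filename: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: does the file's base name match any LFS pattern?

    fnmatch is only an approximation of gitattributes matching.
    """
    basename = filename.rsplit("/", 1)[-1]
    return any(fnmatch(basename, pattern) for pattern in patterns)


if __name__ == "__main__":
    for name in ["pytorch_model-00001-of-00005.bin", "config.json", "README.md"]:
        print(name, is_lfs_tracked(name))
```

Under this approximation the weight shards match `*.bin`, while `config.json`, `README.md`, and `pytorch_model.bin.index.json` do not, which is consistent with those files appearing in this commit as plain text.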
README.md ADDED
@@ -0,0 +1,231 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ duplicated_from: togethercomputer/GPT-NeoXT-Chat-Base-20B
6
+ ---
7
+
8
+ ***<p style="font-size: 24px">Feel free to try out our [OpenChatKit feedback app](https://huggingface.co/spaces/togethercomputer/OpenChatKit)!</p>***
9
+
10
+ # GPT-NeoXT-Chat-Base-20B-v0.16
11
+
12
+ > TLDR: As part of OpenChatKit (codebase available [here](https://github.com/togethercomputer/OpenChaT)),
13
+ > GPT-NeoXT-Chat-Base-20B-v0.16 is a 20B parameter language model, fine-tuned from EleutherAI’s GPT-NeoX with over 40 million instructions on 100% carbon negative compute.
14
+
15
+ GPT-NeoXT-Chat-Base-20B-v0.16 is based on EleutherAI’s GPT-NeoX model, and is fine-tuned with data focusing on dialog-style interactions.
16
+ We focused the tuning on several tasks such as question answering, classification, extraction, and summarization.
17
+ We’ve fine-tuned the model with a collection of 43 million high-quality instructions.
18
+ Together partnered with LAION and Ontocord.ai, who both helped curate the dataset the model is based on.
19
+ You can read more about this process and the availability of this dataset in LAION’s blog post [here](https://laion.ai/blog/oig-dataset/).
20
+
21
+ In addition to the aforementioned fine-tuning, GPT-NeoXT-Chat-Base-20B-v0.16 has also undergone further fine-tuning via a small amount of feedback data.
22
+ This allows the model to better adapt to human preferences in the conversations.
23
+
24
+ ## Model Details
25
+ - **Developed by**: Together Computer.
26
+ - **Model type**: Language Model
27
+ - **Language(s)**: English
28
+ - **License**: Apache 2.0
29
+ - **Model Description**: A 20B parameter open source chat model, fine-tuned from EleutherAI’s NeoX with over 40M instructions on 100% carbon negative compute
30
+ - **Resources for more information**: [GitHub Repository](https://github.com/togethercomputer/OpenChaT).
31
+
32
+ # Quick Start
33
+
34
+ ## GPU Inference
35
+
36
+ This requires a GPU with 48GB memory.
37
+ ```python
38
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
39
+ # init
40
+ tokenizer = AutoTokenizer.from_pretrained("togethercomputer/GPT-NeoXT-Chat-Base-20B")
41
+ model = AutoModelForCausalLM.from_pretrained("togethercomputer/GPT-NeoXT-Chat-Base-20B", torch_dtype=torch.float16)
42
+ model = model.to('cuda:0')
43
+ # infer
44
+ inputs = tokenizer("<human>: Hello!\n<bot>:", return_tensors='pt').to(model.device)
45
+ outputs = model.generate(**inputs, max_new_tokens=10, do_sample=True, temperature=0.8)
46
+ output_str = tokenizer.decode(outputs[0])
47
+ print(output_str)
48
+ ```
49
+
50
+ ## GPU Inference in Int8
51
+
52
+ This requires a GPU with 24GB memory.
53
+
54
+ ```python
55
+ from transformers import AutoTokenizer, AutoModelForCausalLM
56
+ # init
57
+ tokenizer = AutoTokenizer.from_pretrained("togethercomputer/GPT-NeoXT-Chat-Base-20B")
58
+ model = AutoModelForCausalLM.from_pretrained("togethercomputer/GPT-NeoXT-Chat-Base-20B", device_map="auto", load_in_8bit=True)
59
+ # infer
60
+ inputs = tokenizer("<human>: Hello!\n<bot>:", return_tensors='pt').to(model.device)
61
+ outputs = model.generate(**inputs, max_new_tokens=10, do_sample=True, temperature=0.8)
62
+ output_str = tokenizer.decode(outputs[0])
63
+ print(output_str)
64
+ ```
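The 48GB and 24GB figures quoted for the two GPU snippets can be sanity-checked with a back-of-the-envelope estimate of the memory taken by the weights alone. The sketch below assumes a round 20 billion parameters (the "20B" in the model name) and ignores activations, the KV cache, and framework overhead, so the real footprint is higher. `weight_memory_gb` is a hypothetical helper, not part of any library.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 20e9  # assumption: a round 20B, taken from the model name

fp16_gb = weight_memory_gb(NUM_PARAMS, 2)  # float16: 2 bytes per parameter
int8_gb = weight_memory_gb(NUM_PARAMS, 1)  # int8: 1 byte per parameter

if __name__ == "__main__":
    print(f"fp16 weights: ~{fp16_gb:.0f} GB, int8 weights: ~{int8_gb:.0f} GB")
```

Roughly 40 GB of float16 weights leaves some headroom on a 48GB card, and roughly 20 GB of int8 weights does the same on a 24GB card.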
65
+
66
+ ## CPU Inference
67
+
68
+ ```python
69
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
70
+ # init
71
+ tokenizer = AutoTokenizer.from_pretrained("togethercomputer/GPT-NeoXT-Chat-Base-20B")
72
+ model = AutoModelForCausalLM.from_pretrained("togethercomputer/GPT-NeoXT-Chat-Base-20B", torch_dtype=torch.bfloat16)
73
+ # infer
74
+ inputs = tokenizer("<human>: Hello!\n<bot>:", return_tensors='pt').to(model.device)
75
+ outputs = model.generate(**inputs, max_new_tokens=10, do_sample=True, temperature=0.8)
76
+ output_str = tokenizer.decode(outputs[0])
77
+ print(output_str)
78
+ ```
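All three snippets prompt the model with `<human>: ...` followed by `<bot>:` on a new line. A small helper for assembling that format over several turns might look like the sketch below. `build_prompt` is a hypothetical name, and the single newline between turns is an assumption based on the one-turn prompt shown above.

```python
def build_prompt(history, next_user_message):
    """Assemble a prompt in the <human>/<bot> format used above.

    history: list of (human_text, bot_text) pairs from earlier turns.
    The prompt ends with an open "<bot>:" tag for the model to complete.
    """
    lines = []
    for human_text, bot_text in history:
        lines.append(f"<human>: {human_text}")
        lines.append(f"<bot>: {bot_text}")
    lines.append(f"<human>: {next_user_message}")
    lines.append("<bot>:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt([("Hello!", "Hi, how can I help?")], "Summarize this email."))
```

With an empty history, `build_prompt([], "Hello!")` reproduces the exact string passed to the tokenizer in the snippets above.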
79
+
80
+
81
+ ## Strengths of the model
82
+
83
+ There are several tasks that OpenChatKit excels at out of the box. These include:
84
+
85
+ - Example 1: Summarization and question answering within context.
86
+
87
+ ```markdown
88
+ **Summarize a long document into a single sentence and conduct question answering related to the document, with multiple rounds**
89
+
90
+ <human>: Last year, the travel industry saw a big rebound in demand — and that demand is showing no signs of slowing down this spring break travel season. Planes and hotels will be full, travelers will likely face long queues, cancellations, massive crowds and plenty of other travel nightmares. But perhaps the most frustrating thing you’ll have to prepare for this spring break is if your luggage goes missing. You’ve got to somehow track down your belongings that were in the airline’s care and then wait to be reunited — all while trying to enjoy the start of your long-awaited vacation. As far as travel nightmares go, lost luggage is up there as one of the worst.
91
+
92
+ To be honest up front, I am great at strategically losing things — or rather, losing stuff with style. (I hope you appreciate the “Toy Story” reference.) So when a product comes along that’s a perfect way to keep track of my belongings — especially when traveling — I’m adding it to my travel bag.
93
+
94
+ I’ve been a big fan of Apple’s $29 item tracker, dubbed AirTag. The long-rumored AirTag product finally launched in 2021, and while it’s not the first item tracker to exist, it has two key features that set it apart from the rest to make it the ultimate travel companion: access to the Find My network and Precision Finding technology.
95
+
96
+ The Federal Aviation Administration has even ruled that having an AirTag in your luggage is permitted and, in fact, not a safety hazard. The devices only contain 0.1 grams of lithium, which falls below the FAA’s limit.
97
+
98
+ “Luggage tracking devices powered by lithium metal cells that have 0.3 grams or less of lithium can be used on checked baggage,” the FAA said in a statement. “Apple AirTags meet this threshold; other luggage tracking devices may not.”
99
+
100
+ But, you may be wondering what makes them so special and why you should carry one in your checked luggage this spring break? Let’s explain why.
101
+
102
+ Summarize the above into a single paragraph.
103
+
104
+ <bot>: If you're traveling this spring break, you might want to consider investing in an item tracker to help you find your lost luggage. Apple's AirTag is the best option, according to a new report.
105
+
106
+ <human>: But isn't it unsafe to put AirTag in a luggage?
107
+
108
+ <bot>: The Federal Aviation Administration has ruled that having an AirTag in your luggage is permitted and, in fact, not a safety hazard.
109
+
110
+ <human>: Ah, why is that?
111
+
112
+ <bot>: The devices only contain 0.1 grams of lithium, which falls below the FAA’s limit.
113
+ ```
114
+
115
+ - Example 2: Extraction.
116
+
117
+ ```markdown
118
+ **Extract structured information from unstructured documents, as illustrated in the following example.**
119
+
120
+ <human>: Generate a table summarizing the options outlined in this email.
121
+
122
+ Team, we have to make a decision on the product plan for Goocrux v2.
123
+
124
+ The first option will take us 3 weeks of development, and will achieve performance of 80ms, but means that the UI has to change.
125
+
126
+ The second option will take us 4 weeks of development, and will achieve performance of 60ms, without any UI change.
127
+
128
+ The third option will take the longest -- 6 weeks of development, will achieve performance of 30ms, without any UI change.
129
+
130
+ A markdown table with four columns: (1) Option ID, (2) Development time, (3) Performance, (4) UI Change:
131
+
132
+ <bot>: #.
133
+ | ID | Option ID | Development time | Performance | UI Change |
134
+ | --------- | --------- | --------- | --------- | --------- |
135
+ | 1 | 1 | 3 weeks | 80ms | Yes |
136
+ | 2 | 2 | 4 weeks | 60ms | No |
137
+ | 3 | 3 | 6 weeks | 30ms | No |
138
+ ```
139
+
140
+ - Example 3: Classification.
141
+
142
+ ```markdown
143
+ **Classifying a sentence or paragraph into different categories.**
144
+
145
+ <human>: Classify the sentiment of the following sentence into Positive, Neutral, or Negative: Super excited about teaching Stanford’s first course on Large Language Models! Check the syllabus out here
146
+ <bot>: Positive
147
+
148
+ <human>: Classify the sentiment of the following sentence into Positive, Neutral, or Negative: How about the following sentence: It is raining outside and I feel so blue
149
+ <bot>: Negative
150
+ ```
151
+
152
+ In addition, the model does well on few-shot prompts. For both classification and extraction, the model performs even better with few shots, as in most HELM tasks. [Contact us](https://www.together.xyz/contact) if you’re interested in trying few-shot prompts with the model.
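One possible way to lay out a few-shot classification prompt in the same `<human>`/`<bot>` format is sketched below. The README does not specify a few-shot layout, so this arrangement (each labeled example as a completed turn, followed by the unlabeled query) is an assumption, and `build_few_shot_prompt` is a hypothetical helper.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Build a few-shot prompt: labeled examples as completed turns, then the query.

    examples: list of (text, label) pairs.
    """
    blocks = [
        f"<human>: {instruction} {text}\n<bot>: {label}"
        for text, label in examples
    ]
    # The final turn is left open so the model supplies the label.
    blocks.append(f"<human>: {instruction} {query}\n<bot>:")
    return "\n".join(blocks)


if __name__ == "__main__":
    instruction = (
        "Classify the sentiment of the following sentence "
        "into Positive, Neutral, or Negative:"
    )
    shots = [
        ("I love this new phone.", "Positive"),
        ("The train was late again.", "Negative"),
    ]
    print(build_few_shot_prompt(instruction, shots, "It is raining outside and I feel so blue"))
```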
153
+
154
+ ## Weaknesses of the model
155
+
156
+ That said, there are several areas where we have more work to do, and we need your help! Some of these include:
157
+
158
+ - Knowledge-based closed question answering: The chatbot may hallucinate and give incorrect results. Be sure to fact check, and if possible provide feedback with the corrected information.
159
+ - Coding tasks: The chatbot was not trained on a large enough corpus of source code to excel at writing code. We welcome contributions of additional datasets to improve this!
160
+ - Repetition: Sometimes the chatbot will repeat its response. We’re working to improve this, but in the meantime you can click the refresh button to start a new conversation.
161
+ - Context switching: If you change the topic in the middle of a conversation the chatbot often cannot make the switch automatically and will continue to give answers related to the prior topic.
162
+ - Creative writing and longer answers: The chatbot does not generate long, creative text such as an essay or story.
163
+
164
+ We are excited to work with you to address these weaknesses by getting your feedback, bolstering data sets, and improving accuracy.
165
+
166
+ # Uses
167
+
168
+ ## Direct Use
169
+
170
+ The model is intended for research purposes. Possible research areas and tasks include:
171
+
172
+ - Safe deployment of models which have the potential to generate harmful content.
173
+ - Probing and understanding the limitations and biases of dialogue models or language models.
174
+ - Generation of artworks and use in design and other artistic processes.
175
+ - Applications in educational or creative tools.
176
+ - Research on dialogue models or language models.
177
+
178
+ Excluded uses are described below.
179
+
180
+ ### Misuse, Malicious Use, and Out-of-Scope Use
181
+
182
+ The OpenChatKit community provides GPT-NeoXT-Chat-Base-20B-v0.16 as an open source tool for building chatbots.
183
+ The community is not responsible for any misuse, malicious use, or out-of-scope use of the model.
184
+ It is the responsibility of the end user to ensure that the model is used in a responsible and ethical manner.
185
+
186
+ #### Out-of-Scope Use
187
+
188
+ GPT-NeoXT-Chat-Base-20B-v0.16 is designed for use in chatbot applications and may not perform well for other use cases outside of its intended scope.
189
+ For example, it may not be suitable for use in safety-critical applications or for making decisions that have a significant impact on individuals or society.
190
+ It is important to consider the limitations of the model and to only use it for its intended purpose.
191
+
192
+ #### Misuse and Malicious Use
193
+
194
+ GPT-NeoXT-Chat-Base-20B-v0.16 is designed for use in chatbot applications and should not be used for any other purpose.
195
+ Misuse of the model, such as using it to engage in illegal or unethical activities, is strictly prohibited and goes against the principles of the OpenChatKit community project.
196
+
197
+ Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to:
198
+
199
+ - Generating fake news, misinformation, or propaganda
200
+ - Promoting hate speech, discrimination, or violence against individuals or groups
201
+ - Impersonating individuals or organizations without their consent
202
+ - Engaging in cyberbullying or harassment
203
+ - Defamatory content
204
+ - Spamming or scamming
205
+ - Sharing confidential or sensitive information without proper authorization
206
+ - Violating the terms of use of the model or the data used to train it
207
+ - Creating automated bots for malicious purposes such as spreading malware, phishing scams, or spamming
208
+
209
+ ## Limitations
210
+
211
+ GPT-NeoXT-Chat-Base-20B-v0.16, like other language model-based chatbots, has limitations that should be taken into consideration.
212
+ For example, the model may not always provide accurate or relevant answers, particularly for questions that are complex, ambiguous, or outside of its training data.
213
+ We therefore welcome contributions from individuals and organizations, and encourage collaboration towards creating a more robust and inclusive chatbot.
214
+
215
+ ## Training
216
+
217
+ **Training Data**
218
+
219
+ Please refer to [togethercomputer/OpenDataHub](https://github.com/togethercomputer/OpenDataHub)
220
+
221
+ **Training Procedure**
222
+
223
+ - **Hardware:** 2 x 8 x A100 GPUs
224
+ - **Optimizer:** [8bit-AdamW](https://github.com/TimDettmers/bitsandbytes)
225
+ - **Gradient Accumulations**: 2
226
+ - **Batch:** 2 x 2 x 64 x 2048 = 524288 tokens
227
+ - **Learning rate:** warmup to 1e-6 for 100 steps and then kept constant
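The batch-size arithmetic and the learning-rate schedule in the list above can be written out as a short sketch. Two things here are assumptions: the README does not label the four batch factors (only their product is stated), and it does not say what shape the warmup takes, so a linear ramp is used purely for illustration. `learning_rate` and `BATCH_FACTORS` are hypothetical names.

```python
import math

# Factors exactly as listed in the README; their individual meaning is not
# labeled there (the last one presumably being the 2048-token sequence length).
BATCH_FACTORS = (2, 2, 64, 2048)
TOKENS_PER_BATCH = math.prod(BATCH_FACTORS)


def learning_rate(step: int, peak_lr: float = 1e-6, warmup_steps: int = 100) -> float:
    """Warm up to peak_lr over warmup_steps, then hold it constant.

    The linear warmup shape is an assumption; the README only says
    "warmup to 1e-6 for 100 steps and then kept constant".
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr


if __name__ == "__main__":
    print(TOKENS_PER_BATCH)
    print([learning_rate(s) for s in (0, 49, 99, 100, 10_000)])
```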
228
+
229
+ ## Community
230
+
231
+ Join us on [Together Discord](https://discord.gg/6ZVDU8tTD4)
config.json ADDED
@@ -0,0 +1,26 @@
1
+ {
2
+ "_name_or_path": "togethercomputer/GPT-NeoXT-Chat-Base-20B",
3
+ "architectures": [
4
+ "GPTNeoXForCausalLM"
5
+ ],
6
+ "attention_probs_dropout_prob": 0,
7
+ "bos_token_id": 0,
8
+ "eos_token_id": 0,
9
+ "hidden_act": "gelu_fast",
10
+ "hidden_dropout_prob": 0,
11
+ "hidden_size": 6144,
12
+ "initializer_range": 0.02,
13
+ "intermediate_size": 24576,
14
+ "layer_norm_eps": 1e-05,
15
+ "max_position_embeddings": 2048,
16
+ "model_type": "gpt_neox",
17
+ "num_attention_heads": 64,
18
+ "num_hidden_layers": 44,
19
+ "rotary_emb_base": 10000,
20
+ "rotary_pct": 0.25,
21
+ "tie_word_embeddings": false,
22
+ "torch_dtype": "float16",
23
+ "transformers_version": "4.21.1",
24
+ "use_cache": true,
25
+ "vocab_size": 50432
26
+ }
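The values in this config are enough to estimate the parameter count and a few derived dimensions. The sketch below assumes each layer holds the weight and bias tensors named in `pytorch_model.bin.index.json` (fused `query_key_value`, attention `dense`, the two MLP projections, and two layer norms), and that the input and output embeddings are separate, as `tie_word_embeddings: false` indicates. `gpt_neox_param_count` is a hypothetical helper, not a `transformers` function.

```python
def gpt_neox_param_count(hidden_size, num_layers, vocab_size, intermediate_size):
    """Estimate trainable parameters for a GPT-NeoX style model with untied embeddings."""
    h, ff = hidden_size, intermediate_size
    attention = (h * 3 * h + 3 * h) + (h * h + h)   # query_key_value + dense
    mlp = (h * ff + ff) + (ff * h + h)              # dense_h_to_4h + dense_4h_to_h
    layer_norms = 2 * (2 * h)                       # input + post-attention layer norm
    per_layer = attention + mlp + layer_norms
    embeddings = 2 * vocab_size * h                 # embed_in + embed_out (untied)
    final_layer_norm = 2 * h
    return num_layers * per_layer + embeddings + final_layer_norm


# Values copied from config.json above.
HIDDEN_SIZE, NUM_LAYERS, NUM_HEADS = 6144, 44, 64
VOCAB_SIZE, INTERMEDIATE_SIZE, ROTARY_PCT = 50432, 24576, 0.25

total_params = gpt_neox_param_count(HIDDEN_SIZE, NUM_LAYERS, VOCAB_SIZE, INTERMEDIATE_SIZE)
head_dim = HIDDEN_SIZE // NUM_HEADS            # dimensions per attention head
rotary_dims = int(head_dim * ROTARY_PCT)       # dimensions per head given rotary embeddings

if __name__ == "__main__":
    print(f"{total_params:,} parameters (~{total_params / 1e9:.2f}B)")
    print(f"head_dim={head_dim}, rotary_dims={rotary_dims}")
```

The estimate comes out at about 20.55 billion parameters, in line with the "20B" in the model name. Non-trainable buffers listed in the index (such as `attention.bias`, `masked_bias`, and `rotary_emb.inv_freq`) are deliberately left out.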
pytorch_model-00001-of-00005.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1cfd3a71c56d95e80f3d964e8be957f71bea4f1073788ac56d28a7815294ff5e
3
+ size 9953774091
pytorch_model-00002-of-00005.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c895eabbe65d8c9be65fd02a124436b2154aff20e2f9acd573496bd367d8ad1d
3
+ size 9787088144
pytorch_model-00003-of-00005.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:71e7644ee43a87b0cb150fd0543d2f0f4f2a8b97c10015549303b2bca33ae4f2
3
+ size 9707369423
pytorch_model-00004-of-00005.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:26add9b5cdf3454dbba9eaf8a1255734a9b684fa058b3032c6111782cf9d0f92
3
+ size 9711578808
pytorch_model-00005-of-00005.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c902504815cc65f0349a5180d72159fb4250de1c6d22d7320bbf0e2772ffe0b4
3
+ size 2134105435
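Each `.bin` entry above is a Git LFS pointer rather than the weights themselves: three `key value` lines giving the spec version, the SHA-256 object id, and the size in bytes. A minimal parser, plus a sum of the five shard sizes, is sketched below. `parse_lfs_pointer` is a hypothetical helper that handles only the three keys seen in this commit.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file with 'version', 'oid', and 'size' lines."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer for the fifth shard, copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:c902504815cc65f0349a5180d72159fb4250de1c6d22d7320bbf0e2772ffe0b4
size 2134105435
"""

# Sizes of all five shards, copied from the pointers above.
SHARD_SIZES = [9953774091, 9787088144, 9707369423, 9711578808, 2134105435]
TOTAL_BYTES = sum(SHARD_SIZES)

if __name__ == "__main__":
    print(parse_lfs_pointer(POINTER))
    print(f"{TOTAL_BYTES:,} bytes (~{TOTAL_BYTES / 1e9:.1f} GB) across {len(SHARD_SIZES)} shards")
```

The five shards add up to 41,293,915,901 bytes, about 230 KB more than the `total_size` of 41,293,685,880 recorded in `pytorch_model.bin.index.json`. The difference is presumably serialization overhead in the checkpoint files.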
pytorch_model.bin.index.json ADDED
@@ -0,0 +1,671 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 41293685880
4
+ },
5
+ "weight_map": {
6
+ "embed_out.weight": "pytorch_model-00005-of-00005.bin",
7
+ "gpt_neox.embed_in.weight": "pytorch_model-00001-of-00005.bin",
8
+ "gpt_neox.final_layer_norm.bias": "pytorch_model-00005-of-00005.bin",
9
+ "gpt_neox.final_layer_norm.weight": "pytorch_model-00005-of-00005.bin",
10
+ "gpt_neox.layers.0.attention.bias": "pytorch_model-00001-of-00005.bin",
11
+ "gpt_neox.layers.0.attention.dense.bias": "pytorch_model-00001-of-00005.bin",
12
+ "gpt_neox.layers.0.attention.dense.weight": "pytorch_model-00001-of-00005.bin",
13
+ "gpt_neox.layers.0.attention.masked_bias": "pytorch_model-00001-of-00005.bin",
14
+ "gpt_neox.layers.0.attention.query_key_value.bias": "pytorch_model-00001-of-00005.bin",
15
+ "gpt_neox.layers.0.attention.query_key_value.weight": "pytorch_model-00001-of-00005.bin",
16
+ "gpt_neox.layers.0.attention.rotary_emb.inv_freq": "pytorch_model-00001-of-00005.bin",
17
+ "gpt_neox.layers.0.input_layernorm.bias": "pytorch_model-00001-of-00005.bin",
18
+ "gpt_neox.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
19
+ "gpt_neox.layers.0.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00005.bin",
20
+ "gpt_neox.layers.0.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00005.bin",
21
+ "gpt_neox.layers.0.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00005.bin",
22
+ "gpt_neox.layers.0.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00005.bin",
23
+ "gpt_neox.layers.0.post_attention_layernorm.bias": "pytorch_model-00001-of-00005.bin",
24
+ "gpt_neox.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
25
+ "gpt_neox.layers.1.attention.bias": "pytorch_model-00001-of-00005.bin",
26
+ "gpt_neox.layers.1.attention.dense.bias": "pytorch_model-00001-of-00005.bin",
27
+ "gpt_neox.layers.1.attention.dense.weight": "pytorch_model-00001-of-00005.bin",
28
+ "gpt_neox.layers.1.attention.masked_bias": "pytorch_model-00001-of-00005.bin",
29
+ "gpt_neox.layers.1.attention.query_key_value.bias": "pytorch_model-00001-of-00005.bin",
30
+ "gpt_neox.layers.1.attention.query_key_value.weight": "pytorch_model-00001-of-00005.bin",
31
+ "gpt_neox.layers.1.attention.rotary_emb.inv_freq": "pytorch_model-00001-of-00005.bin",
32
+ "gpt_neox.layers.1.input_layernorm.bias": "pytorch_model-00001-of-00005.bin",
33
+ "gpt_neox.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
34
+ "gpt_neox.layers.1.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00005.bin",
35
+ "gpt_neox.layers.1.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00005.bin",
36
+ "gpt_neox.layers.1.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00005.bin",
37
+ "gpt_neox.layers.1.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00005.bin",
38
+ "gpt_neox.layers.1.post_attention_layernorm.bias": "pytorch_model-00001-of-00005.bin",
39
+ "gpt_neox.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
40
+ "gpt_neox.layers.10.attention.bias": "pytorch_model-00001-of-00005.bin",
41
+ "gpt_neox.layers.10.attention.dense.bias": "pytorch_model-00002-of-00005.bin",
42
+ "gpt_neox.layers.10.attention.dense.weight": "pytorch_model-00002-of-00005.bin",
43
+ "gpt_neox.layers.10.attention.masked_bias": "pytorch_model-00001-of-00005.bin",
44
+ "gpt_neox.layers.10.attention.query_key_value.bias": "pytorch_model-00001-of-00005.bin",
45
+ "gpt_neox.layers.10.attention.query_key_value.weight": "pytorch_model-00001-of-00005.bin",
46
+ "gpt_neox.layers.10.attention.rotary_emb.inv_freq": "pytorch_model-00001-of-00005.bin",
47
+ "gpt_neox.layers.10.input_layernorm.bias": "pytorch_model-00001-of-00005.bin",
48
+ "gpt_neox.layers.10.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
49
+ "gpt_neox.layers.10.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00005.bin",
50
+ "gpt_neox.layers.10.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00005.bin",
51
+ "gpt_neox.layers.10.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00005.bin",
52
+ "gpt_neox.layers.10.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00005.bin",
53
+ "gpt_neox.layers.10.post_attention_layernorm.bias": "pytorch_model-00001-of-00005.bin",
54
+ "gpt_neox.layers.10.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
55
+ "gpt_neox.layers.11.attention.bias": "pytorch_model-00002-of-00005.bin",
56
+ "gpt_neox.layers.11.attention.dense.bias": "pytorch_model-00002-of-00005.bin",
57
+ "gpt_neox.layers.11.attention.dense.weight": "pytorch_model-00002-of-00005.bin",
58
+ "gpt_neox.layers.11.attention.masked_bias": "pytorch_model-00002-of-00005.bin",
59
+ "gpt_neox.layers.11.attention.query_key_value.bias": "pytorch_model-00002-of-00005.bin",
60
+ "gpt_neox.layers.11.attention.query_key_value.weight": "pytorch_model-00002-of-00005.bin",
61
+ "gpt_neox.layers.11.attention.rotary_emb.inv_freq": "pytorch_model-00002-of-00005.bin",
62
+ "gpt_neox.layers.11.input_layernorm.bias": "pytorch_model-00002-of-00005.bin",
63
+ "gpt_neox.layers.11.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
64
+ "gpt_neox.layers.11.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00005.bin",
65
+ "gpt_neox.layers.11.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00005.bin",
66
+ "gpt_neox.layers.11.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00005.bin",
67
+ "gpt_neox.layers.11.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00005.bin",
68
+ "gpt_neox.layers.11.post_attention_layernorm.bias": "pytorch_model-00002-of-00005.bin",
69
+ "gpt_neox.layers.11.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
70
+ "gpt_neox.layers.12.attention.bias": "pytorch_model-00002-of-00005.bin",
71
+ "gpt_neox.layers.12.attention.dense.bias": "pytorch_model-00002-of-00005.bin",
72
+ "gpt_neox.layers.12.attention.dense.weight": "pytorch_model-00002-of-00005.bin",
73
+ "gpt_neox.layers.12.attention.masked_bias": "pytorch_model-00002-of-00005.bin",
74
+ "gpt_neox.layers.12.attention.query_key_value.bias": "pytorch_model-00002-of-00005.bin",
75
+ "gpt_neox.layers.12.attention.query_key_value.weight": "pytorch_model-00002-of-00005.bin",
76
+ "gpt_neox.layers.12.attention.rotary_emb.inv_freq": "pytorch_model-00002-of-00005.bin",
77
+ "gpt_neox.layers.12.input_layernorm.bias": "pytorch_model-00002-of-00005.bin",
78
+ "gpt_neox.layers.12.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
79
+ "gpt_neox.layers.12.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00005.bin",
80
+ "gpt_neox.layers.12.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00005.bin",
81
+ "gpt_neox.layers.12.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00005.bin",
82
+ "gpt_neox.layers.12.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00005.bin",
83
+ "gpt_neox.layers.12.post_attention_layernorm.bias": "pytorch_model-00002-of-00005.bin",
84
+ "gpt_neox.layers.12.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
85
+ "gpt_neox.layers.13.attention.bias": "pytorch_model-00002-of-00005.bin",
86
+ "gpt_neox.layers.13.attention.dense.bias": "pytorch_model-00002-of-00005.bin",
87
+ "gpt_neox.layers.13.attention.dense.weight": "pytorch_model-00002-of-00005.bin",
88
+ "gpt_neox.layers.13.attention.masked_bias": "pytorch_model-00002-of-00005.bin",
89
+ "gpt_neox.layers.13.attention.query_key_value.bias": "pytorch_model-00002-of-00005.bin",
90
+ "gpt_neox.layers.13.attention.query_key_value.weight": "pytorch_model-00002-of-00005.bin",
91
+ "gpt_neox.layers.13.attention.rotary_emb.inv_freq": "pytorch_model-00002-of-00005.bin",
92
+ "gpt_neox.layers.13.input_layernorm.bias": "pytorch_model-00002-of-00005.bin",
93
+ "gpt_neox.layers.13.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
94
+ "gpt_neox.layers.13.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00005.bin",
95
+ "gpt_neox.layers.13.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00005.bin",
96
+ "gpt_neox.layers.13.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00005.bin",
97
+ "gpt_neox.layers.13.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00005.bin",
98
+ "gpt_neox.layers.13.post_attention_layernorm.bias": "pytorch_model-00002-of-00005.bin",
99
+ "gpt_neox.layers.13.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
100
+ "gpt_neox.layers.14.attention.bias": "pytorch_model-00002-of-00005.bin",
101
+ "gpt_neox.layers.14.attention.dense.bias": "pytorch_model-00002-of-00005.bin",
102
+ "gpt_neox.layers.14.attention.dense.weight": "pytorch_model-00002-of-00005.bin",
103
+ "gpt_neox.layers.14.attention.masked_bias": "pytorch_model-00002-of-00005.bin",
104
+ "gpt_neox.layers.14.attention.query_key_value.bias": "pytorch_model-00002-of-00005.bin",
105
+ "gpt_neox.layers.14.attention.query_key_value.weight": "pytorch_model-00002-of-00005.bin",
106
+ "gpt_neox.layers.14.attention.rotary_emb.inv_freq": "pytorch_model-00002-of-00005.bin",
107
+ "gpt_neox.layers.14.input_layernorm.bias": "pytorch_model-00002-of-00005.bin",
108
+ "gpt_neox.layers.14.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
109
+ "gpt_neox.layers.14.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00005.bin",
110
+ "gpt_neox.layers.14.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00005.bin",
111
+ "gpt_neox.layers.14.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00005.bin",
112
+ "gpt_neox.layers.14.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00005.bin",
113
+ "gpt_neox.layers.14.post_attention_layernorm.bias": "pytorch_model-00002-of-00005.bin",
114
+ "gpt_neox.layers.14.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
115
+ "gpt_neox.layers.15.attention.bias": "pytorch_model-00002-of-00005.bin",
116
+ "gpt_neox.layers.15.attention.dense.bias": "pytorch_model-00002-of-00005.bin",
117
+ "gpt_neox.layers.15.attention.dense.weight": "pytorch_model-00002-of-00005.bin",
118
+ "gpt_neox.layers.15.attention.masked_bias": "pytorch_model-00002-of-00005.bin",
119
+ "gpt_neox.layers.15.attention.query_key_value.bias": "pytorch_model-00002-of-00005.bin",
120
+ "gpt_neox.layers.15.attention.query_key_value.weight": "pytorch_model-00002-of-00005.bin",
121
+ "gpt_neox.layers.15.attention.rotary_emb.inv_freq": "pytorch_model-00002-of-00005.bin",
122
+ "gpt_neox.layers.15.input_layernorm.bias": "pytorch_model-00002-of-00005.bin",
123
+ "gpt_neox.layers.15.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
124
+ "gpt_neox.layers.15.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00005.bin",
125
+ "gpt_neox.layers.15.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00005.bin",
126
+ "gpt_neox.layers.15.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00005.bin",
127
+ "gpt_neox.layers.15.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00005.bin",
128
+ "gpt_neox.layers.15.post_attention_layernorm.bias": "pytorch_model-00002-of-00005.bin",
129
+ "gpt_neox.layers.15.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
130
+ "gpt_neox.layers.16.attention.bias": "pytorch_model-00002-of-00005.bin",
131
+ "gpt_neox.layers.16.attention.dense.bias": "pytorch_model-00002-of-00005.bin",
132
+ "gpt_neox.layers.16.attention.dense.weight": "pytorch_model-00002-of-00005.bin",
133
+ "gpt_neox.layers.16.attention.masked_bias": "pytorch_model-00002-of-00005.bin",
134
+ "gpt_neox.layers.16.attention.query_key_value.bias": "pytorch_model-00002-of-00005.bin",
135
+ "gpt_neox.layers.16.attention.query_key_value.weight": "pytorch_model-00002-of-00005.bin",
136
+ "gpt_neox.layers.16.attention.rotary_emb.inv_freq": "pytorch_model-00002-of-00005.bin",
137
+ "gpt_neox.layers.16.input_layernorm.bias": "pytorch_model-00002-of-00005.bin",
138
+ "gpt_neox.layers.16.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
139
+ "gpt_neox.layers.16.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00005.bin",
140
+ "gpt_neox.layers.16.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00005.bin",
141
+ "gpt_neox.layers.16.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00005.bin",
142
+ "gpt_neox.layers.16.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00005.bin",
143
+ "gpt_neox.layers.16.post_attention_layernorm.bias": "pytorch_model-00002-of-00005.bin",
144
+ "gpt_neox.layers.16.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
145
+ "gpt_neox.layers.17.attention.bias": "pytorch_model-00002-of-00005.bin",
146
+ "gpt_neox.layers.17.attention.dense.bias": "pytorch_model-00002-of-00005.bin",
147
+ "gpt_neox.layers.17.attention.dense.weight": "pytorch_model-00002-of-00005.bin",
148
+ "gpt_neox.layers.17.attention.masked_bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.17.attention.query_key_value.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.17.attention.query_key_value.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.17.attention.rotary_emb.inv_freq": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.17.input_layernorm.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.17.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.17.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.17.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.17.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.17.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.17.post_attention_layernorm.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.17.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.18.attention.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.18.attention.dense.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.18.attention.dense.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.18.attention.masked_bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.18.attention.query_key_value.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.18.attention.query_key_value.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.18.attention.rotary_emb.inv_freq": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.18.input_layernorm.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.18.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.18.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.18.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.18.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.18.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.18.post_attention_layernorm.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.18.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.19.attention.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.19.attention.dense.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.19.attention.dense.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.19.attention.masked_bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.19.attention.query_key_value.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.19.attention.query_key_value.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.19.attention.rotary_emb.inv_freq": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.19.input_layernorm.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.19.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.19.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.19.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.19.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.19.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.19.post_attention_layernorm.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.19.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.2.attention.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.2.attention.dense.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.2.attention.dense.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.2.attention.masked_bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.2.attention.query_key_value.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.2.attention.query_key_value.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.2.attention.rotary_emb.inv_freq": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.2.input_layernorm.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.2.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.2.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.2.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.2.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.2.post_attention_layernorm.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.20.attention.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.20.attention.dense.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.20.attention.dense.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.20.attention.masked_bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.20.attention.query_key_value.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.20.attention.query_key_value.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.20.attention.rotary_emb.inv_freq": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.20.input_layernorm.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.20.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.20.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.20.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.20.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.20.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.20.post_attention_layernorm.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.20.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.21.attention.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.21.attention.dense.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.21.attention.dense.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.21.attention.masked_bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.21.attention.query_key_value.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.21.attention.query_key_value.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.21.attention.rotary_emb.inv_freq": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.21.input_layernorm.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.21.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.21.mlp.dense_4h_to_h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.21.mlp.dense_4h_to_h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.21.mlp.dense_h_to_4h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.21.mlp.dense_h_to_4h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.21.post_attention_layernorm.bias": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.21.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
+ "gpt_neox.layers.22.attention.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.22.attention.dense.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.22.attention.dense.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.22.attention.masked_bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.22.attention.query_key_value.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.22.attention.query_key_value.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.22.attention.rotary_emb.inv_freq": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.22.input_layernorm.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.22.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.22.mlp.dense_4h_to_h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.22.mlp.dense_4h_to_h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.22.mlp.dense_h_to_4h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.22.mlp.dense_h_to_4h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.22.post_attention_layernorm.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.22.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.23.attention.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.23.attention.dense.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.23.attention.dense.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.23.attention.masked_bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.23.attention.query_key_value.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.23.attention.query_key_value.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.23.attention.rotary_emb.inv_freq": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.23.input_layernorm.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.23.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.23.mlp.dense_4h_to_h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.23.mlp.dense_4h_to_h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.23.mlp.dense_h_to_4h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.23.mlp.dense_h_to_4h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.23.post_attention_layernorm.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.23.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.24.attention.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.24.attention.dense.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.24.attention.dense.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.24.attention.masked_bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.24.attention.query_key_value.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.24.attention.query_key_value.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.24.attention.rotary_emb.inv_freq": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.24.input_layernorm.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.24.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.24.mlp.dense_4h_to_h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.24.mlp.dense_4h_to_h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.24.mlp.dense_h_to_4h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.24.mlp.dense_h_to_4h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.24.post_attention_layernorm.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.24.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.25.attention.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.25.attention.dense.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.25.attention.dense.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.25.attention.masked_bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.25.attention.query_key_value.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.25.attention.query_key_value.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.25.attention.rotary_emb.inv_freq": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.25.input_layernorm.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.25.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.25.mlp.dense_4h_to_h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.25.mlp.dense_4h_to_h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.25.mlp.dense_h_to_4h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.25.mlp.dense_h_to_4h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.25.post_attention_layernorm.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.25.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.26.attention.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.26.attention.dense.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.26.attention.dense.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.26.attention.masked_bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.26.attention.query_key_value.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.26.attention.query_key_value.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.26.attention.rotary_emb.inv_freq": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.26.input_layernorm.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.26.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.26.mlp.dense_4h_to_h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.26.mlp.dense_4h_to_h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.26.mlp.dense_h_to_4h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.26.mlp.dense_h_to_4h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.26.post_attention_layernorm.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.26.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.27.attention.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.27.attention.dense.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.27.attention.dense.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.27.attention.masked_bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.27.attention.query_key_value.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.27.attention.query_key_value.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.27.attention.rotary_emb.inv_freq": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.27.input_layernorm.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.27.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.27.mlp.dense_4h_to_h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.27.mlp.dense_4h_to_h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.27.mlp.dense_h_to_4h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.27.mlp.dense_h_to_4h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.27.post_attention_layernorm.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.27.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.28.attention.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.28.attention.dense.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.28.attention.dense.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.28.attention.masked_bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.28.attention.query_key_value.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.28.attention.query_key_value.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.28.attention.rotary_emb.inv_freq": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.28.input_layernorm.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.28.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.28.mlp.dense_4h_to_h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.28.mlp.dense_4h_to_h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.28.mlp.dense_h_to_4h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.28.mlp.dense_h_to_4h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.28.post_attention_layernorm.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.28.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.29.attention.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.29.attention.dense.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.29.attention.dense.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.29.attention.masked_bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.29.attention.query_key_value.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.29.attention.query_key_value.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.29.attention.rotary_emb.inv_freq": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.29.input_layernorm.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.29.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.29.mlp.dense_4h_to_h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.29.mlp.dense_4h_to_h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.29.mlp.dense_h_to_4h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.29.mlp.dense_h_to_4h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.29.post_attention_layernorm.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.29.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.3.attention.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.3.attention.dense.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.3.attention.dense.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.3.attention.masked_bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.3.attention.query_key_value.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.3.attention.query_key_value.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.3.attention.rotary_emb.inv_freq": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.3.input_layernorm.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.3.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.3.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.3.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.3.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.3.post_attention_layernorm.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.30.attention.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.30.attention.dense.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.30.attention.dense.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.30.attention.masked_bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.30.attention.query_key_value.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.30.attention.query_key_value.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.30.attention.rotary_emb.inv_freq": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.30.input_layernorm.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.30.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.30.mlp.dense_4h_to_h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.30.mlp.dense_4h_to_h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.30.mlp.dense_h_to_4h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.30.mlp.dense_h_to_4h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.30.post_attention_layernorm.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.30.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.31.attention.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.31.attention.dense.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.31.attention.dense.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.31.attention.masked_bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.31.attention.query_key_value.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.31.attention.query_key_value.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.31.attention.rotary_emb.inv_freq": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.31.input_layernorm.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.31.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.31.mlp.dense_4h_to_h.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.31.mlp.dense_4h_to_h.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.31.mlp.dense_h_to_4h.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.31.mlp.dense_h_to_4h.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.31.post_attention_layernorm.bias": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.31.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
+ "gpt_neox.layers.32.attention.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.32.attention.dense.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.32.attention.dense.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.32.attention.masked_bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.32.attention.query_key_value.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.32.attention.query_key_value.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.32.attention.rotary_emb.inv_freq": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.32.input_layernorm.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.32.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.32.mlp.dense_4h_to_h.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.32.mlp.dense_4h_to_h.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.32.mlp.dense_h_to_4h.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.32.mlp.dense_h_to_4h.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.32.post_attention_layernorm.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.32.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.33.attention.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.33.attention.dense.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.33.attention.dense.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.33.attention.masked_bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.33.attention.query_key_value.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.33.attention.query_key_value.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.33.attention.rotary_emb.inv_freq": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.33.input_layernorm.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.33.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.33.mlp.dense_4h_to_h.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.33.mlp.dense_4h_to_h.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.33.mlp.dense_h_to_4h.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.33.mlp.dense_h_to_4h.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.33.post_attention_layernorm.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.33.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.34.attention.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.34.attention.dense.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.34.attention.dense.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.34.attention.masked_bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.34.attention.query_key_value.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.34.attention.query_key_value.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.34.attention.rotary_emb.inv_freq": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.34.input_layernorm.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.34.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.34.mlp.dense_4h_to_h.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.34.mlp.dense_4h_to_h.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.34.mlp.dense_h_to_4h.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.34.mlp.dense_h_to_4h.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.34.post_attention_layernorm.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.34.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.35.attention.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.35.attention.dense.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.35.attention.dense.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.35.attention.masked_bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.35.attention.query_key_value.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.35.attention.query_key_value.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.35.attention.rotary_emb.inv_freq": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.35.input_layernorm.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.35.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.35.mlp.dense_4h_to_h.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.35.mlp.dense_4h_to_h.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.35.mlp.dense_h_to_4h.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.35.mlp.dense_h_to_4h.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.35.post_attention_layernorm.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.35.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.36.attention.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.36.attention.dense.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.36.attention.dense.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.36.attention.masked_bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.36.attention.query_key_value.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.36.attention.query_key_value.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.36.attention.rotary_emb.inv_freq": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.36.input_layernorm.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.36.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.36.mlp.dense_4h_to_h.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.36.mlp.dense_4h_to_h.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.36.mlp.dense_h_to_4h.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.36.mlp.dense_h_to_4h.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.36.post_attention_layernorm.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.36.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.37.attention.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.37.attention.dense.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.37.attention.dense.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.37.attention.masked_bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.37.attention.query_key_value.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.37.attention.query_key_value.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.37.attention.rotary_emb.inv_freq": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.37.input_layernorm.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.37.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.37.mlp.dense_4h_to_h.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.37.mlp.dense_4h_to_h.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.37.mlp.dense_h_to_4h.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.37.mlp.dense_h_to_4h.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.37.post_attention_layernorm.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.37.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.38.attention.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.38.attention.dense.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.38.attention.dense.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.38.attention.masked_bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.38.attention.query_key_value.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.38.attention.query_key_value.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.38.attention.rotary_emb.inv_freq": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.38.input_layernorm.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.38.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.38.mlp.dense_4h_to_h.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.38.mlp.dense_4h_to_h.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.38.mlp.dense_h_to_4h.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.38.mlp.dense_h_to_4h.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.38.post_attention_layernorm.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.38.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.39.attention.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.39.attention.dense.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.39.attention.dense.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.39.attention.masked_bias": "pytorch_model-00004-of-00005.bin",
509
+ "gpt_neox.layers.39.attention.query_key_value.bias": "pytorch_model-00004-of-00005.bin",
510
+ "gpt_neox.layers.39.attention.query_key_value.weight": "pytorch_model-00004-of-00005.bin",
511
+ "gpt_neox.layers.39.attention.rotary_emb.inv_freq": "pytorch_model-00004-of-00005.bin",
512
+ "gpt_neox.layers.39.input_layernorm.bias": "pytorch_model-00004-of-00005.bin",
513
+ "gpt_neox.layers.39.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
514
+ "gpt_neox.layers.39.mlp.dense_4h_to_h.bias": "pytorch_model-00004-of-00005.bin",
515
+ "gpt_neox.layers.39.mlp.dense_4h_to_h.weight": "pytorch_model-00004-of-00005.bin",
516
+ "gpt_neox.layers.39.mlp.dense_h_to_4h.bias": "pytorch_model-00004-of-00005.bin",
517
+ "gpt_neox.layers.39.mlp.dense_h_to_4h.weight": "pytorch_model-00004-of-00005.bin",
518
+ "gpt_neox.layers.39.post_attention_layernorm.bias": "pytorch_model-00004-of-00005.bin",
519
+ "gpt_neox.layers.39.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.4.attention.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.4.attention.dense.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.4.attention.dense.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.4.attention.masked_bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.4.attention.query_key_value.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.4.attention.query_key_value.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.4.attention.rotary_emb.inv_freq": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.4.input_layernorm.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.4.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.4.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.4.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.4.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.4.post_attention_layernorm.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.40.attention.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.40.attention.dense.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.40.attention.dense.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.40.attention.masked_bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.40.attention.query_key_value.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.40.attention.query_key_value.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.40.attention.rotary_emb.inv_freq": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.40.input_layernorm.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.40.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.40.mlp.dense_4h_to_h.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.40.mlp.dense_4h_to_h.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.40.mlp.dense_h_to_4h.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.40.mlp.dense_h_to_4h.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.40.post_attention_layernorm.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.40.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.41.attention.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.41.attention.dense.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.41.attention.dense.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.41.attention.masked_bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.41.attention.query_key_value.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.41.attention.query_key_value.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.41.attention.rotary_emb.inv_freq": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.41.input_layernorm.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.41.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.41.mlp.dense_4h_to_h.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.41.mlp.dense_4h_to_h.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.41.mlp.dense_h_to_4h.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.41.mlp.dense_h_to_4h.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.41.post_attention_layernorm.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.41.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.42.attention.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.42.attention.dense.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.42.attention.dense.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.42.attention.masked_bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.42.attention.query_key_value.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.42.attention.query_key_value.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.42.attention.rotary_emb.inv_freq": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.42.input_layernorm.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.42.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.42.mlp.dense_4h_to_h.bias": "pytorch_model-00005-of-00005.bin",
+ "gpt_neox.layers.42.mlp.dense_4h_to_h.weight": "pytorch_model-00005-of-00005.bin",
+ "gpt_neox.layers.42.mlp.dense_h_to_4h.bias": "pytorch_model-00005-of-00005.bin",
+ "gpt_neox.layers.42.mlp.dense_h_to_4h.weight": "pytorch_model-00005-of-00005.bin",
+ "gpt_neox.layers.42.post_attention_layernorm.bias": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.42.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
+ "gpt_neox.layers.43.attention.bias": "pytorch_model-00005-of-00005.bin",
+ "gpt_neox.layers.43.attention.dense.bias": "pytorch_model-00005-of-00005.bin",
+ "gpt_neox.layers.43.attention.dense.weight": "pytorch_model-00005-of-00005.bin",
+ "gpt_neox.layers.43.attention.masked_bias": "pytorch_model-00005-of-00005.bin",
+ "gpt_neox.layers.43.attention.query_key_value.bias": "pytorch_model-00005-of-00005.bin",
+ "gpt_neox.layers.43.attention.query_key_value.weight": "pytorch_model-00005-of-00005.bin",
+ "gpt_neox.layers.43.attention.rotary_emb.inv_freq": "pytorch_model-00005-of-00005.bin",
+ "gpt_neox.layers.43.input_layernorm.bias": "pytorch_model-00005-of-00005.bin",
+ "gpt_neox.layers.43.input_layernorm.weight": "pytorch_model-00005-of-00005.bin",
+ "gpt_neox.layers.43.mlp.dense_4h_to_h.bias": "pytorch_model-00005-of-00005.bin",
+ "gpt_neox.layers.43.mlp.dense_4h_to_h.weight": "pytorch_model-00005-of-00005.bin",
+ "gpt_neox.layers.43.mlp.dense_h_to_4h.bias": "pytorch_model-00005-of-00005.bin",
+ "gpt_neox.layers.43.mlp.dense_h_to_4h.weight": "pytorch_model-00005-of-00005.bin",
+ "gpt_neox.layers.43.post_attention_layernorm.bias": "pytorch_model-00005-of-00005.bin",
+ "gpt_neox.layers.43.post_attention_layernorm.weight": "pytorch_model-00005-of-00005.bin",
+ "gpt_neox.layers.5.attention.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.5.attention.dense.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.5.attention.dense.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.5.attention.masked_bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.5.attention.query_key_value.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.5.attention.query_key_value.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.5.attention.rotary_emb.inv_freq": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.5.input_layernorm.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.5.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.5.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.5.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.5.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.5.post_attention_layernorm.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.6.attention.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.6.attention.dense.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.6.attention.dense.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.6.attention.masked_bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.6.attention.query_key_value.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.6.attention.query_key_value.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.6.attention.rotary_emb.inv_freq": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.6.input_layernorm.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.6.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.6.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.6.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.6.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.6.post_attention_layernorm.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.7.attention.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.7.attention.dense.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.7.attention.dense.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.7.attention.masked_bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.7.attention.query_key_value.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.7.attention.query_key_value.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.7.attention.rotary_emb.inv_freq": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.7.input_layernorm.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.7.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.7.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.7.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.7.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.7.post_attention_layernorm.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.8.attention.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.8.attention.dense.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.8.attention.dense.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.8.attention.masked_bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.8.attention.query_key_value.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.8.attention.query_key_value.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.8.attention.rotary_emb.inv_freq": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.8.input_layernorm.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.8.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.8.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.8.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.8.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.8.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.8.post_attention_layernorm.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.8.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.9.attention.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.9.attention.dense.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.9.attention.dense.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.9.attention.masked_bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.9.attention.query_key_value.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.9.attention.query_key_value.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.9.attention.rotary_emb.inv_freq": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.9.input_layernorm.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.9.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.9.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.9.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.9.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.9.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.9.post_attention_layernorm.bias": "pytorch_model-00001-of-00005.bin",
+ "gpt_neox.layers.9.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin"
+ }
+ }
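The weight map above assigns every parameter name to one of five shard files. Keys are sorted as strings, which is why layer 4 appears between layers 39 and 40. A layer can also straddle two shards: layer 42 has its `mlp` tensors in shard 5 while the rest of the layer sits in shard 4. The following is a minimal sketch of how such an index could be inspected; it uses a hand-copied four-entry excerpt rather than the real file, and `shards_per_layer` is a hypothetical helper, not part of any library.

```python
import json
import re
from collections import defaultdict

# A small excerpt of the "weight_map" entries listed above (not the full index).
index_excerpt = json.loads("""
{
  "weight_map": {
    "gpt_neox.layers.42.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
    "gpt_neox.layers.42.mlp.dense_4h_to_h.weight": "pytorch_model-00005-of-00005.bin",
    "gpt_neox.layers.43.attention.dense.weight": "pytorch_model-00005-of-00005.bin",
    "gpt_neox.layers.9.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00005.bin"
  }
}
""")

# Captures the numeric layer index from names like "gpt_neox.layers.42.<...>".
LAYER_RE = re.compile(r"^gpt_neox\.layers\.(\d+)\.")


def shards_per_layer(weight_map):
    """Hypothetical helper: map each layer number to the shard files holding it."""
    shards = defaultdict(set)
    for name, shard in weight_map.items():
        match = LAYER_RE.match(name)
        if match:  # tensors outside the layer stack are skipped
            shards[int(match.group(1))].add(shard)
    # Sort numerically by layer so that 9 comes before 42, unlike the JSON key order.
    return {layer: sorted(files) for layer, files in sorted(shards.items())}


layer_shards = shards_per_layer(index_excerpt["weight_map"])
split_layers = [layer for layer, files in layer_shards.items() if len(files) > 1]
print(split_layers)  # layers whose tensors are spread over more than one shard
```

On this excerpt only layer 42 is reported as split. Run against the full index, the same function would show which shard files must be present to materialize any given layer.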
special_tokens_map.json ADDED
@@ -0,0 +1,5 @@
+ {
+ "bos_token": "<|endoftext|>",
+ "eos_token": "<|endoftext|>",
+ "unk_token": "<|endoftext|>"
+ }
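All three special-token roles in this file point at the same string, `<|endoftext|>`. A short sketch of how that sharing could be made explicit when reading the file is shown below; the JSON is inlined so the snippet runs on its own, and `roles_by_token` is a hypothetical helper name.

```python
import json

# Contents of special_tokens_map.json as added in this commit, inlined for the example.
special_tokens_map = json.loads("""
{
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "unk_token": "<|endoftext|>"
}
""")


def roles_by_token(token_map):
    """Hypothetical helper: invert the map so each token lists the roles it fills."""
    roles = {}
    for role, token in token_map.items():
        roles.setdefault(token, []).append(role)
    return roles


shared = roles_by_token(special_tokens_map)
print(shared)  # → {'<|endoftext|>': ['bos_token', 'eos_token', 'unk_token']}
```

Because one token serves as beginning-of-sequence, end-of-sequence, and unknown marker, downstream code cannot tell these roles apart from the token alone.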
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "add_prefix_space": false,
+ "bos_token": "<|endoftext|>",
+ "eos_token": "<|endoftext|>",
+ "name_or_path": "EleutherAI/gpt-neox-20b",
+ "special_tokens_map_file": "/root/.cache/huggingface/transformers/d9026dc928c47ac2a72d46fea7db959acc4bacac2176bf32be5c331604b77d32.3ae9ae72462581d20e36bc528e9c47bb30cd671bb21add40ca0b24a0be9fac22",
+ "tokenizer_class": "GPTNeoXTokenizer",
+ "unk_token": "<|endoftext|>"
+ }
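`tokenizer_config.json` repeats the special tokens that `special_tokens_map.json` already declares, so the two files can drift apart if one is edited by hand. The sketch below shows one way to cross-check them; it inlines both files (omitting the `special_tokens_map_file` cache path, which is irrelevant to the check), and `find_mismatches` is a hypothetical helper rather than a library function.

```python
import json

# tokenizer_config.json from this commit, minus the local cache path entry.
tokenizer_config = json.loads("""
{
  "add_prefix_space": false,
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "name_or_path": "EleutherAI/gpt-neox-20b",
  "tokenizer_class": "GPTNeoXTokenizer",
  "unk_token": "<|endoftext|>"
}
""")

# special_tokens_map.json from this commit.
special_tokens_map = json.loads("""
{
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "unk_token": "<|endoftext|>"
}
""")


def find_mismatches(config, token_map):
    """Hypothetical helper: return roles whose token differs between the two files."""
    return {
        role: (config.get(role), token)
        for role, token in token_map.items()
        if config.get(role) != token
    }


mismatches = find_mismatches(tokenizer_config, special_tokens_map)
print(mismatches)  # → {}
```

An empty result means the two files agree for every role listed in the special tokens map.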