CharlieFRuan committed on
Commit
49c556b
1 Parent(s): c5b2e77

Initial commit

This view is limited to 50 files because it contains too many changes.   See raw diff
added_tokens.json ADDED
@@ -0,0 +1,5 @@
+ {
+ "<|im_end|>": 32002,
+ "<|im_start|>": 32001,
+ "[PAD]": 32000
+ }
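The three added token IDs sit directly above a 32000-entry base vocabulary. A minimal sketch of that sanity check follows; the helper name `implied_vocab_size` is hypothetical (not part of any library), and it assumes token IDs are zero-based and dense.

```python
import json

# Contents of added_tokens.json as shown in this commit.
ADDED_TOKENS_JSON = """
{
  "<|im_end|>": 32002,
  "<|im_start|>": 32001,
  "[PAD]": 32000
}
"""


def implied_vocab_size(added_tokens: dict) -> int:
    """Smallest vocabulary size that can hold every added token ID.

    Assumes zero-based, dense token IDs (an assumption of this sketch).
    """
    return max(added_tokens.values()) + 1


added_tokens = json.loads(ADDED_TOKENS_JSON)
size = implied_vocab_size(added_tokens)
print(size)
```

The result can be compared by hand against the `vocab_size` of 32003 recorded in mlc-chat-config.json.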
logs.txt ADDED
The diff for this file is too large to render. See raw diff
 
mlc-chat-config.json ADDED
@@ -0,0 +1,43 @@
+ {
+ "model_type": "llama",
+ "quantization": "q0f32",
+ "model_config": {
+ "hidden_size": 2048,
+ "intermediate_size": 5632,
+ "num_attention_heads": 32,
+ "num_hidden_layers": 22,
+ "rms_norm_eps": 1e-05,
+ "vocab_size": 32003,
+ "position_embedding_base": 10000.0,
+ "context_window_size": 2048,
+ "prefill_chunk_size": 2048,
+ "num_key_value_heads": 4,
+ "head_dim": 64,
+ "tensor_parallel_shards": 1,
+ "max_batch_size": 1
+ },
+ "vocab_size": 32003,
+ "context_window_size": 2048,
+ "sliding_window_size": -1,
+ "prefill_chunk_size": 2048,
+ "attention_sink_size": -1,
+ "tensor_parallel_shards": 1,
+ "max_batch_size": 80,
+ "mean_gen_len": 128,
+ "max_gen_len": 512,
+ "shift_fill_factor": 0.3,
+ "temperature": 0.7,
+ "repetition_penalty": 1.0,
+ "top_p": 0.95,
+ "conv_template": "chatml",
+ "pad_token_id": 0,
+ "bos_token_id": 1,
+ "eos_token_id": 2,
+ "tokenizer_files": [
+ "tokenizer.model",
+ "tokenizer.json",
+ "added_tokens.json",
+ "tokenizer_config.json"
+ ],
+ "version": "0.1.0"
+ }
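Several `model_config` values are related to one another. The sketch below checks them and derives the row counts of the fused projection weights. It assumes the Q, K and V projections are packed into one `qkv_proj` matrix and the gate and up projections into one `gate_up_proj` matrix; that packing is an assumption of this sketch, and `derived_shapes` is a hypothetical helper, not an MLC API.

```python
# Values copied from the "model_config" block of mlc-chat-config.json.
model_config = {
    "hidden_size": 2048,
    "intermediate_size": 5632,
    "num_attention_heads": 32,
    "num_key_value_heads": 4,
    "head_dim": 64,
}


def derived_shapes(cfg: dict) -> dict:
    """Derive weight row counts, assuming fused QKV and fused gate/up projections."""
    q_rows = cfg["num_attention_heads"] * cfg["head_dim"]   # query rows
    kv_rows = cfg["num_key_value_heads"] * cfg["head_dim"]  # rows for K (and again for V)
    return {
        # Q rows plus one K block and one V block (grouped-query attention).
        "qkv_proj_rows": q_rows + 2 * kv_rows,
        # Gate and up projections stacked along the row axis.
        "gate_up_proj_rows": 2 * cfg["intermediate_size"],
        # head_dim should equal hidden_size divided by the number of query heads.
        "head_dim_matches": cfg["hidden_size"] // cfg["num_attention_heads"] == cfg["head_dim"],
    }


shapes = derived_shapes(model_config)
print(shapes)
```

Under those assumptions the derived row counts are 2560 and 11264, which agree with the `qkv_proj` and `gate_up_proj` shapes listed in ndarray-cache-b16.json.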
ndarray-cache-b16.json ADDED
@@ -0,0 +1,1905 @@
+ {
+ "metadata": {
+ "ParamSize": 135,
+ "ParamBytes": 4400242688.0,
+ "BitsPerParam": 32.0
+ },
+ "records": [
+ {
+ "dataPath": "params_shard_0.bin",
+ "format": "raw-shard",
+ "nbytes": 131084288,
+ "records": [
+ {
+ "name": "model.embed_tokens.weight",
+ "shape": [
+ 32003,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 131084288,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "cc0ed85be2a8c0317b9701355b06d552"
+ },
+ {
+ "dataPath": "params_shard_1.bin",
+ "format": "raw-shard",
+ "nbytes": 46137344,
+ "records": [
+ {
+ "name": "model.layers.0.mlp.gate_up_proj.weight",
+ "shape": [
+ 11264,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 46137344,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "3dd515cad2c62e6956cb338bfada11e0"
+ },
+ {
+ "dataPath": "params_shard_2.bin",
+ "format": "raw-shard",
+ "nbytes": 23068672,
+ "records": [
+ {
+ "name": "model.layers.0.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 5632
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 23068672,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "1af86eb61f0d2fa2466465dea8dbc9ea"
+ },
+ {
+ "dataPath": "params_shard_3.bin",
+ "format": "raw-shard",
+ "nbytes": 29368320,
+ "records": [
+ {
+ "name": "model.layers.0.self_attn.qkv_proj.weight",
+ "shape": [
+ 2560,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 10485760,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.0.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 8388608,
+ "byteOffset": 10485760
+ },
+ {
+ "name": "model.layers.0.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 18874368
+ },
+ {
+ "name": "model.layers.0.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 18878464
+ },
+ {
+ "name": "model.layers.1.self_attn.qkv_proj.weight",
+ "shape": [
+ 2560,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 10485760,
+ "byteOffset": 18882560
+ }
+ ],
+ "md5sum": "f9bed0391d17e40343e2c90c45fff822"
+ },
+ {
+ "dataPath": "params_shard_4.bin",
+ "format": "raw-shard",
+ "nbytes": 46137344,
+ "records": [
+ {
+ "name": "model.layers.1.mlp.gate_up_proj.weight",
+ "shape": [
+ 11264,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 46137344,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "97d2718af5ea9779f503379c2556e516"
+ },
+ {
+ "dataPath": "params_shard_5.bin",
+ "format": "raw-shard",
+ "nbytes": 31465472,
+ "records": [
+ {
+ "name": "model.layers.1.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 8388608,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.1.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 5632
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 23068672,
+ "byteOffset": 8388608
+ },
+ {
+ "name": "model.layers.1.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 31457280
+ },
+ {
+ "name": "model.layers.1.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 31461376
+ }
+ ],
+ "md5sum": "4335e0565f1c118752b5d18bf96f239f"
+ },
+ {
+ "dataPath": "params_shard_6.bin",
+ "format": "raw-shard",
+ "nbytes": 46137344,
+ "records": [
+ {
+ "name": "model.layers.2.mlp.gate_up_proj.weight",
+ "shape": [
+ 11264,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 46137344,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "5d59c33ffddda7bcca41c5c9ad65db51"
+ },
+ {
+ "dataPath": "params_shard_7.bin",
+ "format": "raw-shard",
+ "nbytes": 23068672,
+ "records": [
+ {
+ "name": "model.layers.2.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 5632
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 23068672,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "225dd6d9dd691ca0defb95bf03045fcd"
+ },
+ {
+ "dataPath": "params_shard_8.bin",
+ "format": "raw-shard",
+ "nbytes": 29368320,
+ "records": [
+ {
+ "name": "model.layers.2.self_attn.qkv_proj.weight",
+ "shape": [
+ 2560,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 10485760,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.2.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 8388608,
+ "byteOffset": 10485760
+ },
+ {
+ "name": "model.layers.2.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 18874368
+ },
+ {
+ "name": "model.layers.2.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 18878464
+ },
+ {
+ "name": "model.layers.3.self_attn.qkv_proj.weight",
+ "shape": [
+ 2560,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 10485760,
+ "byteOffset": 18882560
+ }
+ ],
+ "md5sum": "24956e7e3440d13c787912f075b91808"
+ },
+ {
+ "dataPath": "params_shard_9.bin",
+ "format": "raw-shard",
+ "nbytes": 46137344,
+ "records": [
+ {
+ "name": "model.layers.3.mlp.gate_up_proj.weight",
+ "shape": [
+ 11264,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 46137344,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "ccf10355f04707b6ea13fe1905848786"
+ },
+ {
+ "dataPath": "params_shard_10.bin",
+ "format": "raw-shard",
+ "nbytes": 31465472,
+ "records": [
+ {
+ "name": "model.layers.3.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 8388608,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.3.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 5632
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 23068672,
+ "byteOffset": 8388608
+ },
+ {
+ "name": "model.layers.3.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 31457280
+ },
+ {
+ "name": "model.layers.3.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 31461376
+ }
+ ],
+ "md5sum": "63a5e91ead395e5212dec520fe6eb3b9"
+ },
+ {
+ "dataPath": "params_shard_11.bin",
+ "format": "raw-shard",
+ "nbytes": 46137344,
+ "records": [
+ {
+ "name": "model.layers.4.mlp.gate_up_proj.weight",
+ "shape": [
+ 11264,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 46137344,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "bd1768bd70d7a687d57c6eb5ad700f98"
+ },
+ {
+ "dataPath": "params_shard_12.bin",
+ "format": "raw-shard",
+ "nbytes": 23068672,
+ "records": [
+ {
+ "name": "model.layers.4.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 5632
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 23068672,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "378e8ccbebfe1c7c5a6abd44e17a4b33"
+ },
+ {
+ "dataPath": "params_shard_13.bin",
+ "format": "raw-shard",
+ "nbytes": 29368320,
+ "records": [
+ {
+ "name": "model.layers.4.self_attn.qkv_proj.weight",
+ "shape": [
+ 2560,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 10485760,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.4.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 8388608,
+ "byteOffset": 10485760
+ },
+ {
+ "name": "model.layers.4.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 18874368
+ },
+ {
+ "name": "model.layers.4.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 18878464
+ },
+ {
+ "name": "model.layers.5.self_attn.qkv_proj.weight",
+ "shape": [
+ 2560,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 10485760,
+ "byteOffset": 18882560
+ }
+ ],
+ "md5sum": "69ae4f8c3d765f016cc84e46902a5a61"
+ },
+ {
+ "dataPath": "params_shard_14.bin",
+ "format": "raw-shard",
+ "nbytes": 46137344,
+ "records": [
+ {
+ "name": "model.layers.5.mlp.gate_up_proj.weight",
+ "shape": [
+ 11264,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 46137344,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "21882b96e7a6182502df1c8409f5ecf4"
+ },
+ {
+ "dataPath": "params_shard_15.bin",
+ "format": "raw-shard",
+ "nbytes": 31465472,
+ "records": [
+ {
+ "name": "model.layers.5.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 8388608,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.5.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 5632
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 23068672,
+ "byteOffset": 8388608
+ },
+ {
+ "name": "model.layers.5.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 31457280
+ },
+ {
+ "name": "model.layers.5.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 31461376
+ }
+ ],
+ "md5sum": "94fb7197266794b1d62a683e6f83a8f5"
+ },
+ {
+ "dataPath": "params_shard_16.bin",
+ "format": "raw-shard",
+ "nbytes": 46137344,
+ "records": [
+ {
+ "name": "model.layers.6.mlp.gate_up_proj.weight",
+ "shape": [
+ 11264,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 46137344,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "bea00f9a26131c903821b452f4d9fea1"
+ },
+ {
+ "dataPath": "params_shard_17.bin",
+ "format": "raw-shard",
+ "nbytes": 23068672,
+ "records": [
+ {
+ "name": "model.layers.6.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 5632
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 23068672,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "233adc3d89e9fd0e63e8c024f2656d18"
+ },
+ {
+ "dataPath": "params_shard_18.bin",
+ "format": "raw-shard",
+ "nbytes": 29368320,
+ "records": [
+ {
+ "name": "model.layers.6.self_attn.qkv_proj.weight",
+ "shape": [
+ 2560,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 10485760,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.6.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 8388608,
+ "byteOffset": 10485760
+ },
+ {
+ "name": "model.layers.6.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 18874368
+ },
+ {
+ "name": "model.layers.6.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 18878464
+ },
+ {
+ "name": "model.layers.7.self_attn.qkv_proj.weight",
+ "shape": [
+ 2560,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 10485760,
+ "byteOffset": 18882560
+ }
+ ],
+ "md5sum": "abbe67b5d794b62799b3bcbdaf1d2f13"
+ },
+ {
+ "dataPath": "params_shard_19.bin",
+ "format": "raw-shard",
+ "nbytes": 46137344,
+ "records": [
+ {
+ "name": "model.layers.7.mlp.gate_up_proj.weight",
+ "shape": [
+ 11264,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 46137344,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "b9925f234796242b06783b10faa4f066"
+ },
+ {
+ "dataPath": "params_shard_20.bin",
+ "format": "raw-shard",
+ "nbytes": 31465472,
+ "records": [
+ {
+ "name": "model.layers.7.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 8388608,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.7.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 5632
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 23068672,
+ "byteOffset": 8388608
+ },
+ {
+ "name": "model.layers.7.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 31457280
+ },
+ {
+ "name": "model.layers.7.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 31461376
+ }
+ ],
+ "md5sum": "449553c5c774c9612c71db8e6fe0408e"
+ },
+ {
+ "dataPath": "params_shard_21.bin",
+ "format": "raw-shard",
+ "nbytes": 46137344,
+ "records": [
+ {
+ "name": "model.layers.8.mlp.gate_up_proj.weight",
+ "shape": [
+ 11264,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 46137344,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "2d45e9f3e8159430be240362d67ad6af"
+ },
+ {
+ "dataPath": "params_shard_22.bin",
+ "format": "raw-shard",
+ "nbytes": 23068672,
+ "records": [
+ {
+ "name": "model.layers.8.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 5632
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 23068672,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "62d066a0af5c29b2e152d1f0e1fa83a5"
+ },
+ {
+ "dataPath": "params_shard_23.bin",
+ "format": "raw-shard",
+ "nbytes": 29368320,
+ "records": [
+ {
+ "name": "model.layers.8.self_attn.qkv_proj.weight",
+ "shape": [
+ 2560,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 10485760,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.8.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 8388608,
+ "byteOffset": 10485760
+ },
+ {
+ "name": "model.layers.8.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 18874368
+ },
+ {
+ "name": "model.layers.8.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 18878464
+ },
+ {
+ "name": "model.layers.9.self_attn.qkv_proj.weight",
+ "shape": [
+ 2560,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 10485760,
+ "byteOffset": 18882560
+ }
+ ],
+ "md5sum": "7bc70a5644474679d9b8c803f86803e6"
+ },
+ {
+ "dataPath": "params_shard_24.bin",
+ "format": "raw-shard",
+ "nbytes": 46137344,
+ "records": [
+ {
+ "name": "model.layers.9.mlp.gate_up_proj.weight",
+ "shape": [
+ 11264,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 46137344,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "a385c498afbccecefdb8faefc328a383"
+ },
+ {
+ "dataPath": "params_shard_25.bin",
+ "format": "raw-shard",
+ "nbytes": 31465472,
+ "records": [
+ {
+ "name": "model.layers.9.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 8388608,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.9.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 5632
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 23068672,
+ "byteOffset": 8388608
+ },
+ {
+ "name": "model.layers.9.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 31457280
+ },
+ {
+ "name": "model.layers.9.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 31461376
+ }
+ ],
+ "md5sum": "482f31eacf412fc7c13d5fef45ef2909"
+ },
+ {
+ "dataPath": "params_shard_26.bin",
+ "format": "raw-shard",
+ "nbytes": 46137344,
+ "records": [
+ {
+ "name": "model.layers.10.mlp.gate_up_proj.weight",
+ "shape": [
+ 11264,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 46137344,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "850943605e71e9be8f0691c3b55959d3"
+ },
+ {
+ "dataPath": "params_shard_27.bin",
+ "format": "raw-shard",
+ "nbytes": 23068672,
+ "records": [
+ {
+ "name": "model.layers.10.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 5632
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 23068672,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "9207fc9eb5cd6be8fac65a1c7685ff73"
+ },
+ {
+ "dataPath": "params_shard_28.bin",
+ "format": "raw-shard",
+ "nbytes": 29368320,
+ "records": [
+ {
+ "name": "model.layers.10.self_attn.qkv_proj.weight",
+ "shape": [
+ 2560,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 10485760,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.10.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 8388608,
+ "byteOffset": 10485760
+ },
+ {
+ "name": "model.layers.10.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 18874368
+ },
+ {
+ "name": "model.layers.10.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 18878464
+ },
+ {
+ "name": "model.layers.11.self_attn.qkv_proj.weight",
+ "shape": [
+ 2560,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 10485760,
+ "byteOffset": 18882560
+ }
+ ],
+ "md5sum": "f1212bb094d6783e2fc69b90346e22ff"
+ },
+ {
+ "dataPath": "params_shard_29.bin",
+ "format": "raw-shard",
+ "nbytes": 46137344,
+ "records": [
+ {
+ "name": "model.layers.11.mlp.gate_up_proj.weight",
+ "shape": [
+ 11264,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 46137344,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "736af883b80453625d8250b9aa77ac26"
+ },
+ {
+ "dataPath": "params_shard_30.bin",
+ "format": "raw-shard",
+ "nbytes": 31465472,
+ "records": [
+ {
+ "name": "model.layers.11.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 8388608,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.11.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 5632
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 23068672,
+ "byteOffset": 8388608
+ },
+ {
+ "name": "model.layers.11.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 31457280
+ },
+ {
+ "name": "model.layers.11.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 31461376
+ }
+ ],
+ "md5sum": "36986d44545c8f12516cdaefdb74694b"
+ },
+ {
+ "dataPath": "params_shard_31.bin",
+ "format": "raw-shard",
+ "nbytes": 46137344,
+ "records": [
+ {
+ "name": "model.layers.12.mlp.gate_up_proj.weight",
+ "shape": [
+ 11264,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 46137344,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "5f97fe1675d789ee5a5a5881b1775515"
+ },
+ {
+ "dataPath": "params_shard_32.bin",
+ "format": "raw-shard",
+ "nbytes": 23068672,
+ "records": [
+ {
+ "name": "model.layers.12.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 5632
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 23068672,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "051b501162303570dff375b79148951d"
+ },
+ {
+ "dataPath": "params_shard_33.bin",
+ "format": "raw-shard",
+ "nbytes": 29368320,
+ "records": [
+ {
+ "name": "model.layers.12.self_attn.qkv_proj.weight",
+ "shape": [
+ 2560,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 10485760,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.12.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 8388608,
+ "byteOffset": 10485760
+ },
+ {
+ "name": "model.layers.12.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 18874368
+ },
+ {
+ "name": "model.layers.12.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 18878464
+ },
+ {
+ "name": "model.layers.13.self_attn.qkv_proj.weight",
+ "shape": [
+ 2560,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 10485760,
+ "byteOffset": 18882560
+ }
+ ],
+ "md5sum": "4ae2730f899b0d292e208596ea98b590"
+ },
+ {
+ "dataPath": "params_shard_34.bin",
+ "format": "raw-shard",
+ "nbytes": 46137344,
+ "records": [
+ {
+ "name": "model.layers.13.mlp.gate_up_proj.weight",
+ "shape": [
+ 11264,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 46137344,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "15f42972b33f6cadbc0b5a17e20ac9b6"
+ },
+ {
+ "dataPath": "params_shard_35.bin",
+ "format": "raw-shard",
+ "nbytes": 31465472,
+ "records": [
+ {
+ "name": "model.layers.13.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 8388608,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.13.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 5632
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 23068672,
+ "byteOffset": 8388608
+ },
+ {
+ "name": "model.layers.13.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 31457280
+ },
+ {
+ "name": "model.layers.13.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 31461376
+ }
+ ],
+ "md5sum": "82ec8f8938814a4e75058620d84285c7"
+ },
+ {
+ "dataPath": "params_shard_36.bin",
+ "format": "raw-shard",
+ "nbytes": 46137344,
+ "records": [
+ {
+ "name": "model.layers.14.mlp.gate_up_proj.weight",
+ "shape": [
+ 11264,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 46137344,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "ccd48809d794270ecb6929d9547aea97"
+ },
+ {
+ "dataPath": "params_shard_37.bin",
+ "format": "raw-shard",
+ "nbytes": 23068672,
+ "records": [
+ {
+ "name": "model.layers.14.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 5632
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 23068672,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "d0ce7dab232e7305f3f243febbb4324c"
+ },
+ {
+ "dataPath": "params_shard_38.bin",
+ "format": "raw-shard",
+ "nbytes": 29368320,
+ "records": [
+ {
+ "name": "model.layers.14.self_attn.qkv_proj.weight",
+ "shape": [
+ 2560,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 10485760,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.14.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 8388608,
+ "byteOffset": 10485760
+ },
+ {
+ "name": "model.layers.14.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 18874368
+ },
+ {
+ "name": "model.layers.14.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 4096,
+ "byteOffset": 18878464
+ },
+ {
+ "name": "model.layers.15.self_attn.qkv_proj.weight",
+ "shape": [
+ 2560,
+ 2048
+ ],
+ "dtype": "bfloat16",
+ "format": "raw",
+ "nbytes": 10485760,
+ "byteOffset": 18882560
+ }
+ ],
+ "md5sum": "0f007605b9207b32a5c79249505897ef"
+ },
1302
+ {
1303
+ "dataPath": "params_shard_39.bin",
1304
+ "format": "raw-shard",
1305
+ "nbytes": 46137344,
1306
+ "records": [
1307
+ {
1308
+ "name": "model.layers.15.mlp.gate_up_proj.weight",
1309
+ "shape": [
1310
+ 11264,
1311
+ 2048
1312
+ ],
1313
+ "dtype": "bfloat16",
1314
+ "format": "raw",
1315
+ "nbytes": 46137344,
1316
+ "byteOffset": 0
1317
+ }
1318
+ ],
1319
+ "md5sum": "e42d4c04fb873f035b307b92df26974e"
1320
+ },
1321
+ {
1322
+ "dataPath": "params_shard_40.bin",
1323
+ "format": "raw-shard",
1324
+ "nbytes": 31465472,
1325
+ "records": [
1326
+ {
1327
+ "name": "model.layers.15.self_attn.o_proj.weight",
1328
+ "shape": [
1329
+ 2048,
1330
+ 2048
1331
+ ],
1332
+ "dtype": "bfloat16",
1333
+ "format": "raw",
1334
+ "nbytes": 8388608,
1335
+ "byteOffset": 0
1336
+ },
1337
+ {
1338
+ "name": "model.layers.15.mlp.down_proj.weight",
1339
+ "shape": [
1340
+ 2048,
1341
+ 5632
1342
+ ],
1343
+ "dtype": "bfloat16",
1344
+ "format": "raw",
1345
+ "nbytes": 23068672,
1346
+ "byteOffset": 8388608
1347
+ },
1348
+ {
1349
+ "name": "model.layers.15.input_layernorm.weight",
1350
+ "shape": [
1351
+ 2048
1352
+ ],
1353
+ "dtype": "bfloat16",
1354
+ "format": "raw",
1355
+ "nbytes": 4096,
1356
+ "byteOffset": 31457280
1357
+ },
1358
+ {
1359
+ "name": "model.layers.15.post_attention_layernorm.weight",
1360
+ "shape": [
1361
+ 2048
1362
+ ],
1363
+ "dtype": "bfloat16",
1364
+ "format": "raw",
1365
+ "nbytes": 4096,
1366
+ "byteOffset": 31461376
1367
+ }
1368
+ ],
1369
+ "md5sum": "7168ff2801ea67e99469b52abb83f25d"
1370
+ },
1371
+ {
1372
+ "dataPath": "params_shard_41.bin",
1373
+ "format": "raw-shard",
1374
+ "nbytes": 46137344,
1375
+ "records": [
1376
+ {
1377
+ "name": "model.layers.16.mlp.gate_up_proj.weight",
1378
+ "shape": [
1379
+ 11264,
1380
+ 2048
1381
+ ],
1382
+ "dtype": "bfloat16",
1383
+ "format": "raw",
1384
+ "nbytes": 46137344,
1385
+ "byteOffset": 0
1386
+ }
1387
+ ],
1388
+ "md5sum": "623e1a60bfb63c0d49297304aa0aed3e"
1389
+ },
1390
+ {
1391
+ "dataPath": "params_shard_42.bin",
1392
+ "format": "raw-shard",
1393
+ "nbytes": 23068672,
1394
+ "records": [
1395
+ {
1396
+ "name": "model.layers.16.mlp.down_proj.weight",
1397
+ "shape": [
1398
+ 2048,
1399
+ 5632
1400
+ ],
1401
+ "dtype": "bfloat16",
1402
+ "format": "raw",
1403
+ "nbytes": 23068672,
1404
+ "byteOffset": 0
1405
+ }
1406
+ ],
1407
+ "md5sum": "3c4c8058c508560fd9f604e942dd3caa"
1408
+ },
1409
+ {
1410
+ "dataPath": "params_shard_43.bin",
1411
+ "format": "raw-shard",
1412
+ "nbytes": 29368320,
1413
+ "records": [
1414
+ {
1415
+ "name": "model.layers.16.self_attn.qkv_proj.weight",
1416
+ "shape": [
1417
+ 2560,
1418
+ 2048
1419
+ ],
1420
+ "dtype": "bfloat16",
1421
+ "format": "raw",
1422
+ "nbytes": 10485760,
1423
+ "byteOffset": 0
1424
+ },
1425
+ {
1426
+ "name": "model.layers.16.self_attn.o_proj.weight",
1427
+ "shape": [
1428
+ 2048,
1429
+ 2048
1430
+ ],
1431
+ "dtype": "bfloat16",
1432
+ "format": "raw",
1433
+ "nbytes": 8388608,
1434
+ "byteOffset": 10485760
1435
+ },
1436
+ {
1437
+ "name": "model.layers.16.input_layernorm.weight",
1438
+ "shape": [
1439
+ 2048
1440
+ ],
1441
+ "dtype": "bfloat16",
1442
+ "format": "raw",
1443
+ "nbytes": 4096,
1444
+ "byteOffset": 18874368
1445
+ },
1446
+ {
1447
+ "name": "model.layers.16.post_attention_layernorm.weight",
1448
+ "shape": [
1449
+ 2048
1450
+ ],
1451
+ "dtype": "bfloat16",
1452
+ "format": "raw",
1453
+ "nbytes": 4096,
1454
+ "byteOffset": 18878464
1455
+ },
1456
+ {
1457
+ "name": "model.layers.17.self_attn.qkv_proj.weight",
1458
+ "shape": [
1459
+ 2560,
1460
+ 2048
1461
+ ],
1462
+ "dtype": "bfloat16",
1463
+ "format": "raw",
1464
+ "nbytes": 10485760,
1465
+ "byteOffset": 18882560
1466
+ }
1467
+ ],
1468
+ "md5sum": "80dec21b661afcd092d85dcde0f0ec8a"
1469
+ },
1470
+ {
1471
+ "dataPath": "params_shard_44.bin",
1472
+ "format": "raw-shard",
1473
+ "nbytes": 46137344,
1474
+ "records": [
1475
+ {
1476
+ "name": "model.layers.17.mlp.gate_up_proj.weight",
1477
+ "shape": [
1478
+ 11264,
1479
+ 2048
1480
+ ],
1481
+ "dtype": "bfloat16",
1482
+ "format": "raw",
1483
+ "nbytes": 46137344,
1484
+ "byteOffset": 0
1485
+ }
1486
+ ],
1487
+ "md5sum": "9242db19ee58801b87f6cdd63fd1139b"
1488
+ },
1489
+ {
1490
+ "dataPath": "params_shard_45.bin",
1491
+ "format": "raw-shard",
1492
+ "nbytes": 31465472,
1493
+ "records": [
1494
+ {
1495
+ "name": "model.layers.17.self_attn.o_proj.weight",
1496
+ "shape": [
1497
+ 2048,
1498
+ 2048
1499
+ ],
1500
+ "dtype": "bfloat16",
1501
+ "format": "raw",
1502
+ "nbytes": 8388608,
1503
+ "byteOffset": 0
1504
+ },
1505
+ {
1506
+ "name": "model.layers.17.mlp.down_proj.weight",
1507
+ "shape": [
1508
+ 2048,
1509
+ 5632
1510
+ ],
1511
+ "dtype": "bfloat16",
1512
+ "format": "raw",
1513
+ "nbytes": 23068672,
1514
+ "byteOffset": 8388608
1515
+ },
1516
+ {
1517
+ "name": "model.layers.17.input_layernorm.weight",
1518
+ "shape": [
1519
+ 2048
1520
+ ],
1521
+ "dtype": "bfloat16",
1522
+ "format": "raw",
1523
+ "nbytes": 4096,
1524
+ "byteOffset": 31457280
1525
+ },
1526
+ {
1527
+ "name": "model.layers.17.post_attention_layernorm.weight",
1528
+ "shape": [
1529
+ 2048
1530
+ ],
1531
+ "dtype": "bfloat16",
1532
+ "format": "raw",
1533
+ "nbytes": 4096,
1534
+ "byteOffset": 31461376
1535
+ }
1536
+ ],
1537
+ "md5sum": "f83f0c6e381c128e2cbdc59dfd54fa47"
1538
+ },
1539
+ {
1540
+ "dataPath": "params_shard_46.bin",
1541
+ "format": "raw-shard",
1542
+ "nbytes": 46137344,
1543
+ "records": [
1544
+ {
1545
+ "name": "model.layers.18.mlp.gate_up_proj.weight",
1546
+ "shape": [
1547
+ 11264,
1548
+ 2048
1549
+ ],
1550
+ "dtype": "bfloat16",
1551
+ "format": "raw",
1552
+ "nbytes": 46137344,
1553
+ "byteOffset": 0
1554
+ }
1555
+ ],
1556
+ "md5sum": "2663610937313216a42645bc74895a87"
1557
+ },
1558
+ {
1559
+ "dataPath": "params_shard_47.bin",
1560
+ "format": "raw-shard",
1561
+ "nbytes": 23068672,
1562
+ "records": [
1563
+ {
1564
+ "name": "model.layers.18.mlp.down_proj.weight",
1565
+ "shape": [
1566
+ 2048,
1567
+ 5632
1568
+ ],
1569
+ "dtype": "bfloat16",
1570
+ "format": "raw",
1571
+ "nbytes": 23068672,
1572
+ "byteOffset": 0
1573
+ }
1574
+ ],
1575
+ "md5sum": "973defd93492a1311559695a1c45a149"
1576
+ },
1577
+ {
1578
+ "dataPath": "params_shard_48.bin",
1579
+ "format": "raw-shard",
1580
+ "nbytes": 29368320,
1581
+ "records": [
1582
+ {
1583
+ "name": "model.layers.18.self_attn.qkv_proj.weight",
1584
+ "shape": [
1585
+ 2560,
1586
+ 2048
1587
+ ],
1588
+ "dtype": "bfloat16",
1589
+ "format": "raw",
1590
+ "nbytes": 10485760,
1591
+ "byteOffset": 0
1592
+ },
1593
+ {
1594
+ "name": "model.layers.18.self_attn.o_proj.weight",
1595
+ "shape": [
1596
+ 2048,
1597
+ 2048
1598
+ ],
1599
+ "dtype": "bfloat16",
1600
+ "format": "raw",
1601
+ "nbytes": 8388608,
1602
+ "byteOffset": 10485760
1603
+ },
1604
+ {
1605
+ "name": "model.layers.18.input_layernorm.weight",
1606
+ "shape": [
1607
+ 2048
1608
+ ],
1609
+ "dtype": "bfloat16",
1610
+ "format": "raw",
1611
+ "nbytes": 4096,
1612
+ "byteOffset": 18874368
1613
+ },
1614
+ {
1615
+ "name": "model.layers.18.post_attention_layernorm.weight",
1616
+ "shape": [
1617
+ 2048
1618
+ ],
1619
+ "dtype": "bfloat16",
1620
+ "format": "raw",
1621
+ "nbytes": 4096,
1622
+ "byteOffset": 18878464
1623
+ },
1624
+ {
1625
+ "name": "model.layers.19.self_attn.qkv_proj.weight",
1626
+ "shape": [
1627
+ 2560,
1628
+ 2048
1629
+ ],
1630
+ "dtype": "bfloat16",
1631
+ "format": "raw",
1632
+ "nbytes": 10485760,
1633
+ "byteOffset": 18882560
1634
+ }
1635
+ ],
1636
+ "md5sum": "a3274aef0ddf44dbd215f1588d7a8bfe"
1637
+ },
1638
+ {
1639
+ "dataPath": "params_shard_49.bin",
1640
+ "format": "raw-shard",
1641
+ "nbytes": 46137344,
1642
+ "records": [
1643
+ {
1644
+ "name": "model.layers.19.mlp.gate_up_proj.weight",
1645
+ "shape": [
1646
+ 11264,
1647
+ 2048
1648
+ ],
1649
+ "dtype": "bfloat16",
1650
+ "format": "raw",
1651
+ "nbytes": 46137344,
1652
+ "byteOffset": 0
1653
+ }
1654
+ ],
1655
+ "md5sum": "d8b408bc677af3eb67cca3d5f31151e2"
1656
+ },
1657
+ {
1658
+ "dataPath": "params_shard_50.bin",
1659
+ "format": "raw-shard",
1660
+ "nbytes": 31465472,
1661
+ "records": [
1662
+ {
1663
+ "name": "model.layers.19.self_attn.o_proj.weight",
1664
+ "shape": [
1665
+ 2048,
1666
+ 2048
1667
+ ],
1668
+ "dtype": "bfloat16",
1669
+ "format": "raw",
1670
+ "nbytes": 8388608,
1671
+ "byteOffset": 0
1672
+ },
1673
+ {
1674
+ "name": "model.layers.19.mlp.down_proj.weight",
1675
+ "shape": [
1676
+ 2048,
1677
+ 5632
1678
+ ],
1679
+ "dtype": "bfloat16",
1680
+ "format": "raw",
1681
+ "nbytes": 23068672,
1682
+ "byteOffset": 8388608
1683
+ },
1684
+ {
1685
+ "name": "model.layers.19.input_layernorm.weight",
1686
+ "shape": [
1687
+ 2048
1688
+ ],
1689
+ "dtype": "bfloat16",
1690
+ "format": "raw",
1691
+ "nbytes": 4096,
1692
+ "byteOffset": 31457280
1693
+ },
1694
+ {
1695
+ "name": "model.layers.19.post_attention_layernorm.weight",
1696
+ "shape": [
1697
+ 2048
1698
+ ],
1699
+ "dtype": "bfloat16",
1700
+ "format": "raw",
1701
+ "nbytes": 4096,
1702
+ "byteOffset": 31461376
1703
+ }
1704
+ ],
1705
+ "md5sum": "0ea6e376f976c09ec4c316553e5fe49f"
1706
+ },
1707
+ {
1708
+ "dataPath": "params_shard_51.bin",
1709
+ "format": "raw-shard",
1710
+ "nbytes": 46137344,
1711
+ "records": [
1712
+ {
1713
+ "name": "model.layers.20.mlp.gate_up_proj.weight",
1714
+ "shape": [
1715
+ 11264,
1716
+ 2048
1717
+ ],
1718
+ "dtype": "bfloat16",
1719
+ "format": "raw",
1720
+ "nbytes": 46137344,
1721
+ "byteOffset": 0
1722
+ }
1723
+ ],
1724
+ "md5sum": "c433565a3056b5b776c65a1536c565bc"
1725
+ },
1726
+ {
1727
+ "dataPath": "params_shard_52.bin",
1728
+ "format": "raw-shard",
1729
+ "nbytes": 23068672,
1730
+ "records": [
1731
+ {
1732
+ "name": "model.layers.20.mlp.down_proj.weight",
1733
+ "shape": [
1734
+ 2048,
1735
+ 5632
1736
+ ],
1737
+ "dtype": "bfloat16",
1738
+ "format": "raw",
1739
+ "nbytes": 23068672,
1740
+ "byteOffset": 0
1741
+ }
1742
+ ],
1743
+ "md5sum": "1b51d01988351e7bba03eb34e46a2585"
1744
+ },
1745
+ {
1746
+ "dataPath": "params_shard_53.bin",
1747
+ "format": "raw-shard",
1748
+ "nbytes": 29368320,
1749
+ "records": [
1750
+ {
1751
+ "name": "model.layers.20.self_attn.qkv_proj.weight",
1752
+ "shape": [
1753
+ 2560,
1754
+ 2048
1755
+ ],
1756
+ "dtype": "bfloat16",
1757
+ "format": "raw",
1758
+ "nbytes": 10485760,
1759
+ "byteOffset": 0
1760
+ },
1761
+ {
1762
+ "name": "model.layers.20.self_attn.o_proj.weight",
1763
+ "shape": [
1764
+ 2048,
1765
+ 2048
1766
+ ],
1767
+ "dtype": "bfloat16",
1768
+ "format": "raw",
1769
+ "nbytes": 8388608,
1770
+ "byteOffset": 10485760
1771
+ },
1772
+ {
1773
+ "name": "model.layers.20.input_layernorm.weight",
1774
+ "shape": [
1775
+ 2048
1776
+ ],
1777
+ "dtype": "bfloat16",
1778
+ "format": "raw",
1779
+ "nbytes": 4096,
1780
+ "byteOffset": 18874368
1781
+ },
1782
+ {
1783
+ "name": "model.layers.20.post_attention_layernorm.weight",
1784
+ "shape": [
1785
+ 2048
1786
+ ],
1787
+ "dtype": "bfloat16",
1788
+ "format": "raw",
1789
+ "nbytes": 4096,
1790
+ "byteOffset": 18878464
1791
+ },
1792
+ {
1793
+ "name": "model.layers.21.self_attn.qkv_proj.weight",
1794
+ "shape": [
1795
+ 2560,
1796
+ 2048
1797
+ ],
1798
+ "dtype": "bfloat16",
1799
+ "format": "raw",
1800
+ "nbytes": 10485760,
1801
+ "byteOffset": 18882560
1802
+ }
1803
+ ],
1804
+ "md5sum": "65304737b5497527ce7b0d2377866424"
1805
+ },
1806
+ {
1807
+ "dataPath": "params_shard_54.bin",
1808
+ "format": "raw-shard",
1809
+ "nbytes": 46137344,
1810
+ "records": [
1811
+ {
1812
+ "name": "model.layers.21.mlp.gate_up_proj.weight",
1813
+ "shape": [
1814
+ 11264,
1815
+ 2048
1816
+ ],
1817
+ "dtype": "bfloat16",
1818
+ "format": "raw",
1819
+ "nbytes": 46137344,
1820
+ "byteOffset": 0
1821
+ }
1822
+ ],
1823
+ "md5sum": "6e5cd3b68492da8c01ef96dc86faa8a7"
1824
+ },
1825
+ {
1826
+ "dataPath": "params_shard_55.bin",
1827
+ "format": "raw-shard",
1828
+ "nbytes": 131084288,
1829
+ "records": [
1830
+ {
1831
+ "name": "lm_head.weight",
1832
+ "shape": [
1833
+ 32003,
1834
+ 2048
1835
+ ],
1836
+ "dtype": "bfloat16",
1837
+ "format": "raw",
1838
+ "nbytes": 131084288,
1839
+ "byteOffset": 0
1840
+ }
1841
+ ],
1842
+ "md5sum": "cbafe195db523a0e9302dd15bde0f529"
1843
+ },
1844
+ {
1845
+ "dataPath": "params_shard_56.bin",
1846
+ "format": "raw-shard",
1847
+ "nbytes": 31469568,
1848
+ "records": [
1849
+ {
1850
+ "name": "model.layers.21.self_attn.o_proj.weight",
1851
+ "shape": [
1852
+ 2048,
1853
+ 2048
1854
+ ],
1855
+ "dtype": "bfloat16",
1856
+ "format": "raw",
1857
+ "nbytes": 8388608,
1858
+ "byteOffset": 0
1859
+ },
1860
+ {
1861
+ "name": "model.layers.21.mlp.down_proj.weight",
1862
+ "shape": [
1863
+ 2048,
1864
+ 5632
1865
+ ],
1866
+ "dtype": "bfloat16",
1867
+ "format": "raw",
1868
+ "nbytes": 23068672,
1869
+ "byteOffset": 8388608
1870
+ },
1871
+ {
1872
+ "name": "model.layers.21.input_layernorm.weight",
1873
+ "shape": [
1874
+ 2048
1875
+ ],
1876
+ "dtype": "bfloat16",
1877
+ "format": "raw",
1878
+ "nbytes": 4096,
1879
+ "byteOffset": 31457280
1880
+ },
1881
+ {
1882
+ "name": "model.layers.21.post_attention_layernorm.weight",
1883
+ "shape": [
1884
+ 2048
1885
+ ],
1886
+ "dtype": "bfloat16",
1887
+ "format": "raw",
1888
+ "nbytes": 4096,
1889
+ "byteOffset": 31461376
1890
+ },
1891
+ {
1892
+ "name": "model.norm.weight",
1893
+ "shape": [
1894
+ 2048
1895
+ ],
1896
+ "dtype": "bfloat16",
1897
+ "format": "raw",
1898
+ "nbytes": 4096,
1899
+ "byteOffset": 31465472
1900
+ }
1901
+ ],
1902
+ "md5sum": "2d491aad8e6aa64a244bca58e8049b34"
1903
+ }
1904
+ ]
1905
+ }
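
Note on the manifest layout above: within each `raw-shard` entry, the inner records appear to be packed back to back, so each `byteOffset` equals the sum of the preceding `nbytes`, and the shard-level `nbytes` equals the total. The sketch below checks that for one shard copied from the listing (`params_shard_38.bin`). The helper name `check_shard` is hypothetical and not part of any MLC tooling, and the 2-bytes-per-element rule is an assumption inferred from the numbers shown here, not a documented guarantee.

```python
from math import prod

BYTES_PER_ELEMENT = 2  # assumption: every record in this listing stores 2 bytes per value


def check_shard(shard):
    """Return a list of human-readable problems found in one shard entry."""
    problems = []
    expected_offset = 0
    for rec in shard["records"]:
        # Records should be contiguous: each starts where the previous one ended.
        if rec["byteOffset"] != expected_offset:
            problems.append(f"{rec['name']}: offset {rec['byteOffset']} != {expected_offset}")
        # Stored size should match the element count implied by the shape.
        if rec["nbytes"] != BYTES_PER_ELEMENT * prod(rec["shape"]):
            problems.append(f"{rec['name']}: nbytes does not match shape {rec['shape']}")
        expected_offset += rec["nbytes"]
    # The shard-level size should equal the sum of its records.
    if expected_offset != shard["nbytes"]:
        problems.append(f"shard nbytes {shard['nbytes']} != record total {expected_offset}")
    return problems


# Values copied from the params_shard_38.bin entry in the listing above.
shard_38 = {
    "dataPath": "params_shard_38.bin",
    "nbytes": 29368320,
    "records": [
        {"name": "model.layers.14.self_attn.qkv_proj.weight", "shape": [2560, 2048],
         "nbytes": 10485760, "byteOffset": 0},
        {"name": "model.layers.14.self_attn.o_proj.weight", "shape": [2048, 2048],
         "nbytes": 8388608, "byteOffset": 10485760},
        {"name": "model.layers.14.input_layernorm.weight", "shape": [2048],
         "nbytes": 4096, "byteOffset": 18874368},
        {"name": "model.layers.14.post_attention_layernorm.weight", "shape": [2048],
         "nbytes": 4096, "byteOffset": 18878464},
        {"name": "model.layers.15.self_attn.qkv_proj.weight", "shape": [2560, 2048],
         "nbytes": 10485760, "byteOffset": 18882560},
    ],
}

print(check_shard(shard_38))
```

The same function could be run over every entry of the top-level `records` array after loading the manifest with `json.load`; comparing each shard file against its `md5sum` field would be a separate step, not shown here.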
ndarray-cache.json ADDED
@@ -0,0 +1,1905 @@
1
+ {
2
+ "metadata": {
3
+ "ParamSize": 135,
4
+ "ParamBytes": 4400242688.0,
5
+ "BitsPerParam": 32.0
6
+ },
7
+ "records": [
8
+ {
9
+ "dataPath": "params_shard_0.bin",
10
+ "format": "raw-shard",
11
+ "nbytes": 131084288,
12
+ "records": [
13
+ {
14
+ "name": "model.embed_tokens.weight",
15
+ "shape": [
16
+ 32003,
17
+ 2048
18
+ ],
19
+ "dtype": "float32",
20
+ "format": "f32-to-bf16",
21
+ "nbytes": 131084288,
22
+ "byteOffset": 0
23
+ }
24
+ ],
25
+ "md5sum": "cc0ed85be2a8c0317b9701355b06d552"
26
+ },
27
+ {
28
+ "dataPath": "params_shard_1.bin",
29
+ "format": "raw-shard",
30
+ "nbytes": 46137344,
31
+ "records": [
32
+ {
33
+ "name": "model.layers.0.mlp.gate_up_proj.weight",
34
+ "shape": [
35
+ 11264,
36
+ 2048
37
+ ],
38
+ "dtype": "float32",
39
+ "format": "f32-to-bf16",
40
+ "nbytes": 46137344,
41
+ "byteOffset": 0
42
+ }
43
+ ],
44
+ "md5sum": "3dd515cad2c62e6956cb338bfada11e0"
45
+ },
46
+ {
47
+ "dataPath": "params_shard_2.bin",
48
+ "format": "raw-shard",
49
+ "nbytes": 23068672,
50
+ "records": [
51
+ {
52
+ "name": "model.layers.0.mlp.down_proj.weight",
53
+ "shape": [
54
+ 2048,
55
+ 5632
56
+ ],
57
+ "dtype": "float32",
58
+ "format": "f32-to-bf16",
59
+ "nbytes": 23068672,
60
+ "byteOffset": 0
61
+ }
62
+ ],
63
+ "md5sum": "1af86eb61f0d2fa2466465dea8dbc9ea"
64
+ },
65
+ {
66
+ "dataPath": "params_shard_3.bin",
67
+ "format": "raw-shard",
68
+ "nbytes": 29368320,
69
+ "records": [
70
+ {
71
+ "name": "model.layers.0.self_attn.qkv_proj.weight",
72
+ "shape": [
73
+ 2560,
74
+ 2048
75
+ ],
76
+ "dtype": "float32",
77
+ "format": "f32-to-bf16",
78
+ "nbytes": 10485760,
79
+ "byteOffset": 0
80
+ },
81
+ {
82
+ "name": "model.layers.0.self_attn.o_proj.weight",
83
+ "shape": [
84
+ 2048,
85
+ 2048
86
+ ],
87
+ "dtype": "float32",
88
+ "format": "f32-to-bf16",
89
+ "nbytes": 8388608,
90
+ "byteOffset": 10485760
91
+ },
92
+ {
93
+ "name": "model.layers.0.input_layernorm.weight",
94
+ "shape": [
95
+ 2048
96
+ ],
97
+ "dtype": "float32",
98
+ "format": "f32-to-bf16",
99
+ "nbytes": 4096,
100
+ "byteOffset": 18874368
101
+ },
102
+ {
103
+ "name": "model.layers.0.post_attention_layernorm.weight",
104
+ "shape": [
105
+ 2048
106
+ ],
107
+ "dtype": "float32",
108
+ "format": "f32-to-bf16",
109
+ "nbytes": 4096,
110
+ "byteOffset": 18878464
111
+ },
112
+ {
113
+ "name": "model.layers.1.self_attn.qkv_proj.weight",
114
+ "shape": [
115
+ 2560,
116
+ 2048
117
+ ],
118
+ "dtype": "float32",
119
+ "format": "f32-to-bf16",
120
+ "nbytes": 10485760,
121
+ "byteOffset": 18882560
122
+ }
123
+ ],
124
+ "md5sum": "f9bed0391d17e40343e2c90c45fff822"
125
+ },
126
+ {
127
+ "dataPath": "params_shard_4.bin",
128
+ "format": "raw-shard",
129
+ "nbytes": 46137344,
130
+ "records": [
131
+ {
132
+ "name": "model.layers.1.mlp.gate_up_proj.weight",
133
+ "shape": [
134
+ 11264,
135
+ 2048
136
+ ],
137
+ "dtype": "float32",
138
+ "format": "f32-to-bf16",
139
+ "nbytes": 46137344,
140
+ "byteOffset": 0
141
+ }
142
+ ],
143
+ "md5sum": "97d2718af5ea9779f503379c2556e516"
144
+ },
145
+ {
146
+ "dataPath": "params_shard_5.bin",
147
+ "format": "raw-shard",
148
+ "nbytes": 31465472,
149
+ "records": [
150
+ {
151
+ "name": "model.layers.1.self_attn.o_proj.weight",
152
+ "shape": [
153
+ 2048,
154
+ 2048
155
+ ],
156
+ "dtype": "float32",
157
+ "format": "f32-to-bf16",
158
+ "nbytes": 8388608,
159
+ "byteOffset": 0
160
+ },
161
+ {
162
+ "name": "model.layers.1.mlp.down_proj.weight",
163
+ "shape": [
164
+ 2048,
165
+ 5632
166
+ ],
167
+ "dtype": "float32",
168
+ "format": "f32-to-bf16",
169
+ "nbytes": 23068672,
170
+ "byteOffset": 8388608
171
+ },
172
+ {
173
+ "name": "model.layers.1.input_layernorm.weight",
174
+ "shape": [
175
+ 2048
176
+ ],
177
+ "dtype": "float32",
178
+ "format": "f32-to-bf16",
179
+ "nbytes": 4096,
180
+ "byteOffset": 31457280
181
+ },
182
+ {
183
+ "name": "model.layers.1.post_attention_layernorm.weight",
184
+ "shape": [
185
+ 2048
186
+ ],
187
+ "dtype": "float32",
188
+ "format": "f32-to-bf16",
189
+ "nbytes": 4096,
190
+ "byteOffset": 31461376
191
+ }
192
+ ],
193
+ "md5sum": "4335e0565f1c118752b5d18bf96f239f"
194
+ },
195
+ {
196
+ "dataPath": "params_shard_6.bin",
197
+ "format": "raw-shard",
198
+ "nbytes": 46137344,
199
+ "records": [
200
+ {
201
+ "name": "model.layers.2.mlp.gate_up_proj.weight",
202
+ "shape": [
203
+ 11264,
204
+ 2048
205
+ ],
206
+ "dtype": "float32",
207
+ "format": "f32-to-bf16",
208
+ "nbytes": 46137344,
209
+ "byteOffset": 0
210
+ }
211
+ ],
212
+ "md5sum": "5d59c33ffddda7bcca41c5c9ad65db51"
213
+ },
214
+ {
215
+ "dataPath": "params_shard_7.bin",
216
+ "format": "raw-shard",
217
+ "nbytes": 23068672,
218
+ "records": [
219
+ {
220
+ "name": "model.layers.2.mlp.down_proj.weight",
221
+ "shape": [
222
+ 2048,
223
+ 5632
224
+ ],
225
+ "dtype": "float32",
226
+ "format": "f32-to-bf16",
227
+ "nbytes": 23068672,
228
+ "byteOffset": 0
229
+ }
230
+ ],
231
+ "md5sum": "225dd6d9dd691ca0defb95bf03045fcd"
232
+ },
233
+ {
234
+ "dataPath": "params_shard_8.bin",
235
+ "format": "raw-shard",
236
+ "nbytes": 29368320,
237
+ "records": [
238
+ {
239
+ "name": "model.layers.2.self_attn.qkv_proj.weight",
240
+ "shape": [
241
+ 2560,
242
+ 2048
243
+ ],
244
+ "dtype": "float32",
245
+ "format": "f32-to-bf16",
246
+ "nbytes": 10485760,
247
+ "byteOffset": 0
248
+ },
249
+ {
250
+ "name": "model.layers.2.self_attn.o_proj.weight",
251
+ "shape": [
252
+ 2048,
253
+ 2048
254
+ ],
255
+ "dtype": "float32",
256
+ "format": "f32-to-bf16",
257
+ "nbytes": 8388608,
258
+ "byteOffset": 10485760
259
+ },
260
+ {
261
+ "name": "model.layers.2.input_layernorm.weight",
262
+ "shape": [
263
+ 2048
264
+ ],
265
+ "dtype": "float32",
266
+ "format": "f32-to-bf16",
267
+ "nbytes": 4096,
268
+ "byteOffset": 18874368
269
+ },
270
+ {
271
+ "name": "model.layers.2.post_attention_layernorm.weight",
272
+ "shape": [
273
+ 2048
274
+ ],
275
+ "dtype": "float32",
276
+ "format": "f32-to-bf16",
277
+ "nbytes": 4096,
278
+ "byteOffset": 18878464
279
+ },
280
+ {
281
+ "name": "model.layers.3.self_attn.qkv_proj.weight",
282
+ "shape": [
283
+ 2560,
284
+ 2048
285
+ ],
286
+ "dtype": "float32",
287
+ "format": "f32-to-bf16",
288
+ "nbytes": 10485760,
289
+ "byteOffset": 18882560
290
+ }
291
+ ],
292
+ "md5sum": "24956e7e3440d13c787912f075b91808"
293
+ },
294
+ {
295
+ "dataPath": "params_shard_9.bin",
296
+ "format": "raw-shard",
297
+ "nbytes": 46137344,
298
+ "records": [
299
+ {
300
+ "name": "model.layers.3.mlp.gate_up_proj.weight",
301
+ "shape": [
302
+ 11264,
303
+ 2048
304
+ ],
305
+ "dtype": "float32",
306
+ "format": "f32-to-bf16",
307
+ "nbytes": 46137344,
308
+ "byteOffset": 0
309
+ }
310
+ ],
311
+ "md5sum": "ccf10355f04707b6ea13fe1905848786"
312
+ },
313
+ {
314
+ "dataPath": "params_shard_10.bin",
315
+ "format": "raw-shard",
316
+ "nbytes": 31465472,
317
+ "records": [
318
+ {
319
+ "name": "model.layers.3.self_attn.o_proj.weight",
320
+ "shape": [
321
+ 2048,
322
+ 2048
323
+ ],
324
+ "dtype": "float32",
325
+ "format": "f32-to-bf16",
326
+ "nbytes": 8388608,
327
+ "byteOffset": 0
328
+ },
329
+ {
330
+ "name": "model.layers.3.mlp.down_proj.weight",
331
+ "shape": [
332
+ 2048,
333
+ 5632
334
+ ],
335
+ "dtype": "float32",
336
+ "format": "f32-to-bf16",
337
+ "nbytes": 23068672,
338
+ "byteOffset": 8388608
339
+ },
340
+ {
341
+ "name": "model.layers.3.input_layernorm.weight",
342
+ "shape": [
343
+ 2048
344
+ ],
345
+ "dtype": "float32",
346
+ "format": "f32-to-bf16",
347
+ "nbytes": 4096,
348
+ "byteOffset": 31457280
349
+ },
350
+ {
351
+ "name": "model.layers.3.post_attention_layernorm.weight",
352
+ "shape": [
353
+ 2048
354
+ ],
355
+ "dtype": "float32",
356
+ "format": "f32-to-bf16",
357
+ "nbytes": 4096,
358
+ "byteOffset": 31461376
359
+ }
360
+ ],
361
+ "md5sum": "63a5e91ead395e5212dec520fe6eb3b9"
362
+ },
363
+ {
364
+ "dataPath": "params_shard_11.bin",
365
+ "format": "raw-shard",
366
+ "nbytes": 46137344,
367
+ "records": [
368
+ {
369
+ "name": "model.layers.4.mlp.gate_up_proj.weight",
370
+ "shape": [
371
+ 11264,
372
+ 2048
373
+ ],
374
+ "dtype": "float32",
375
+ "format": "f32-to-bf16",
376
+ "nbytes": 46137344,
377
+ "byteOffset": 0
378
+ }
379
+ ],
380
+ "md5sum": "bd1768bd70d7a687d57c6eb5ad700f98"
381
+ },
382
+ {
383
+ "dataPath": "params_shard_12.bin",
384
+ "format": "raw-shard",
385
+ "nbytes": 23068672,
386
+ "records": [
387
+ {
388
+ "name": "model.layers.4.mlp.down_proj.weight",
389
+ "shape": [
390
+ 2048,
391
+ 5632
392
+ ],
393
+ "dtype": "float32",
394
+ "format": "f32-to-bf16",
395
+ "nbytes": 23068672,
396
+ "byteOffset": 0
397
+ }
398
+ ],
399
+ "md5sum": "378e8ccbebfe1c7c5a6abd44e17a4b33"
400
+ },
401
+ {
402
+ "dataPath": "params_shard_13.bin",
403
+ "format": "raw-shard",
404
+ "nbytes": 29368320,
405
+ "records": [
406
+ {
407
+ "name": "model.layers.4.self_attn.qkv_proj.weight",
408
+ "shape": [
409
+ 2560,
410
+ 2048
411
+ ],
412
+ "dtype": "float32",
413
+ "format": "f32-to-bf16",
414
+ "nbytes": 10485760,
415
+ "byteOffset": 0
416
+ },
417
+ {
418
+ "name": "model.layers.4.self_attn.o_proj.weight",
419
+ "shape": [
420
+ 2048,
421
+ 2048
422
+ ],
423
+ "dtype": "float32",
424
+ "format": "f32-to-bf16",
425
+ "nbytes": 8388608,
426
+ "byteOffset": 10485760
427
+ },
428
+ {
429
+ "name": "model.layers.4.input_layernorm.weight",
430
+ "shape": [
431
+ 2048
432
+ ],
433
+ "dtype": "float32",
434
+ "format": "f32-to-bf16",
435
+ "nbytes": 4096,
436
+ "byteOffset": 18874368
437
+ },
438
+ {
439
+ "name": "model.layers.4.post_attention_layernorm.weight",
440
+ "shape": [
441
+ 2048
442
+ ],
443
+ "dtype": "float32",
444
+ "format": "f32-to-bf16",
445
+ "nbytes": 4096,
446
+ "byteOffset": 18878464
447
+ },
448
+ {
449
+ "name": "model.layers.5.self_attn.qkv_proj.weight",
450
+ "shape": [
451
+ 2560,
452
+ 2048
453
+ ],
454
+ "dtype": "float32",
455
+ "format": "f32-to-bf16",
456
+ "nbytes": 10485760,
457
+ "byteOffset": 18882560
458
+ }
459
+ ],
460
+ "md5sum": "69ae4f8c3d765f016cc84e46902a5a61"
461
+ },
462
+ {
463
+ "dataPath": "params_shard_14.bin",
464
+ "format": "raw-shard",
465
+ "nbytes": 46137344,
466
+ "records": [
467
+ {
468
+ "name": "model.layers.5.mlp.gate_up_proj.weight",
469
+ "shape": [
470
+ 11264,
471
+ 2048
472
+ ],
473
+ "dtype": "float32",
474
+ "format": "f32-to-bf16",
475
+ "nbytes": 46137344,
476
+ "byteOffset": 0
477
+ }
478
+ ],
479
+ "md5sum": "21882b96e7a6182502df1c8409f5ecf4"
480
+ },
481
+ {
482
+ "dataPath": "params_shard_15.bin",
483
+ "format": "raw-shard",
484
+ "nbytes": 31465472,
485
+ "records": [
486
+ {
487
+ "name": "model.layers.5.self_attn.o_proj.weight",
488
+ "shape": [
489
+ 2048,
490
+ 2048
491
+ ],
492
+ "dtype": "float32",
493
+ "format": "f32-to-bf16",
494
+ "nbytes": 8388608,
495
+ "byteOffset": 0
496
+ },
497
+ {
498
+ "name": "model.layers.5.mlp.down_proj.weight",
499
+ "shape": [
500
+ 2048,
501
+ 5632
502
+ ],
503
+ "dtype": "float32",
504
+ "format": "f32-to-bf16",
505
+ "nbytes": 23068672,
506
+ "byteOffset": 8388608
507
+ },
508
+ {
509
+ "name": "model.layers.5.input_layernorm.weight",
510
+ "shape": [
511
+ 2048
512
+ ],
513
+ "dtype": "float32",
514
+ "format": "f32-to-bf16",
515
+ "nbytes": 4096,
516
+ "byteOffset": 31457280
517
+ },
518
+ {
519
+ "name": "model.layers.5.post_attention_layernorm.weight",
520
+ "shape": [
521
+ 2048
522
+ ],
523
+ "dtype": "float32",
524
+ "format": "f32-to-bf16",
525
+ "nbytes": 4096,
526
+ "byteOffset": 31461376
527
+ }
528
+ ],
529
+ "md5sum": "94fb7197266794b1d62a683e6f83a8f5"
530
+ },
531
+ {
532
+ "dataPath": "params_shard_16.bin",
533
+ "format": "raw-shard",
534
+ "nbytes": 46137344,
535
+ "records": [
536
+ {
537
+ "name": "model.layers.6.mlp.gate_up_proj.weight",
538
+ "shape": [
539
+ 11264,
540
+ 2048
541
+ ],
542
+ "dtype": "float32",
543
+ "format": "f32-to-bf16",
544
+ "nbytes": 46137344,
545
+ "byteOffset": 0
546
+ }
547
+ ],
548
+ "md5sum": "bea00f9a26131c903821b452f4d9fea1"
549
+ },
550
+ {
551
+ "dataPath": "params_shard_17.bin",
552
+ "format": "raw-shard",
553
+ "nbytes": 23068672,
554
+ "records": [
555
+ {
556
+ "name": "model.layers.6.mlp.down_proj.weight",
557
+ "shape": [
558
+ 2048,
559
+ 5632
560
+ ],
561
+ "dtype": "float32",
562
+ "format": "f32-to-bf16",
563
+ "nbytes": 23068672,
564
+ "byteOffset": 0
565
+ }
566
+ ],
567
+ "md5sum": "233adc3d89e9fd0e63e8c024f2656d18"
568
+ },
569
+ {
570
+ "dataPath": "params_shard_18.bin",
571
+ "format": "raw-shard",
572
+ "nbytes": 29368320,
573
+ "records": [
574
+ {
575
+ "name": "model.layers.6.self_attn.qkv_proj.weight",
576
+ "shape": [
577
+ 2560,
578
+ 2048
579
+ ],
580
+ "dtype": "float32",
581
+ "format": "f32-to-bf16",
582
+ "nbytes": 10485760,
583
+ "byteOffset": 0
584
+ },
585
+ {
586
+ "name": "model.layers.6.self_attn.o_proj.weight",
587
+ "shape": [
588
+ 2048,
589
+ 2048
590
+ ],
591
+ "dtype": "float32",
592
+ "format": "f32-to-bf16",
593
+ "nbytes": 8388608,
594
+ "byteOffset": 10485760
595
+ },
596
+ {
597
+ "name": "model.layers.6.input_layernorm.weight",
598
+ "shape": [
599
+ 2048
600
+ ],
601
+ "dtype": "float32",
602
+ "format": "f32-to-bf16",
603
+ "nbytes": 4096,
604
+ "byteOffset": 18874368
605
+ },
606
+ {
607
+ "name": "model.layers.6.post_attention_layernorm.weight",
608
+ "shape": [
609
+ 2048
610
+ ],
611
+ "dtype": "float32",
612
+ "format": "f32-to-bf16",
613
+ "nbytes": 4096,
614
+ "byteOffset": 18878464
615
+ },
616
+ {
617
+ "name": "model.layers.7.self_attn.qkv_proj.weight",
618
+ "shape": [
619
+ 2560,
620
+ 2048
621
+ ],
622
+ "dtype": "float32",
623
+ "format": "f32-to-bf16",
624
+ "nbytes": 10485760,
625
+ "byteOffset": 18882560
626
+ }
627
+ ],
628
+ "md5sum": "abbe67b5d794b62799b3bcbdaf1d2f13"
629
+ },
630
+ {
631
+ "dataPath": "params_shard_19.bin",
632
+ "format": "raw-shard",
633
+ "nbytes": 46137344,
634
+ "records": [
635
+ {
636
+ "name": "model.layers.7.mlp.gate_up_proj.weight",
637
+ "shape": [
638
+ 11264,
639
+ 2048
640
+ ],
641
+ "dtype": "float32",
642
+ "format": "f32-to-bf16",
643
+ "nbytes": 46137344,
644
+ "byteOffset": 0
645
+ }
646
+ ],
647
+ "md5sum": "b9925f234796242b06783b10faa4f066"
648
+ },
649
+ {
650
+ "dataPath": "params_shard_20.bin",
651
+ "format": "raw-shard",
652
+ "nbytes": 31465472,
653
+ "records": [
654
+ {
655
+ "name": "model.layers.7.self_attn.o_proj.weight",
656
+ "shape": [
657
+ 2048,
658
+ 2048
659
+ ],
660
+ "dtype": "float32",
661
+ "format": "f32-to-bf16",
662
+ "nbytes": 8388608,
663
+ "byteOffset": 0
664
+ },
665
+ {
666
+ "name": "model.layers.7.mlp.down_proj.weight",
667
+ "shape": [
668
+ 2048,
669
+ 5632
670
+ ],
671
+ "dtype": "float32",
672
+ "format": "f32-to-bf16",
673
+ "nbytes": 23068672,
674
+ "byteOffset": 8388608
675
+ },
676
+ {
677
+ "name": "model.layers.7.input_layernorm.weight",
678
+ "shape": [
679
+ 2048
680
+ ],
681
+ "dtype": "float32",
682
+ "format": "f32-to-bf16",
683
+ "nbytes": 4096,
684
+ "byteOffset": 31457280
685
+ },
686
+ {
687
+ "name": "model.layers.7.post_attention_layernorm.weight",
688
+ "shape": [
689
+ 2048
690
+ ],
691
+ "dtype": "float32",
692
+ "format": "f32-to-bf16",
693
+ "nbytes": 4096,
694
+ "byteOffset": 31461376
695
+ }
696
+ ],
697
+ "md5sum": "449553c5c774c9612c71db8e6fe0408e"
698
+ },
699
+ {
700
+ "dataPath": "params_shard_21.bin",
701
+ "format": "raw-shard",
702
+ "nbytes": 46137344,
703
+ "records": [
704
+ {
705
+ "name": "model.layers.8.mlp.gate_up_proj.weight",
706
+ "shape": [
707
+ 11264,
708
+ 2048
709
+ ],
710
+ "dtype": "float32",
711
+ "format": "f32-to-bf16",
712
+ "nbytes": 46137344,
713
+ "byteOffset": 0
714
+ }
715
+ ],
716
+ "md5sum": "2d45e9f3e8159430be240362d67ad6af"
717
+ },
718
+ {
719
+ "dataPath": "params_shard_22.bin",
720
+ "format": "raw-shard",
721
+ "nbytes": 23068672,
722
+ "records": [
723
+ {
724
+ "name": "model.layers.8.mlp.down_proj.weight",
725
+ "shape": [
726
+ 2048,
727
+ 5632
728
+ ],
729
+ "dtype": "float32",
730
+ "format": "f32-to-bf16",
731
+ "nbytes": 23068672,
732
+ "byteOffset": 0
733
+ }
734
+ ],
735
+ "md5sum": "62d066a0af5c29b2e152d1f0e1fa83a5"
736
+ },
737
+ {
738
+ "dataPath": "params_shard_23.bin",
739
+ "format": "raw-shard",
740
+ "nbytes": 29368320,
741
+ "records": [
742
+ {
743
+ "name": "model.layers.8.self_attn.qkv_proj.weight",
744
+ "shape": [
745
+ 2560,
746
+ 2048
747
+ ],
748
+ "dtype": "float32",
749
+ "format": "f32-to-bf16",
750
+ "nbytes": 10485760,
751
+ "byteOffset": 0
752
+ },
753
+ {
754
+ "name": "model.layers.8.self_attn.o_proj.weight",
755
+ "shape": [
756
+ 2048,
757
+ 2048
758
+ ],
759
+ "dtype": "float32",
760
+ "format": "f32-to-bf16",
761
+ "nbytes": 8388608,
762
+ "byteOffset": 10485760
763
+ },
764
+ {
765
+ "name": "model.layers.8.input_layernorm.weight",
766
+ "shape": [
767
+ 2048
768
+ ],
769
+ "dtype": "float32",
770
+ "format": "f32-to-bf16",
771
+ "nbytes": 4096,
772
+ "byteOffset": 18874368
773
+ },
774
+ {
775
+ "name": "model.layers.8.post_attention_layernorm.weight",
776
+ "shape": [
777
+ 2048
778
+ ],
779
+ "dtype": "float32",
780
+ "format": "f32-to-bf16",
781
+ "nbytes": 4096,
782
+ "byteOffset": 18878464
783
+ },
784
+ {
785
+ "name": "model.layers.9.self_attn.qkv_proj.weight",
786
+ "shape": [
787
+ 2560,
788
+ 2048
789
+ ],
790
+ "dtype": "float32",
791
+ "format": "f32-to-bf16",
792
+ "nbytes": 10485760,
793
+ "byteOffset": 18882560
794
+ }
795
+ ],
796
+ "md5sum": "7bc70a5644474679d9b8c803f86803e6"
797
+ },
798
+ {
799
+ "dataPath": "params_shard_24.bin",
800
+ "format": "raw-shard",
801
+ "nbytes": 46137344,
802
+ "records": [
803
+ {
804
+ "name": "model.layers.9.mlp.gate_up_proj.weight",
805
+ "shape": [
806
+ 11264,
807
+ 2048
808
+ ],
809
+ "dtype": "float32",
810
+ "format": "f32-to-bf16",
811
+ "nbytes": 46137344,
812
+ "byteOffset": 0
813
+ }
814
+ ],
815
+ "md5sum": "a385c498afbccecefdb8faefc328a383"
816
+ },
817
+ {
818
+ "dataPath": "params_shard_25.bin",
819
+ "format": "raw-shard",
820
+ "nbytes": 31465472,
821
+ "records": [
822
+ {
823
+ "name": "model.layers.9.self_attn.o_proj.weight",
824
+ "shape": [
825
+ 2048,
826
+ 2048
827
+ ],
828
+ "dtype": "float32",
829
+ "format": "f32-to-bf16",
830
+ "nbytes": 8388608,
831
+ "byteOffset": 0
832
+ },
833
+ {
834
+ "name": "model.layers.9.mlp.down_proj.weight",
835
+ "shape": [
836
+ 2048,
837
+ 5632
838
+ ],
839
+ "dtype": "float32",
840
+ "format": "f32-to-bf16",
841
+ "nbytes": 23068672,
842
+ "byteOffset": 8388608
843
+ },
844
+ {
845
+ "name": "model.layers.9.input_layernorm.weight",
846
+ "shape": [
847
+ 2048
848
+ ],
849
+ "dtype": "float32",
850
+ "format": "f32-to-bf16",
851
+ "nbytes": 4096,
852
+ "byteOffset": 31457280
853
+ },
854
+ {
855
+ "name": "model.layers.9.post_attention_layernorm.weight",
856
+ "shape": [
857
+ 2048
858
+ ],
859
+ "dtype": "float32",
860
+ "format": "f32-to-bf16",
861
+ "nbytes": 4096,
862
+ "byteOffset": 31461376
863
+ }
864
+ ],
865
+ "md5sum": "482f31eacf412fc7c13d5fef45ef2909"
866
+ },
867
+ {
868
+ "dataPath": "params_shard_26.bin",
869
+ "format": "raw-shard",
870
+ "nbytes": 46137344,
871
+ "records": [
872
+ {
873
+ "name": "model.layers.10.mlp.gate_up_proj.weight",
874
+ "shape": [
875
+ 11264,
876
+ 2048
877
+ ],
878
+ "dtype": "float32",
879
+ "format": "f32-to-bf16",
880
+ "nbytes": 46137344,
881
+ "byteOffset": 0
882
+ }
883
+ ],
884
+ "md5sum": "850943605e71e9be8f0691c3b55959d3"
885
+ },
886
+ {
887
+ "dataPath": "params_shard_27.bin",
888
+ "format": "raw-shard",
889
+ "nbytes": 23068672,
890
+ "records": [
891
+ {
892
+ "name": "model.layers.10.mlp.down_proj.weight",
893
+ "shape": [
894
+ 2048,
895
+ 5632
896
+ ],
897
+ "dtype": "float32",
898
+ "format": "f32-to-bf16",
899
+ "nbytes": 23068672,
900
+ "byteOffset": 0
901
+ }
902
+ ],
903
+ "md5sum": "9207fc9eb5cd6be8fac65a1c7685ff73"
904
+ },
905
+ {
906
+ "dataPath": "params_shard_28.bin",
907
+ "format": "raw-shard",
908
+ "nbytes": 29368320,
909
+ "records": [
910
+ {
911
+ "name": "model.layers.10.self_attn.qkv_proj.weight",
912
+ "shape": [
913
+ 2560,
914
+ 2048
915
+ ],
916
+ "dtype": "float32",
917
+ "format": "f32-to-bf16",
918
+ "nbytes": 10485760,
919
+ "byteOffset": 0
920
+ },
921
+ {
922
+ "name": "model.layers.10.self_attn.o_proj.weight",
923
+ "shape": [
924
+ 2048,
925
+ 2048
926
+ ],
927
+ "dtype": "float32",
928
+ "format": "f32-to-bf16",
929
+ "nbytes": 8388608,
930
+ "byteOffset": 10485760
931
+ },
932
+ {
933
+ "name": "model.layers.10.input_layernorm.weight",
934
+ "shape": [
935
+ 2048
936
+ ],
937
+ "dtype": "float32",
938
+ "format": "f32-to-bf16",
939
+ "nbytes": 4096,
940
+ "byteOffset": 18874368
941
+ },
942
+ {
943
+ "name": "model.layers.10.post_attention_layernorm.weight",
944
+ "shape": [
945
+ 2048
946
+ ],
947
+ "dtype": "float32",
948
+ "format": "f32-to-bf16",
949
+ "nbytes": 4096,
950
+ "byteOffset": 18878464
951
+ },
952
+ {
953
+ "name": "model.layers.11.self_attn.qkv_proj.weight",
954
+ "shape": [
955
+ 2560,
956
+ 2048
957
+ ],
958
+ "dtype": "float32",
959
+ "format": "f32-to-bf16",
960
+ "nbytes": 10485760,
961
+ "byteOffset": 18882560
962
+ }
963
+ ],
964
+ "md5sum": "f1212bb094d6783e2fc69b90346e22ff"
965
+ },
966
+ {
967
+ "dataPath": "params_shard_29.bin",
968
+ "format": "raw-shard",
969
+ "nbytes": 46137344,
970
+ "records": [
971
+ {
972
+ "name": "model.layers.11.mlp.gate_up_proj.weight",
973
+ "shape": [
974
+ 11264,
975
+ 2048
976
+ ],
977
+ "dtype": "float32",
978
+ "format": "f32-to-bf16",
979
+ "nbytes": 46137344,
980
+ "byteOffset": 0
981
+ }
982
+ ],
983
+ "md5sum": "736af883b80453625d8250b9aa77ac26"
984
+ },
985
+ {
986
+ "dataPath": "params_shard_30.bin",
987
+ "format": "raw-shard",
988
+ "nbytes": 31465472,
989
+ "records": [
990
+ {
991
+ "name": "model.layers.11.self_attn.o_proj.weight",
992
+ "shape": [
993
+ 2048,
994
+ 2048
995
+ ],
996
+ "dtype": "float32",
997
+ "format": "f32-to-bf16",
998
+ "nbytes": 8388608,
999
+ "byteOffset": 0
1000
+ },
1001
+ {
1002
+ "name": "model.layers.11.mlp.down_proj.weight",
1003
+ "shape": [
1004
+ 2048,
1005
+ 5632
1006
+ ],
1007
+ "dtype": "float32",
1008
+ "format": "f32-to-bf16",
1009
+ "nbytes": 23068672,
1010
+ "byteOffset": 8388608
1011
+ },
1012
+ {
1013
+ "name": "model.layers.11.input_layernorm.weight",
1014
+ "shape": [
1015
+ 2048
1016
+ ],
1017
+ "dtype": "float32",
1018
+ "format": "f32-to-bf16",
1019
+ "nbytes": 4096,
1020
+ "byteOffset": 31457280
1021
+ },
1022
+ {
1023
+ "name": "model.layers.11.post_attention_layernorm.weight",
1024
+ "shape": [
1025
+ 2048
1026
+ ],
1027
+ "dtype": "float32",
1028
+ "format": "f32-to-bf16",
1029
+ "nbytes": 4096,
1030
+ "byteOffset": 31461376
1031
+ }
1032
+ ],
1033
+ "md5sum": "36986d44545c8f12516cdaefdb74694b"
1034
+ },
1035
+ {
1036
+ "dataPath": "params_shard_31.bin",
1037
+ "format": "raw-shard",
1038
+ "nbytes": 46137344,
1039
+ "records": [
1040
+ {
1041
+ "name": "model.layers.12.mlp.gate_up_proj.weight",
1042
+ "shape": [
1043
+ 11264,
1044
+ 2048
1045
+ ],
1046
+ "dtype": "float32",
1047
+ "format": "f32-to-bf16",
1048
+ "nbytes": 46137344,
1049
+ "byteOffset": 0
1050
+ }
1051
+ ],
1052
+ "md5sum": "5f97fe1675d789ee5a5a5881b1775515"
1053
+ },
1054
+ {
1055
+ "dataPath": "params_shard_32.bin",
1056
+ "format": "raw-shard",
1057
+ "nbytes": 23068672,
1058
+ "records": [
1059
+ {
1060
+ "name": "model.layers.12.mlp.down_proj.weight",
1061
+ "shape": [
1062
+ 2048,
1063
+ 5632
1064
+ ],
1065
+ "dtype": "float32",
1066
+ "format": "f32-to-bf16",
1067
+ "nbytes": 23068672,
1068
+ "byteOffset": 0
1069
+ }
1070
+ ],
1071
+ "md5sum": "051b501162303570dff375b79148951d"
1072
+ },
1073
+ {
1074
+ "dataPath": "params_shard_33.bin",
1075
+ "format": "raw-shard",
1076
+ "nbytes": 29368320,
1077
+ "records": [
1078
+ {
1079
+ "name": "model.layers.12.self_attn.qkv_proj.weight",
1080
+ "shape": [
1081
+ 2560,
1082
+ 2048
1083
+ ],
1084
+ "dtype": "float32",
1085
+ "format": "f32-to-bf16",
1086
+ "nbytes": 10485760,
1087
+ "byteOffset": 0
1088
+ },
1089
+ {
1090
+ "name": "model.layers.12.self_attn.o_proj.weight",
1091
+ "shape": [
1092
+ 2048,
1093
+ 2048
1094
+ ],
1095
+ "dtype": "float32",
1096
+ "format": "f32-to-bf16",
1097
+ "nbytes": 8388608,
1098
+ "byteOffset": 10485760
1099
+ },
1100
+ {
1101
+ "name": "model.layers.12.input_layernorm.weight",
1102
+ "shape": [
1103
+ 2048
1104
+ ],
1105
+ "dtype": "float32",
1106
+ "format": "f32-to-bf16",
1107
+ "nbytes": 4096,
1108
+ "byteOffset": 18874368
1109
+ },
1110
+ {
1111
+ "name": "model.layers.12.post_attention_layernorm.weight",
1112
+ "shape": [
1113
+ 2048
1114
+ ],
1115
+ "dtype": "float32",
1116
+ "format": "f32-to-bf16",
1117
+ "nbytes": 4096,
1118
+ "byteOffset": 18878464
1119
+ },
1120
+ {
1121
+ "name": "model.layers.13.self_attn.qkv_proj.weight",
1122
+ "shape": [
1123
+ 2560,
1124
+ 2048
1125
+ ],
1126
+ "dtype": "float32",
1127
+ "format": "f32-to-bf16",
1128
+ "nbytes": 10485760,
1129
+ "byteOffset": 18882560
1130
+ }
1131
+ ],
1132
+ "md5sum": "4ae2730f899b0d292e208596ea98b590"
1133
+ },
1134
+ {
1135
+ "dataPath": "params_shard_34.bin",
1136
+ "format": "raw-shard",
1137
+ "nbytes": 46137344,
1138
+ "records": [
1139
+ {
1140
+ "name": "model.layers.13.mlp.gate_up_proj.weight",
1141
+ "shape": [
1142
+ 11264,
1143
+ 2048
1144
+ ],
1145
+ "dtype": "float32",
1146
+ "format": "f32-to-bf16",
1147
+ "nbytes": 46137344,
1148
+ "byteOffset": 0
1149
+ }
1150
+ ],
1151
+ "md5sum": "15f42972b33f6cadbc0b5a17e20ac9b6"
1152
+ },
1153
+ {
1154
+ "dataPath": "params_shard_35.bin",
1155
+ "format": "raw-shard",
1156
+ "nbytes": 31465472,
1157
+ "records": [
1158
+ {
1159
+ "name": "model.layers.13.self_attn.o_proj.weight",
1160
+ "shape": [
1161
+ 2048,
1162
+ 2048
1163
+ ],
1164
+ "dtype": "float32",
1165
+ "format": "f32-to-bf16",
1166
+ "nbytes": 8388608,
1167
+ "byteOffset": 0
1168
+ },
1169
+ {
1170
+ "name": "model.layers.13.mlp.down_proj.weight",
1171
+ "shape": [
1172
+ 2048,
1173
+ 5632
1174
+ ],
1175
+ "dtype": "float32",
1176
+ "format": "f32-to-bf16",
1177
+ "nbytes": 23068672,
1178
+ "byteOffset": 8388608
1179
+ },
1180
+ {
1181
+ "name": "model.layers.13.input_layernorm.weight",
1182
+ "shape": [
1183
+ 2048
1184
+ ],
1185
+ "dtype": "float32",
1186
+ "format": "f32-to-bf16",
1187
+ "nbytes": 4096,
1188
+ "byteOffset": 31457280
1189
+ },
1190
+ {
1191
+ "name": "model.layers.13.post_attention_layernorm.weight",
1192
+ "shape": [
1193
+ 2048
1194
+ ],
1195
+ "dtype": "float32",
1196
+ "format": "f32-to-bf16",
1197
+ "nbytes": 4096,
1198
+ "byteOffset": 31461376
1199
+ }
1200
+ ],
1201
+ "md5sum": "82ec8f8938814a4e75058620d84285c7"
1202
+ },
1203
+ {
1204
+ "dataPath": "params_shard_36.bin",
1205
+ "format": "raw-shard",
1206
+ "nbytes": 46137344,
1207
+ "records": [
1208
+ {
1209
+ "name": "model.layers.14.mlp.gate_up_proj.weight",
1210
+ "shape": [
1211
+ 11264,
1212
+ 2048
1213
+ ],
1214
+ "dtype": "float32",
1215
+ "format": "f32-to-bf16",
1216
+ "nbytes": 46137344,
1217
+ "byteOffset": 0
1218
+ }
1219
+ ],
1220
+ "md5sum": "ccd48809d794270ecb6929d9547aea97"
1221
+ },
1222
+ {
1223
+ "dataPath": "params_shard_37.bin",
1224
+ "format": "raw-shard",
1225
+ "nbytes": 23068672,
1226
+ "records": [
1227
+ {
1228
+ "name": "model.layers.14.mlp.down_proj.weight",
1229
+ "shape": [
1230
+ 2048,
1231
+ 5632
1232
+ ],
1233
+ "dtype": "float32",
1234
+ "format": "f32-to-bf16",
1235
+ "nbytes": 23068672,
1236
+ "byteOffset": 0
1237
+ }
1238
+ ],
1239
+ "md5sum": "d0ce7dab232e7305f3f243febbb4324c"
1240
+ },
1241
+ {
1242
+ "dataPath": "params_shard_38.bin",
1243
+ "format": "raw-shard",
1244
+ "nbytes": 29368320,
1245
+ "records": [
1246
+ {
1247
+ "name": "model.layers.14.self_attn.qkv_proj.weight",
1248
+ "shape": [
1249
+ 2560,
1250
+ 2048
1251
+ ],
1252
+ "dtype": "float32",
1253
+ "format": "f32-to-bf16",
1254
+ "nbytes": 10485760,
1255
+ "byteOffset": 0
1256
+ },
1257
+ {
1258
+ "name": "model.layers.14.self_attn.o_proj.weight",
1259
+ "shape": [
1260
+ 2048,
1261
+ 2048
1262
+ ],
1263
+ "dtype": "float32",
1264
+ "format": "f32-to-bf16",
1265
+ "nbytes": 8388608,
1266
+ "byteOffset": 10485760
1267
+ },
1268
+ {
1269
+ "name": "model.layers.14.input_layernorm.weight",
1270
+ "shape": [
1271
+ 2048
1272
+ ],
1273
+ "dtype": "float32",
1274
+ "format": "f32-to-bf16",
1275
+ "nbytes": 4096,
1276
+ "byteOffset": 18874368
1277
+ },
1278
+ {
1279
+ "name": "model.layers.14.post_attention_layernorm.weight",
1280
+ "shape": [
1281
+ 2048
1282
+ ],
1283
+ "dtype": "float32",
1284
+ "format": "f32-to-bf16",
1285
+ "nbytes": 4096,
1286
+ "byteOffset": 18878464
1287
+ },
1288
+ {
1289
+ "name": "model.layers.15.self_attn.qkv_proj.weight",
1290
+ "shape": [
1291
+ 2560,
1292
+ 2048
1293
+ ],
1294
+ "dtype": "float32",
1295
+ "format": "f32-to-bf16",
1296
+ "nbytes": 10485760,
1297
+ "byteOffset": 18882560
1298
+ }
1299
+ ],
1300
+ "md5sum": "0f007605b9207b32a5c79249505897ef"
1301
+ },
1302
+ {
1303
+ "dataPath": "params_shard_39.bin",
1304
+ "format": "raw-shard",
1305
+ "nbytes": 46137344,
1306
+ "records": [
1307
+ {
1308
+ "name": "model.layers.15.mlp.gate_up_proj.weight",
1309
+ "shape": [
1310
+ 11264,
1311
+ 2048
1312
+ ],
1313
+ "dtype": "float32",
1314
+ "format": "f32-to-bf16",
1315
+ "nbytes": 46137344,
1316
+ "byteOffset": 0
1317
+ }
1318
+ ],
1319
+ "md5sum": "e42d4c04fb873f035b307b92df26974e"
1320
+ },
1321
+ {
1322
+ "dataPath": "params_shard_40.bin",
1323
+ "format": "raw-shard",
1324
+ "nbytes": 31465472,
1325
+ "records": [
1326
+ {
1327
+ "name": "model.layers.15.self_attn.o_proj.weight",
1328
+ "shape": [
1329
+ 2048,
1330
+ 2048
1331
+ ],
1332
+ "dtype": "float32",
1333
+ "format": "f32-to-bf16",
1334
+ "nbytes": 8388608,
1335
+ "byteOffset": 0
1336
+ },
1337
+ {
1338
+ "name": "model.layers.15.mlp.down_proj.weight",
1339
+ "shape": [
1340
+ 2048,
1341
+ 5632
1342
+ ],
1343
+ "dtype": "float32",
1344
+ "format": "f32-to-bf16",
1345
+ "nbytes": 23068672,
1346
+ "byteOffset": 8388608
1347
+ },
1348
+ {
1349
+ "name": "model.layers.15.input_layernorm.weight",
1350
+ "shape": [
1351
+ 2048
1352
+ ],
1353
+ "dtype": "float32",
1354
+ "format": "f32-to-bf16",
1355
+ "nbytes": 4096,
1356
+ "byteOffset": 31457280
1357
+ },
1358
+ {
1359
+ "name": "model.layers.15.post_attention_layernorm.weight",
1360
+ "shape": [
1361
+ 2048
1362
+ ],
1363
+ "dtype": "float32",
1364
+ "format": "f32-to-bf16",
1365
+ "nbytes": 4096,
1366
+ "byteOffset": 31461376
1367
+ }
1368
+ ],
1369
+ "md5sum": "7168ff2801ea67e99469b52abb83f25d"
1370
+ },
1371
+ {
1372
+ "dataPath": "params_shard_41.bin",
1373
+ "format": "raw-shard",
1374
+ "nbytes": 46137344,
1375
+ "records": [
1376
+ {
1377
+ "name": "model.layers.16.mlp.gate_up_proj.weight",
1378
+ "shape": [
1379
+ 11264,
1380
+ 2048
1381
+ ],
1382
+ "dtype": "float32",
1383
+ "format": "f32-to-bf16",
1384
+ "nbytes": 46137344,
1385
+ "byteOffset": 0
1386
+ }
1387
+ ],
1388
+ "md5sum": "623e1a60bfb63c0d49297304aa0aed3e"
1389
+ },
1390
+ {
1391
+ "dataPath": "params_shard_42.bin",
1392
+ "format": "raw-shard",
1393
+ "nbytes": 23068672,
1394
+ "records": [
1395
+ {
1396
+ "name": "model.layers.16.mlp.down_proj.weight",
1397
+ "shape": [
1398
+ 2048,
1399
+ 5632
1400
+ ],
1401
+ "dtype": "float32",
1402
+ "format": "f32-to-bf16",
1403
+ "nbytes": 23068672,
1404
+ "byteOffset": 0
1405
+ }
1406
+ ],
1407
+ "md5sum": "3c4c8058c508560fd9f604e942dd3caa"
1408
+ },
1409
+ {
1410
+ "dataPath": "params_shard_43.bin",
1411
+ "format": "raw-shard",
1412
+ "nbytes": 29368320,
1413
+ "records": [
1414
+ {
1415
+ "name": "model.layers.16.self_attn.qkv_proj.weight",
1416
+ "shape": [
1417
+ 2560,
1418
+ 2048
1419
+ ],
1420
+ "dtype": "float32",
1421
+ "format": "f32-to-bf16",
1422
+ "nbytes": 10485760,
1423
+ "byteOffset": 0
1424
+ },
1425
+ {
1426
+ "name": "model.layers.16.self_attn.o_proj.weight",
1427
+ "shape": [
1428
+ 2048,
1429
+ 2048
1430
+ ],
1431
+ "dtype": "float32",
1432
+ "format": "f32-to-bf16",
1433
+ "nbytes": 8388608,
1434
+ "byteOffset": 10485760
1435
+ },
1436
+ {
1437
+ "name": "model.layers.16.input_layernorm.weight",
1438
+ "shape": [
1439
+ 2048
1440
+ ],
1441
+ "dtype": "float32",
1442
+ "format": "f32-to-bf16",
1443
+ "nbytes": 4096,
1444
+ "byteOffset": 18874368
1445
+ },
1446
+ {
1447
+ "name": "model.layers.16.post_attention_layernorm.weight",
1448
+ "shape": [
1449
+ 2048
1450
+ ],
1451
+ "dtype": "float32",
1452
+ "format": "f32-to-bf16",
1453
+ "nbytes": 4096,
1454
+ "byteOffset": 18878464
1455
+ },
1456
+ {
1457
+ "name": "model.layers.17.self_attn.qkv_proj.weight",
1458
+ "shape": [
1459
+ 2560,
1460
+ 2048
1461
+ ],
1462
+ "dtype": "float32",
1463
+ "format": "f32-to-bf16",
1464
+ "nbytes": 10485760,
1465
+ "byteOffset": 18882560
1466
+ }
1467
+ ],
1468
+ "md5sum": "80dec21b661afcd092d85dcde0f0ec8a"
1469
+ },
1470
+ {
1471
+ "dataPath": "params_shard_44.bin",
1472
+ "format": "raw-shard",
1473
+ "nbytes": 46137344,
1474
+ "records": [
1475
+ {
1476
+ "name": "model.layers.17.mlp.gate_up_proj.weight",
1477
+ "shape": [
1478
+ 11264,
1479
+ 2048
1480
+ ],
1481
+ "dtype": "float32",
1482
+ "format": "f32-to-bf16",
1483
+ "nbytes": 46137344,
1484
+ "byteOffset": 0
1485
+ }
1486
+ ],
1487
+ "md5sum": "9242db19ee58801b87f6cdd63fd1139b"
1488
+ },
1489
+ {
1490
+ "dataPath": "params_shard_45.bin",
1491
+ "format": "raw-shard",
1492
+ "nbytes": 31465472,
1493
+ "records": [
1494
+ {
1495
+ "name": "model.layers.17.self_attn.o_proj.weight",
1496
+ "shape": [
1497
+ 2048,
1498
+ 2048
1499
+ ],
1500
+ "dtype": "float32",
1501
+ "format": "f32-to-bf16",
1502
+ "nbytes": 8388608,
1503
+ "byteOffset": 0
1504
+ },
1505
+ {
1506
+ "name": "model.layers.17.mlp.down_proj.weight",
1507
+ "shape": [
1508
+ 2048,
1509
+ 5632
1510
+ ],
1511
+ "dtype": "float32",
1512
+ "format": "f32-to-bf16",
1513
+ "nbytes": 23068672,
1514
+ "byteOffset": 8388608
1515
+ },
1516
+ {
1517
+ "name": "model.layers.17.input_layernorm.weight",
1518
+ "shape": [
1519
+ 2048
1520
+ ],
1521
+ "dtype": "float32",
1522
+ "format": "f32-to-bf16",
1523
+ "nbytes": 4096,
1524
+ "byteOffset": 31457280
1525
+ },
1526
+ {
1527
+ "name": "model.layers.17.post_attention_layernorm.weight",
1528
+ "shape": [
1529
+ 2048
1530
+ ],
1531
+ "dtype": "float32",
1532
+ "format": "f32-to-bf16",
1533
+ "nbytes": 4096,
1534
+ "byteOffset": 31461376
1535
+ }
1536
+ ],
1537
+ "md5sum": "f83f0c6e381c128e2cbdc59dfd54fa47"
1538
+ },
1539
+ {
1540
+ "dataPath": "params_shard_46.bin",
1541
+ "format": "raw-shard",
1542
+ "nbytes": 46137344,
1543
+ "records": [
1544
+ {
1545
+ "name": "model.layers.18.mlp.gate_up_proj.weight",
1546
+ "shape": [
1547
+ 11264,
1548
+ 2048
1549
+ ],
1550
+ "dtype": "float32",
1551
+ "format": "f32-to-bf16",
1552
+ "nbytes": 46137344,
1553
+ "byteOffset": 0
1554
+ }
1555
+ ],
1556
+ "md5sum": "2663610937313216a42645bc74895a87"
1557
+ },
1558
+ {
1559
+ "dataPath": "params_shard_47.bin",
1560
+ "format": "raw-shard",
1561
+ "nbytes": 23068672,
1562
+ "records": [
1563
+ {
1564
+ "name": "model.layers.18.mlp.down_proj.weight",
1565
+ "shape": [
1566
+ 2048,
1567
+ 5632
1568
+ ],
1569
+ "dtype": "float32",
1570
+ "format": "f32-to-bf16",
1571
+ "nbytes": 23068672,
1572
+ "byteOffset": 0
1573
+ }
1574
+ ],
1575
+ "md5sum": "973defd93492a1311559695a1c45a149"
1576
+ },
1577
+ {
1578
+ "dataPath": "params_shard_48.bin",
1579
+ "format": "raw-shard",
1580
+ "nbytes": 29368320,
1581
+ "records": [
1582
+ {
1583
+ "name": "model.layers.18.self_attn.qkv_proj.weight",
1584
+ "shape": [
1585
+ 2560,
1586
+ 2048
1587
+ ],
1588
+ "dtype": "float32",
1589
+ "format": "f32-to-bf16",
1590
+ "nbytes": 10485760,
1591
+ "byteOffset": 0
1592
+ },
1593
+ {
1594
+ "name": "model.layers.18.self_attn.o_proj.weight",
1595
+ "shape": [
1596
+ 2048,
1597
+ 2048
1598
+ ],
1599
+ "dtype": "float32",
1600
+ "format": "f32-to-bf16",
1601
+ "nbytes": 8388608,
1602
+ "byteOffset": 10485760
1603
+ },
1604
+ {
1605
+ "name": "model.layers.18.input_layernorm.weight",
1606
+ "shape": [
1607
+ 2048
1608
+ ],
1609
+ "dtype": "float32",
1610
+ "format": "f32-to-bf16",
1611
+ "nbytes": 4096,
1612
+ "byteOffset": 18874368
1613
+ },
1614
+ {
1615
+ "name": "model.layers.18.post_attention_layernorm.weight",
1616
+ "shape": [
1617
+ 2048
1618
+ ],
1619
+ "dtype": "float32",
1620
+ "format": "f32-to-bf16",
1621
+ "nbytes": 4096,
1622
+ "byteOffset": 18878464
1623
+ },
1624
+ {
1625
+ "name": "model.layers.19.self_attn.qkv_proj.weight",
1626
+ "shape": [
1627
+ 2560,
1628
+ 2048
1629
+ ],
1630
+ "dtype": "float32",
1631
+ "format": "f32-to-bf16",
1632
+ "nbytes": 10485760,
1633
+ "byteOffset": 18882560
1634
+ }
1635
+ ],
1636
+ "md5sum": "a3274aef0ddf44dbd215f1588d7a8bfe"
1637
+ },
1638
+ {
1639
+ "dataPath": "params_shard_49.bin",
1640
+ "format": "raw-shard",
1641
+ "nbytes": 46137344,
1642
+ "records": [
1643
+ {
1644
+ "name": "model.layers.19.mlp.gate_up_proj.weight",
1645
+ "shape": [
1646
+ 11264,
1647
+ 2048
1648
+ ],
1649
+ "dtype": "float32",
1650
+ "format": "f32-to-bf16",
1651
+ "nbytes": 46137344,
1652
+ "byteOffset": 0
1653
+ }
1654
+ ],
1655
+ "md5sum": "d8b408bc677af3eb67cca3d5f31151e2"
1656
+ },
1657
+ {
1658
+ "dataPath": "params_shard_50.bin",
1659
+ "format": "raw-shard",
1660
+ "nbytes": 31465472,
1661
+ "records": [
1662
+ {
1663
+ "name": "model.layers.19.self_attn.o_proj.weight",
1664
+ "shape": [
1665
+ 2048,
1666
+ 2048
1667
+ ],
1668
+ "dtype": "float32",
1669
+ "format": "f32-to-bf16",
1670
+ "nbytes": 8388608,
1671
+ "byteOffset": 0
1672
+ },
1673
+ {
1674
+ "name": "model.layers.19.mlp.down_proj.weight",
1675
+ "shape": [
1676
+ 2048,
1677
+ 5632
1678
+ ],
1679
+ "dtype": "float32",
1680
+ "format": "f32-to-bf16",
1681
+ "nbytes": 23068672,
1682
+ "byteOffset": 8388608
1683
+ },
1684
+ {
1685
+ "name": "model.layers.19.input_layernorm.weight",
1686
+ "shape": [
1687
+ 2048
1688
+ ],
1689
+ "dtype": "float32",
1690
+ "format": "f32-to-bf16",
1691
+ "nbytes": 4096,
1692
+ "byteOffset": 31457280
1693
+ },
1694
+ {
1695
+ "name": "model.layers.19.post_attention_layernorm.weight",
1696
+ "shape": [
1697
+ 2048
1698
+ ],
1699
+ "dtype": "float32",
1700
+ "format": "f32-to-bf16",
1701
+ "nbytes": 4096,
1702
+ "byteOffset": 31461376
1703
+ }
1704
+ ],
1705
+ "md5sum": "0ea6e376f976c09ec4c316553e5fe49f"
1706
+ },
1707
+ {
1708
+ "dataPath": "params_shard_51.bin",
1709
+ "format": "raw-shard",
1710
+ "nbytes": 46137344,
1711
+ "records": [
1712
+ {
1713
+ "name": "model.layers.20.mlp.gate_up_proj.weight",
1714
+ "shape": [
1715
+ 11264,
1716
+ 2048
1717
+ ],
1718
+ "dtype": "float32",
1719
+ "format": "f32-to-bf16",
1720
+ "nbytes": 46137344,
1721
+ "byteOffset": 0
1722
+ }
1723
+ ],
1724
+ "md5sum": "c433565a3056b5b776c65a1536c565bc"
1725
+ },
1726
+ {
1727
+ "dataPath": "params_shard_52.bin",
1728
+ "format": "raw-shard",
1729
+ "nbytes": 23068672,
1730
+ "records": [
1731
+ {
1732
+ "name": "model.layers.20.mlp.down_proj.weight",
1733
+ "shape": [
1734
+ 2048,
1735
+ 5632
1736
+ ],
1737
+ "dtype": "float32",
1738
+ "format": "f32-to-bf16",
1739
+ "nbytes": 23068672,
1740
+ "byteOffset": 0
1741
+ }
1742
+ ],
1743
+ "md5sum": "1b51d01988351e7bba03eb34e46a2585"
1744
+ },
1745
+ {
1746
+ "dataPath": "params_shard_53.bin",
1747
+ "format": "raw-shard",
1748
+ "nbytes": 29368320,
1749
+ "records": [
1750
+ {
1751
+ "name": "model.layers.20.self_attn.qkv_proj.weight",
1752
+ "shape": [
1753
+ 2560,
1754
+ 2048
1755
+ ],
1756
+ "dtype": "float32",
1757
+ "format": "f32-to-bf16",
1758
+ "nbytes": 10485760,
1759
+ "byteOffset": 0
1760
+ },
1761
+ {
1762
+ "name": "model.layers.20.self_attn.o_proj.weight",
1763
+ "shape": [
1764
+ 2048,
1765
+ 2048
1766
+ ],
1767
+ "dtype": "float32",
1768
+ "format": "f32-to-bf16",
1769
+ "nbytes": 8388608,
1770
+ "byteOffset": 10485760
1771
+ },
1772
+ {
1773
+ "name": "model.layers.20.input_layernorm.weight",
1774
+ "shape": [
1775
+ 2048
1776
+ ],
1777
+ "dtype": "float32",
1778
+ "format": "f32-to-bf16",
1779
+ "nbytes": 4096,
1780
+ "byteOffset": 18874368
1781
+ },
1782
+ {
1783
+ "name": "model.layers.20.post_attention_layernorm.weight",
1784
+ "shape": [
1785
+ 2048
1786
+ ],
1787
+ "dtype": "float32",
1788
+ "format": "f32-to-bf16",
1789
+ "nbytes": 4096,
1790
+ "byteOffset": 18878464
1791
+ },
1792
+ {
1793
+ "name": "model.layers.21.self_attn.qkv_proj.weight",
1794
+ "shape": [
1795
+ 2560,
1796
+ 2048
1797
+ ],
1798
+ "dtype": "float32",
1799
+ "format": "f32-to-bf16",
1800
+ "nbytes": 10485760,
1801
+ "byteOffset": 18882560
1802
+ }
1803
+ ],
1804
+ "md5sum": "65304737b5497527ce7b0d2377866424"
1805
+ },
1806
+ {
1807
+ "dataPath": "params_shard_54.bin",
1808
+ "format": "raw-shard",
1809
+ "nbytes": 46137344,
1810
+ "records": [
1811
+ {
1812
+ "name": "model.layers.21.mlp.gate_up_proj.weight",
1813
+ "shape": [
1814
+ 11264,
1815
+ 2048
1816
+ ],
1817
+ "dtype": "float32",
1818
+ "format": "f32-to-bf16",
1819
+ "nbytes": 46137344,
1820
+ "byteOffset": 0
1821
+ }
1822
+ ],
1823
+ "md5sum": "6e5cd3b68492da8c01ef96dc86faa8a7"
1824
+ },
1825
+ {
1826
+ "dataPath": "params_shard_55.bin",
1827
+ "format": "raw-shard",
1828
+ "nbytes": 131084288,
1829
+ "records": [
1830
+ {
1831
+ "name": "lm_head.weight",
1832
+ "shape": [
1833
+ 32003,
1834
+ 2048
1835
+ ],
1836
+ "dtype": "float32",
1837
+ "format": "f32-to-bf16",
1838
+ "nbytes": 131084288,
1839
+ "byteOffset": 0
1840
+ }
1841
+ ],
1842
+ "md5sum": "cbafe195db523a0e9302dd15bde0f529"
1843
+ },
1844
+ {
1845
+ "dataPath": "params_shard_56.bin",
1846
+ "format": "raw-shard",
1847
+ "nbytes": 31469568,
1848
+ "records": [
1849
+ {
1850
+ "name": "model.layers.21.self_attn.o_proj.weight",
1851
+ "shape": [
1852
+ 2048,
1853
+ 2048
1854
+ ],
1855
+ "dtype": "float32",
1856
+ "format": "f32-to-bf16",
1857
+ "nbytes": 8388608,
1858
+ "byteOffset": 0
1859
+ },
1860
+ {
1861
+ "name": "model.layers.21.mlp.down_proj.weight",
1862
+ "shape": [
1863
+ 2048,
1864
+ 5632
1865
+ ],
1866
+ "dtype": "float32",
1867
+ "format": "f32-to-bf16",
1868
+ "nbytes": 23068672,
1869
+ "byteOffset": 8388608
1870
+ },
1871
+ {
1872
+ "name": "model.layers.21.input_layernorm.weight",
1873
+ "shape": [
1874
+ 2048
1875
+ ],
1876
+ "dtype": "float32",
1877
+ "format": "f32-to-bf16",
1878
+ "nbytes": 4096,
1879
+ "byteOffset": 31457280
1880
+ },
1881
+ {
1882
+ "name": "model.layers.21.post_attention_layernorm.weight",
1883
+ "shape": [
1884
+ 2048
1885
+ ],
1886
+ "dtype": "float32",
1887
+ "format": "f32-to-bf16",
1888
+ "nbytes": 4096,
1889
+ "byteOffset": 31461376
1890
+ },
1891
+ {
1892
+ "name": "model.norm.weight",
1893
+ "shape": [
1894
+ 2048
1895
+ ],
1896
+ "dtype": "float32",
1897
+ "format": "f32-to-bf16",
1898
+ "nbytes": 4096,
1899
+ "byteOffset": 31465472
1900
+ }
1901
+ ],
1902
+ "md5sum": "2d491aad8e6aa64a244bca58e8049b34"
1903
+ }
1904
+ ]
1905
+ }
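The shard entries above follow a regular layout: each shard lists `records` whose `byteOffset` and `nbytes` values tile the shard file. The sketch below checks that layout for one entry, using the `params_shard_18.bin` values copied from the manifest. It assumes that records are packed back to back and that `f32-to-bf16` means 2 bytes per stored element; both hold for the entries shown here, but neither is stated anywhere in the manifest. `check_shard` is a hypothetical helper, not part of any MLC tooling.

```python
from math import prod

# Assumption: "f32-to-bf16" records are stored as bfloat16, 2 bytes per element.
BYTES_PER_ELEMENT = {"f32-to-bf16": 2}


def check_shard(shard: dict) -> list[str]:
    """Return a list of layout problems found in one manifest shard entry."""
    problems = []
    expected_offset = 0
    for rec in shard["records"]:
        # Assumption: records are contiguous, so each starts where the last ended.
        if rec["byteOffset"] != expected_offset:
            problems.append(f"{rec['name']}: offset {rec['byteOffset']} != {expected_offset}")
        size = prod(rec["shape"]) * BYTES_PER_ELEMENT[rec["format"]]
        if size != rec["nbytes"]:
            problems.append(f"{rec['name']}: nbytes {rec['nbytes']} != {size}")
        expected_offset += rec["nbytes"]
    if expected_offset != shard["nbytes"]:
        problems.append(f"shard total {expected_offset} != {shard['nbytes']}")
    return problems


def rec(name, shape, nbytes, offset):
    """Build one record dict in the manifest's shape (illustrative helper)."""
    return {"name": name, "shape": shape, "dtype": "float32",
            "format": "f32-to-bf16", "nbytes": nbytes, "byteOffset": offset}


# Values copied from the params_shard_18.bin entry in the manifest.
shard_18 = {
    "dataPath": "params_shard_18.bin",
    "format": "raw-shard",
    "nbytes": 29368320,
    "records": [
        rec("model.layers.6.self_attn.qkv_proj.weight", [2560, 2048], 10485760, 0),
        rec("model.layers.6.self_attn.o_proj.weight", [2048, 2048], 8388608, 10485760),
        rec("model.layers.6.input_layernorm.weight", [2048], 4096, 18874368),
        rec("model.layers.6.post_attention_layernorm.weight", [2048], 4096, 18878464),
        rec("model.layers.7.self_attn.qkv_proj.weight", [2560, 2048], 10485760, 18882560),
    ],
}

print(check_shard(shard_18))
```

An empty list means the offsets, per-record sizes, and shard total agree with one another. The `md5sum` field is not checked here because that requires the shard file itself.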
params_shard_0.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b64216c4ab368291a7ab47e9f5211ac9d68486e29dfbdcb8ba34091f4d1b006e
3
+ size 131084288
params_shard_1.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:73ea86147062593b4d95f7540aa7588cf51ff0640dcea9264e5dca10c327be1e
3
+ size 46137344
params_shard_10.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4e5a0b89123380d745ecd28896600eac1c574d2f3aee10ca7d1f11c3390fe57c
3
+ size 31465472
params_shard_11.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4400f88089658e8ac6b839723b7dd06987b9610fcd9877769eafecd3838844e0
3
+ size 46137344
params_shard_12.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fdb5deada8e4efef28d789639e85b3c714b6127905e3c925a00be38f40be4e81
3
+ size 23068672
params_shard_13.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a2c067fd146b7d563c276b3b567255be0fd67ba00dbd96af8fdf95cc64d37db9
3
+ size 29368320
params_shard_14.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ab62095d41757b0a5d1ea1dbc401889051f117e551e6ae09b5c0298be6ca36cf
3
+ size 46137344
params_shard_15.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f1e46f0477819fad33250344f813605eebacb33b8bb7c1f582e0e3e2aa67ffdf
3
+ size 31465472
params_shard_16.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bab67c75bb9a84ef0ba1ea3e7cbb412b567c26b443d6f03a7ecb36e3689540d5
3
+ size 46137344
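Each `params_shard_N.bin` entry in this commit is a Git LFS pointer (a `version` line, an `oid` line, and a `size` line) rather than the binary itself. The sketch below parses the `params_shard_16.bin` pointer shown above and compares its `size` with the `nbytes` value recorded for the same shard in the manifest. `parse_lfs_pointer` is a hypothetical helper written for illustration, and it expects the pointer text without the diff's `+` prefixes or line numbers.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file of 'key value' lines into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents for params_shard_16.bin, copied from the diff above.
pointer_16 = """version https://git-lfs.github.com/spec/v1
oid sha256:bab67c75bb9a84ef0ba1ea3e7cbb412b567c26b443d6f03a7ecb36e3689540d5
size 46137344
"""

# "nbytes" for params_shard_16.bin as recorded in the manifest.
manifest_nbytes_16 = 46137344

info = parse_lfs_pointer(pointer_16)
print(info["hash_algo"], info["size"] == manifest_nbytes_16)
```

Note that the pointer's `oid` is a SHA-256 digest while the manifest stores an `md5sum`, so the two hashes cannot be compared directly; only the byte counts line up.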
params_shard_17.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e972a9690d6a54c8b859987e54985cafe5f6c76e0ef4247378a4a6023be64f8b
3
+ size 23068672
params_shard_18.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a5319cd789970474e1a562d32913dbd2850fe357daecc789a6f40184f491578c
3
+ size 29368320
params_shard_19.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f15296df5b298e7c87ec515105c67f00fb442d71c77d8625aa67a45957c47b30
3
+ size 46137344
params_shard_2.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b9cbf7590a55bd755e9132e351c89c45398425258ab11947d803e72e106b315a
3
+ size 23068672
params_shard_20.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7b95eedadf55b0a62b0b62ca8da7274b58ea5232486126e5b00de484d45900cf
3
+ size 31465472
params_shard_21.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:05a2b5fa95d6481fc8e4a55b458817e2c54d82a29353c7a45eb581bca8b7215b
+ size 46137344
params_shard_22.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ecf94df86f1426b56cd1645da76abe694de9da1f60723fb863faff5eb72b2f60
+ size 23068672
params_shard_23.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f537444774fc29b80486870eec88e9e47dd5d3a2ab20db0276c623a69a670b84
+ size 29368320
params_shard_24.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2291b90b62aa3df8b763ce79b3aa869a5c609f50fc1e02a5821c0f55602b0314
+ size 46137344
params_shard_25.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:59b274b0ca468a7222b49caa5e3b95f050e530418c6061769fd83415f153942b
+ size 31465472
params_shard_26.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5a1fe19a3632118c519ec85f3c8512bab0af0634e00b1a8aed8a286e4f2f5ef5
+ size 46137344
params_shard_27.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ff404cbb772b1d59ace8b3bca0200816f97fb4852c22e236b37f058a47710d36
+ size 23068672
params_shard_28.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a9d05de03626afb1854c3d7b1ae580d6d91b757d0f844d306f967cdfd4e76354
+ size 29368320
params_shard_29.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:93258d481a3a73b5c75e2d56869666b2b01ac79e8036171d7390d241ee341fd2
+ size 46137344
params_shard_3.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b6dc17e438204399927f3a491ec961fd37b0a62aec85befef82c287457a9fe28
+ size 29368320
params_shard_30.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:047f8566d04a2e47885e5f7c6b4106ce6d743a35738cf26173d92335ebce8d94
+ size 31465472
params_shard_31.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9f95e934ffeab6a8728ee7e68d3c3d65c4c861e0ade1253d8a0d8fbfcf662261
+ size 46137344
params_shard_32.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:22c15357d0b9b821aae8d5131bcbddc3cf74948d264561f470a40b131a45626f
+ size 23068672
params_shard_33.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:02d1593b796b7f65a145e970cb9fe1742eeb9266885c95041cbf2c3cc0b6474f
+ size 29368320
params_shard_34.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2ed65f40574fbe77272a21f2fd19e910628a8cc6f964170cc84a5516b49c9d1b
+ size 46137344
params_shard_35.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e3f7f72b6c9835dcb7c71ad001ce4f403404cda11ffc0ac039cb1dee8d609ba9
+ size 31465472
params_shard_36.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2c1dc611a53db189f3f0be955f725898dc834285185b992a8e67b77870d70784
+ size 46137344
params_shard_37.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0f41f392eaa0926b54b097c037f663df8eae00334e7c8880df56565655c06a01
+ size 23068672
params_shard_38.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa6f356f8a92db05cb3972599a17009d7de26b37fea9d27f0d9445327440aee8
+ size 29368320
params_shard_39.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2e8a1ad6733f27f0a4e6ba1811d616f9ec7614ca3dd84cf62230e48424971671
+ size 46137344
params_shard_4.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:54d2c150809ef991171e11d451d8d43223880ba1e0211efa27b96576a6af97ce
+ size 46137344
params_shard_40.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1b1528e6e3387906a70102369b4c9ece1bc1c2bf95cc7397d9e6de85bcc8298c
+ size 31465472
params_shard_41.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5d8c385de2c5c8c71bd0b31804fa0dc233946b16a337ea0b9044cce94351dbf0
+ size 46137344
params_shard_42.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6fcbf2a1076f63542730440890df93d5336dd3d2afd9afedb1199f847dcb4633
+ size 23068672
params_shard_43.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4dbe333e617bcaafbced2946ac86c44b88046abf2b3976906558a7b860f09b54
+ size 29368320
params_shard_44.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:74594103d09c5771c2a1f8ba9f609cc812e5eacf61c193ccd49de1a62e723591
+ size 46137344
params_shard_45.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7c1ab7882df270f3b265a4aa11bf2f72096552ca82beb5ad7684d2b699232193
+ size 31465472
params_shard_46.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d30e4d5a53fc35a1e9796d1bdcdf604ebc57b470d2123fddf09c0e91a1c40ed4
+ size 46137344
params_shard_47.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9035b1ac0fdaf96688d0490d64c41e8f340e223aa339a74ca1a9e3f833dd4f16
+ size 23068672
params_shard_48.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:56cdd4da796567e9aa48f972d601881cdffb1f4a705a1817bbb5c2920daec955
+ size 29368320
params_shard_49.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9d4d4988494e6ac5327cbee17d7957532f81d61809f6a19cff396110a16ec9b7
+ size 46137344