/home/cfruan/.conda/envs/mlc-source-311/bin/python -m mlc_chat gen_config /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5 --quantization q4f16_1 --conv-template phi-2 --output /tmp/tmpxe445xtc
[2023-12-28 23:33:19] INFO auto_config.py:115: Found model configuration: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/config.json
[2023-12-28 23:33:19] INFO auto_config.py:151: Found model type: phi-msft. Use `--model-type` to override.
[2023-12-28 23:33:19] INFO phi_model.py:59: context_window_size not found in config.json. Falling back to n_positions (2048)
[2023-12-28 23:33:19] INFO gen_config.py:129: Not found tokenizer config: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/tokenizer.model
[2023-12-28 23:33:19] INFO gen_config.py:127: Found tokenizer config: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/tokenizer.json. Copying to /tmp/tmpxe445xtc/tokenizer.json
[2023-12-28 23:33:19] INFO gen_config.py:127: Found tokenizer config: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/vocab.json. Copying to /tmp/tmpxe445xtc/vocab.json
[2023-12-28 23:33:19] INFO gen_config.py:127: Found tokenizer config: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/merges.txt. Copying to /tmp/tmpxe445xtc/merges.txt
[2023-12-28 23:33:19] INFO gen_config.py:127: Found tokenizer config: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/added_tokens.json. Copying to /tmp/tmpxe445xtc/added_tokens.json
[2023-12-28 23:33:19] INFO gen_config.py:127: Found tokenizer config: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/tokenizer_config.json. Copying to /tmp/tmpxe445xtc/tokenizer_config.json
[2023-12-28 23:33:19] INFO gen_config.py:69: [System default] Setting pad_token_id: 0
[2023-12-28 23:33:19] INFO gen_config.py:69: [System default] Setting bos_token_id: 1
[2023-12-28 23:33:19] INFO gen_config.py:69: [System default] Setting eos_token_id: 2
[2023-12-28 23:33:19] INFO gen_config.py:69: [System default] Setting temperature: 0.7
[2023-12-28 23:33:19] INFO gen_config.py:69: [System default] Setting repetition_penalty: 1.0
[2023-12-28 23:33:19] INFO gen_config.py:69: [System default] Setting top_p: 0.95
[2023-12-28 23:33:19] INFO gen_config.py:69: [System default] Setting mean_gen_len: 128
[2023-12-28 23:33:19] INFO gen_config.py:69: [System default] Setting max_gen_len: 512
[2023-12-28 23:33:19] INFO gen_config.py:69: [System default] Setting shift_fill_factor: 0.3
[2023-12-28 23:33:19] INFO gen_config.py:157: Dumping configuration file to: /tmp/tmpxe445xtc/mlc-chat-config.json
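For reference, the `[System default]` values set above are what end up in the dumped `mlc-chat-config.json`. A fragment reconstructed only from the values logged in this run (the real file contains additional fields, and the exact schema may vary across `mlc_chat` versions):

```json
{
  "model_type": "phi-msft",
  "quantization": "q4f16_1",
  "conv_template": "phi-2",
  "context_window_size": 2048,
  "pad_token_id": 0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "temperature": 0.7,
  "repetition_penalty": 1.0,
  "top_p": 0.95,
  "mean_gen_len": 128,
  "max_gen_len": 512,
  "shift_fill_factor": 0.3
}
```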
/home/cfruan/.conda/envs/mlc-source-311/bin/python -m mlc_chat convert_weight /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5 --quantization q4f16_1 --source-format auto --output /tmp/tmpxe445xtc
[2023-12-28 23:33:20] INFO auto_config.py:115: Found model configuration: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/config.json
[2023-12-28 23:33:20] INFO auto_device.py:76: Found device: cuda:0
[2023-12-28 23:33:20] INFO auto_device.py:76: Found device: cuda:1
[2023-12-28 23:33:20] INFO auto_device.py:85: Not found device: rocm:0
[2023-12-28 23:33:20] INFO auto_device.py:85: Not found device: metal:0
[2023-12-28 23:33:21] INFO auto_device.py:76: Found device: vulkan:0
[2023-12-28 23:33:21] INFO auto_device.py:76: Found device: vulkan:1
[2023-12-28 23:33:21] INFO auto_device.py:76: Found device: vulkan:2
[2023-12-28 23:33:21] INFO auto_device.py:85: Not found device: opencl:0
[2023-12-28 23:33:21] INFO auto_device.py:33: Using device: cuda:0
[2023-12-28 23:33:21] INFO auto_weight.py:70: Finding weights in: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5
[2023-12-28 23:33:21] INFO auto_weight.py:129: Found source weight format: huggingface-torch. Source configuration: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/pytorch_model.bin
[2023-12-28 23:33:21] INFO auto_weight.py:149: Not found Huggingface Safetensor
[2023-12-28 23:33:21] INFO auto_weight.py:106: Using source weight configuration: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/pytorch_model.bin. Use `--source` to override.
[2023-12-28 23:33:21] INFO auto_weight.py:110: Using source weight format: huggingface-torch. Use `--source-format` to override.
[2023-12-28 23:33:21] INFO auto_config.py:151: Found model type: phi-msft. Use `--model-type` to override.
[2023-12-28 23:33:21] INFO phi_model.py:59: context_window_size not found in config.json. Falling back to n_positions (2048)
[2023-12-28 23:33:24] INFO huggingface_loader.py:169: Loading HF parameters from: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/pytorch_model.bin
Weight conversion with arguments:
  --config          /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/config.json
  --quantization    GroupQuantize(name='q4f16_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float16', num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7)
  --model-type      phi-msft
  --device          cuda:0
  --source          /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/pytorch_model.bin
  --source-format   huggingface-torch
  --output          /tmp/tmpxe445xtc
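The `GroupQuantize` parameters above fully determine the quantized tensor shapes reported in the log below: each `uint32` word packs 8 `int4` values (`num_elem_per_storage=8`), and each group of 32 elements (`group_size=32`) shares one `float16` scale. A minimal sketch of that arithmetic (the helper `quantized_shapes` is hypothetical, not part of `mlc_chat`):

```python
# Hypothetical helper: derive q_weight / q_scale shapes for q4f16_1
# group quantization of a (rows, cols) float16 matrix.
def quantized_shapes(rows, cols, num_elem_per_storage=8, group_size=32):
    """Return (q_weight_shape, q_scale_shape)."""
    q_weight = (rows, cols // num_elem_per_storage)  # 8 int4 values per uint32 word
    q_scale = (rows, cols // group_size)             # one fp16 scale per 32-element group
    return q_weight, q_scale

# The (51200, 2048) embedding weight should quantize to:
qw, qs = quantized_shapes(51200, 2048)
print(qw, qs)  # (51200, 256) (51200, 64) -- matching the log lines below
```

The same rule explains why the transposed `fc2` weight `(2048, 8192)` yields `q_weight (2048, 1024)` and `q_scale (2048, 256)`.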

  0%|          | 0/245 [00:00<?, ?it/s]
[2023-12-28 23:33:25] INFO group_quantization.py:200: Compiling quantize function for key: (51200, 2048, 'float16', 'cuda')
[2023-12-28 23:33:26] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.embd.q_weight", shape: (51200, 256), dtype: uint32
[2023-12-28 23:33:26] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.embd.q_scale", shape: (51200, 64), dtype: float16
  0%|▍         | 1/245 [00:01<04:43,  1.16s/it]
[2023-12-28 23:33:26] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:33:26] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:26] INFO group_quantization.py:200: Compiling quantize function for key: (6144, 2048, 'float16', 'cuda')
[2023-12-28 23:33:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32
[2023-12-28 23:33:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16
  2%|β–ˆβ–‰        | 4/245 [00:01<01:17,  3.10it/s]
[2023-12-28 23:33:27] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:33:27] INFO group_quantization.py:200: Compiling quantize function for key: (2048, 2048, 'float16', 'cuda')
[2023-12-28 23:33:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32
[2023-12-28 23:33:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16
  2%|β–ˆβ–ˆβ–‰       | 6/245 [00:01<01:02,  3.83it/s]
[2023-12-28 23:33:27] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:27] INFO group_quantization.py:200: Compiling quantize function for key: (8192, 2048, 'float16', 'cuda')
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16
  3%|β–ˆβ–ˆβ–ˆβ–‰      | 8/245 [00:02<00:55,  4.29it/s]
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:33:28] INFO group_quantization.py:200: Compiling quantize function for key: (2048, 8192, 'float16', 'cuda')
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16
  4%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰     | 10/245 [00:02<00:51,  4.59it/s]
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.2.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.2.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.2.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.2.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.2.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.2.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.2.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.2.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.3.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.3.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.3.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32

  4%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                                                                   | 10/245 [00:02<00:51,  4.59it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.3.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16

  4%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                                                                   | 10/245 [00:02<00:51,  4.59it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.mixer.out_proj.bias", shape: (2048,), dtype: float16

  4%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                                                                   | 10/245 [00:02<00:51,  4.59it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.3.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32

  4%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                                                                   | 10/245 [00:02<00:51,  4.59it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.3.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16

  4%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                                                                   | 10/245 [00:02<00:51,  4.59it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.mlp.fc1.bias", shape: (8192,), dtype: float16

  4%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                                                                   | 10/245 [00:02<00:51,  4.59it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.3.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32

  4%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                                                                   | 10/245 [00:02<00:51,  4.59it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.3.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16

  4%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                                                                   | 10/245 [00:02<00:51,  4.59it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.mlp.fc2.bias", shape: (2048,), dtype: float16

  4%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                                                                   | 10/245 [00:02<00:51,  4.59it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.ln.weight", shape: (2048,), dtype: float16

  4%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                                                                   | 10/245 [00:02<00:51,  4.59it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.ln.bias", shape: (2048,), dtype: float16

  4%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                                                                   | 10/245 [00:02<00:51,  4.59it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.4.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32

  4%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                                                                   | 10/245 [00:02<00:51,  4.59it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.4.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16

  4%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                                                                   | 10/245 [00:02<00:51,  4.59it/s]
 18%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                                                                  | 44/245 [00:02<00:05, 39.49it/s]
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.4.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.4.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.4.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.4.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.4.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.4.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.5.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.5.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.5.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.5.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.5.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.5.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.5.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.5.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.6.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.6.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.6.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.6.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.6.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.6.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.6.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.6.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16
 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.mlp.fc1.bias", shape: (8192,), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.mlp.fc2.bias", shape: (2048,), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.ln.weight", shape: (2048,), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.ln.bias", shape: (2048,), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.mixer.Wqkv.bias", shape: (6144,), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.mixer.out_proj.bias", shape: (2048,), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.mlp.fc1.bias", shape: (8192,), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.mlp.fc2.bias", shape: (2048,), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.ln.weight", shape: (2048,), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.ln.bias", shape: (2048,), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.mixer.Wqkv.bias", shape: (6144,), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.mixer.out_proj.bias", shape: (2048,), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.mlp.fc1.bias", shape: (8192,), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.mlp.fc2.bias", shape: (2048,), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.ln.weight", shape: (2048,), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.ln.bias", shape: (2048,), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.mixer.Wqkv.bias", shape: (6144,), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.mixer.out_proj.bias", shape: (2048,), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.mlp.fc1.bias", shape: (8192,), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.mlp.fc2.bias", shape: (2048,), dtype: float16

 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                                 | 78/245 [00:02<00:02, 78.26it/s]
 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                | 111/245 [00:02<00:01, 117.47it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.ln.weight", shape: (2048,), dtype: float16

 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                | 111/245 [00:02<00:01, 117.47it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.ln.bias", shape: (2048,), dtype: float16

 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                | 111/245 [00:02<00:01, 117.47it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32

 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                | 111/245 [00:02<00:01, 117.47it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16

 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                | 111/245 [00:02<00:01, 117.47it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.mixer.Wqkv.bias", shape: (6144,), dtype: float16

 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                | 111/245 [00:02<00:01, 117.47it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32

 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                | 111/245 [00:02<00:01, 117.47it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16

 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                | 111/245 [00:02<00:01, 117.47it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.mixer.out_proj.bias", shape: (2048,), dtype: float16

 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                | 111/245 [00:02<00:01, 117.47it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32

 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                | 111/245 [00:02<00:01, 117.47it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.12.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.12.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.12.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.12.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.12.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.12.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.12.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.12.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.12.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.12.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.12.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.12.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.12.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.12.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.14.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.14.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.14.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.14.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.mixer.out_proj.bias", shape: (2048,), dtype: float16
 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.14.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.14.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.14.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.14.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.mlp.fc1.bias", shape: (8192,), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.mlp.fc2.bias", shape: (2048,), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.ln.weight", shape: (2048,), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.ln.bias", shape: (2048,), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.mixer.Wqkv.bias", shape: (6144,), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.mixer.out_proj.bias", shape: (2048,), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.mlp.fc1.bias", shape: (8192,), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.mlp.fc2.bias", shape: (2048,), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.ln.weight", shape: (2048,), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.ln.bias", shape: (2048,), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.mixer.Wqkv.bias", shape: (6144,), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.mixer.out_proj.bias", shape: (2048,), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.mlp.fc1.bias", shape: (8192,), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16

 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                               | 147/245 [00:03<00:00, 161.54it/s]
 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.mlp.fc2.bias", shape: (2048,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.ln.weight", shape: (2048,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.ln.bias", shape: (2048,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.mixer.Wqkv.bias", shape: (6144,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.mixer.out_proj.bias", shape: (2048,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.mlp.fc1.bias", shape: (8192,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.mlp.fc2.bias", shape: (2048,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.ln.weight", shape: (2048,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.ln.bias", shape: (2048,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.mixer.Wqkv.bias", shape: (6144,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.mixer.out_proj.bias", shape: (2048,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.mlp.fc1.bias", shape: (8192,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.mlp.fc2.bias", shape: (2048,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.ln.weight", shape: (2048,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.ln.bias", shape: (2048,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.mixer.Wqkv.bias", shape: (6144,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.mixer.out_proj.bias", shape: (2048,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.mlp.fc1.bias", shape: (8192,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.mlp.fc2.bias", shape: (2048,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.ln.weight", shape: (2048,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.ln.bias", shape: (2048,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.mixer.Wqkv.bias", shape: (6144,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.mixer.out_proj.bias", shape: (2048,), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16

 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 180/245 [00:03<00:00, 192.97it/s]
 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰             | 218/245 [00:03<00:00, 230.43it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.mlp.fc1.bias", shape: (8192,), dtype: float16

 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰             | 218/245 [00:03<00:00, 230.43it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32

 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰             | 218/245 [00:03<00:00, 230.43it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16

 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰             | 218/245 [00:03<00:00, 230.43it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.mlp.fc2.bias", shape: (2048,), dtype: float16

 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰             | 218/245 [00:03<00:00, 230.43it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.ln.weight", shape: (2048,), dtype: float16

 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰             | 218/245 [00:03<00:00, 230.43it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.ln.bias", shape: (2048,), dtype: float16

 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰             | 218/245 [00:03<00:00, 230.43it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32

 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰             | 218/245 [00:03<00:00, 230.43it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16

 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰             | 218/245 [00:03<00:00, 230.43it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.mixer.Wqkv.bias", shape: (6144,), dtype: float16

 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰             | 218/245 [00:03<00:00, 230.43it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32

 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰             | 218/245 [00:03<00:00, 230.43it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16

 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰             | 218/245 [00:03<00:00, 230.43it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.mixer.out_proj.bias", shape: (2048,), dtype: float16

 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰             | 218/245 [00:03<00:00, 230.43it/s]
                                                                                                                                                              
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.mixer.Wqkv.q_weight", shape: (6144, 256), dtype: uint32
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.mixer.Wqkv.q_scale", shape: (6144, 64), dtype: float16
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.mixer.out_proj.q_weight", shape: (2048, 256), dtype: uint32
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.mixer.out_proj.q_scale", shape: (2048, 64), dtype: float16
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.mlp.fc1.q_weight", shape: (8192, 256), dtype: uint32
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.mlp.fc1.q_scale", shape: (8192, 64), dtype: float16
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.mlp.fc2.q_weight", shape: (2048, 1024), dtype: uint32
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.mlp.fc2.q_scale", shape: (2048, 256), dtype: float16
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "lm_head.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "lm_head.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "lm_head.linear.q_weight", shape: (51200, 256), dtype: uint32
[2023-12-28 23:33:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "lm_head.linear.q_scale", shape: (51200, 64), dtype: float16
[2023-12-28 23:33:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "lm_head.linear.bias", shape: (51200,), dtype: float16
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 245/245 [00:03<00:00, 71.86it/s]
[2023-12-28 23:33:29] INFO huggingface_loader.py:179: Unloading HF weight file: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/pytorch_model.bin
[2023-12-28 23:33:29] INFO stats.py:71: Time usage: HF loading: 1.868 sec; Pre-quantization mapping: 0.359 sec; Quantization: 2.522 sec
[2023-12-28 23:33:29] INFO stats.py:85: RAM usage: Peak RAM: 2.642 GB. Total bytes loaded from disk: 2.642 GB
[2023-12-28 23:33:29] INFO convert_weight.py:110: Parameter size after quantization: 0.744 GB
[2023-12-28 23:33:29] INFO convert_weight.py:115: Total parameters: 1,418,270,720
[2023-12-28 23:33:29] INFO convert_weight.py:116: Bits per parameter: 4.505
Start storing to cache /tmp/tmpxe445xtc
[0001/0343] saving transformer.embd.q_weight
[0002/0343] saving transformer.embd.q_scale
[0003/0343] saving transformer.h.0.ln.weight
[0004/0343] saving transformer.h.0.ln.bias
[0005/0343] saving transformer.h.0.mixer.Wqkv.q_weight
[0006/0343] saving transformer.h.0.mixer.Wqkv.q_scale
[0007/0343] saving transformer.h.0.mixer.Wqkv.bias
[0008/0343] saving transformer.h.0.mixer.out_proj.q_weight
[0009/0343] saving transformer.h.0.mixer.out_proj.q_scale
[0010/0343] saving transformer.h.0.mixer.out_proj.bias
[0011/0343] saving transformer.h.0.mlp.fc1.q_weight
[0012/0343] saving transformer.h.0.mlp.fc1.q_scale
[0013/0343] saving transformer.h.0.mlp.fc1.bias
[0014/0343] saving transformer.h.0.mlp.fc2.q_weight
[0015/0343] saving transformer.h.0.mlp.fc2.q_scale
[0016/0343] saving transformer.h.0.mlp.fc2.bias
[0017/0343] saving transformer.h.1.ln.weight
[0018/0343] saving transformer.h.1.ln.bias
[0019/0343] saving transformer.h.1.mixer.Wqkv.q_weight
[0020/0343] saving transformer.h.1.mixer.Wqkv.q_scale
[0021/0343] saving transformer.h.1.mixer.Wqkv.bias
[0022/0343] saving transformer.h.1.mixer.out_proj.q_weight
[0023/0343] saving transformer.h.1.mixer.out_proj.q_scale
[0024/0343] saving transformer.h.1.mixer.out_proj.bias
[0025/0343] saving transformer.h.1.mlp.fc1.q_weight
[0026/0343] saving transformer.h.1.mlp.fc1.q_scale
[0027/0343] saving transformer.h.1.mlp.fc1.bias
[0028/0343] saving transformer.h.1.mlp.fc2.q_weight
[0029/0343] saving transformer.h.1.mlp.fc2.q_scale
[0030/0343] saving transformer.h.1.mlp.fc2.bias
[0031/0343] saving transformer.h.2.ln.weight
[0032/0343] saving transformer.h.2.ln.bias
[0033/0343] saving transformer.h.2.mixer.Wqkv.q_weight
[0034/0343] saving transformer.h.2.mixer.Wqkv.q_scale
[0035/0343] saving transformer.h.2.mixer.Wqkv.bias
[0036/0343] saving transformer.h.2.mixer.out_proj.q_weight
[0037/0343] saving transformer.h.2.mixer.out_proj.q_scale
[0038/0343] saving transformer.h.2.mixer.out_proj.bias
[0039/0343] saving transformer.h.2.mlp.fc1.q_weight
[0040/0343] saving transformer.h.2.mlp.fc1.q_scale
[0041/0343] saving transformer.h.2.mlp.fc1.bias
[0042/0343] saving transformer.h.2.mlp.fc2.q_weight
[0043/0343] saving transformer.h.2.mlp.fc2.q_scale
[0044/0343] saving transformer.h.2.mlp.fc2.bias
[0045/0343] saving transformer.h.3.ln.weight
[0046/0343] saving transformer.h.3.ln.bias
[0047/0343] saving transformer.h.3.mixer.Wqkv.q_weight
[0048/0343] saving transformer.h.3.mixer.Wqkv.q_scale
[0049/0343] saving transformer.h.3.mixer.Wqkv.bias
[0050/0343] saving transformer.h.3.mixer.out_proj.q_weight
[0051/0343] saving transformer.h.3.mixer.out_proj.q_scale
[0052/0343] saving transformer.h.3.mixer.out_proj.bias
[0053/0343] saving transformer.h.3.mlp.fc1.q_weight
[0054/0343] saving transformer.h.3.mlp.fc1.q_scale
[0055/0343] saving transformer.h.3.mlp.fc1.bias
[0056/0343] saving transformer.h.3.mlp.fc2.q_weight
[0057/0343] saving transformer.h.3.mlp.fc2.q_scale
[0058/0343] saving transformer.h.3.mlp.fc2.bias
[0059/0343] saving transformer.h.4.ln.weight
[0060/0343] saving transformer.h.4.ln.bias
[0061/0343] saving transformer.h.4.mixer.Wqkv.q_weight
[0062/0343] saving transformer.h.4.mixer.Wqkv.q_scale
[0063/0343] saving transformer.h.4.mixer.Wqkv.bias
[0064/0343] saving transformer.h.4.mixer.out_proj.q_weight
[0065/0343] saving transformer.h.4.mixer.out_proj.q_scale
[0066/0343] saving transformer.h.4.mixer.out_proj.bias
[0067/0343] saving transformer.h.4.mlp.fc1.q_weight
[0068/0343] saving transformer.h.4.mlp.fc1.q_scale
[0069/0343] saving transformer.h.4.mlp.fc1.bias
[0070/0343] saving transformer.h.4.mlp.fc2.q_weight
[0071/0343] saving transformer.h.4.mlp.fc2.q_scale
[0072/0343] saving transformer.h.4.mlp.fc2.bias
[0073/0343] saving transformer.h.5.ln.weight
[0074/0343] saving transformer.h.5.ln.bias
[0075/0343] saving transformer.h.5.mixer.Wqkv.q_weight
[0076/0343] saving transformer.h.5.mixer.Wqkv.q_scale
[0077/0343] saving transformer.h.5.mixer.Wqkv.bias
[0078/0343] saving transformer.h.5.mixer.out_proj.q_weight
[0079/0343] saving transformer.h.5.mixer.out_proj.q_scale
[0080/0343] saving transformer.h.5.mixer.out_proj.bias
[0081/0343] saving transformer.h.5.mlp.fc1.q_weight
[0082/0343] saving transformer.h.5.mlp.fc1.q_scale
[0083/0343] saving transformer.h.5.mlp.fc1.bias
[0084/0343] saving transformer.h.5.mlp.fc2.q_weight
[0085/0343] saving transformer.h.5.mlp.fc2.q_scale
[0086/0343] saving transformer.h.5.mlp.fc2.bias
[0087/0343] saving transformer.h.6.ln.weight
[0088/0343] saving transformer.h.6.ln.bias
[0089/0343] saving transformer.h.6.mixer.Wqkv.q_weight
[0090/0343] saving transformer.h.6.mixer.Wqkv.q_scale
[0091/0343] saving transformer.h.6.mixer.Wqkv.bias
[0092/0343] saving transformer.h.6.mixer.out_proj.q_weight
[0093/0343] saving transformer.h.6.mixer.out_proj.q_scale
[0094/0343] saving transformer.h.6.mixer.out_proj.bias
[0095/0343] saving transformer.h.6.mlp.fc1.q_weight
[0096/0343] saving transformer.h.6.mlp.fc1.q_scale
[0097/0343] saving transformer.h.6.mlp.fc1.bias
[0098/0343] saving transformer.h.6.mlp.fc2.q_weight
[0099/0343] saving transformer.h.6.mlp.fc2.q_scale
[0100/0343] saving transformer.h.6.mlp.fc2.bias
[0101/0343] saving transformer.h.7.ln.weight
[0102/0343] saving transformer.h.7.ln.bias
[0103/0343] saving transformer.h.7.mixer.Wqkv.q_weight
[0104/0343] saving transformer.h.7.mixer.Wqkv.q_scale
[0105/0343] saving transformer.h.7.mixer.Wqkv.bias
[0106/0343] saving transformer.h.7.mixer.out_proj.q_weight
[0107/0343] saving transformer.h.7.mixer.out_proj.q_scale
[0108/0343] saving transformer.h.7.mixer.out_proj.bias
[0109/0343] saving transformer.h.7.mlp.fc1.q_weight
[0110/0343] saving transformer.h.7.mlp.fc1.q_scale
[0111/0343] saving transformer.h.7.mlp.fc1.bias
[0112/0343] saving transformer.h.7.mlp.fc2.q_weight
[0113/0343] saving transformer.h.7.mlp.fc2.q_scale
[0114/0343] saving transformer.h.7.mlp.fc2.bias
[0115/0343] saving transformer.h.8.ln.weight
[0116/0343] saving transformer.h.8.ln.bias
[0117/0343] saving transformer.h.8.mixer.Wqkv.q_weight
[0118/0343] saving transformer.h.8.mixer.Wqkv.q_scale
[0119/0343] saving transformer.h.8.mixer.Wqkv.bias
[0120/0343] saving transformer.h.8.mixer.out_proj.q_weight
[0121/0343] saving transformer.h.8.mixer.out_proj.q_scale
[0122/0343] saving transformer.h.8.mixer.out_proj.bias
[0123/0343] saving transformer.h.8.mlp.fc1.q_weight
[0124/0343] saving transformer.h.8.mlp.fc1.q_scale
[0125/0343] saving transformer.h.8.mlp.fc1.bias
[0126/0343] saving transformer.h.8.mlp.fc2.q_weight
[0127/0343] saving transformer.h.8.mlp.fc2.q_scale
[0128/0343] saving transformer.h.8.mlp.fc2.bias
[0129/0343] saving transformer.h.9.ln.weight
[0130/0343] saving transformer.h.9.ln.bias
[0131/0343] saving transformer.h.9.mixer.Wqkv.q_weight
[0132/0343] saving transformer.h.9.mixer.Wqkv.q_scale
[0133/0343] saving transformer.h.9.mixer.Wqkv.bias
[0134/0343] saving transformer.h.9.mixer.out_proj.q_weight
[0135/0343] saving transformer.h.9.mixer.out_proj.q_scale
[0136/0343] saving transformer.h.9.mixer.out_proj.bias
[0137/0343] saving transformer.h.9.mlp.fc1.q_weight
[0138/0343] saving transformer.h.9.mlp.fc1.q_scale
[0139/0343] saving transformer.h.9.mlp.fc1.bias
[0140/0343] saving transformer.h.9.mlp.fc2.q_weight
[0141/0343] saving transformer.h.9.mlp.fc2.q_scale
[0142/0343] saving transformer.h.9.mlp.fc2.bias
[0143/0343] saving transformer.h.10.ln.weight
[0144/0343] saving transformer.h.10.ln.bias
[0145/0343] saving transformer.h.10.mixer.Wqkv.q_weight
[0146/0343] saving transformer.h.10.mixer.Wqkv.q_scale
[0147/0343] saving transformer.h.10.mixer.Wqkv.bias
[0148/0343] saving transformer.h.10.mixer.out_proj.q_weight
[0149/0343] saving transformer.h.10.mixer.out_proj.q_scale
[0150/0343] saving transformer.h.10.mixer.out_proj.bias
[0151/0343] saving transformer.h.10.mlp.fc1.q_weight
[0152/0343] saving transformer.h.10.mlp.fc1.q_scale
[0153/0343] saving transformer.h.10.mlp.fc1.bias
[0154/0343] saving transformer.h.10.mlp.fc2.q_weight
[0155/0343] saving transformer.h.10.mlp.fc2.q_scale
[0156/0343] saving transformer.h.10.mlp.fc2.bias
[0157/0343] saving transformer.h.11.ln.weight
[0158/0343] saving transformer.h.11.ln.bias
[0159/0343] saving transformer.h.11.mixer.Wqkv.q_weight
[0160/0343] saving transformer.h.11.mixer.Wqkv.q_scale
[0161/0343] saving transformer.h.11.mixer.Wqkv.bias
[0162/0343] saving transformer.h.11.mixer.out_proj.q_weight
[0163/0343] saving transformer.h.11.mixer.out_proj.q_scale
[0164/0343] saving transformer.h.11.mixer.out_proj.bias
[0165/0343] saving transformer.h.11.mlp.fc1.q_weight
                                                           
[0166/0343] saving transformer.h.11.mlp.fc1.q_scale
                                                           
[0167/0343] saving transformer.h.11.mlp.fc1.bias
                                                           
[0168/0343] saving transformer.h.11.mlp.fc2.q_weight
                                                           
[0169/0343] saving transformer.h.11.mlp.fc2.q_scale
                                                           
[0170/0343] saving transformer.h.11.mlp.fc2.bias
                                                           
[0171/0343] saving transformer.h.12.ln.weight
                                                           
[0172/0343] saving transformer.h.12.ln.bias
                                                           
[0173/0343] saving transformer.h.12.mixer.Wqkv.q_weight
                                                           
[0174/0343] saving transformer.h.12.mixer.Wqkv.q_scale
                                                           
[0175/0343] saving transformer.h.12.mixer.Wqkv.bias
                                                           
[0176/0343] saving transformer.h.12.mixer.out_proj.q_weight
                                                           
[0177/0343] saving transformer.h.12.mixer.out_proj.q_scale
                                                           
[0178/0343] saving transformer.h.12.mixer.out_proj.bias
                                                           
[0179/0343] saving transformer.h.12.mlp.fc1.q_weight
                                                           
[0180/0343] saving transformer.h.12.mlp.fc1.q_scale
                                                           
[0181/0343] saving transformer.h.12.mlp.fc1.bias
                                                           
[0182/0343] saving transformer.h.12.mlp.fc2.q_weight
                                                           
[0183/0343] saving transformer.h.12.mlp.fc2.q_scale
                                                           
[0184/0343] saving transformer.h.12.mlp.fc2.bias
                                                           
[0185/0343] saving transformer.h.13.ln.weight
                                                           
[0186/0343] saving transformer.h.13.ln.bias
                                                           
[0187/0343] saving transformer.h.13.mixer.Wqkv.q_weight
                                                           
[0188/0343] saving transformer.h.13.mixer.Wqkv.q_scale
                                                           
[0189/0343] saving transformer.h.13.mixer.Wqkv.bias
                                                           
[0190/0343] saving transformer.h.13.mixer.out_proj.q_weight
                                                           
[0191/0343] saving transformer.h.13.mixer.out_proj.q_scale
                                                           
[0192/0343] saving transformer.h.13.mixer.out_proj.bias
                                                           
[0193/0343] saving transformer.h.13.mlp.fc1.q_weight
                                                           
[0194/0343] saving transformer.h.13.mlp.fc1.q_scale
                                                           
[0195/0343] saving transformer.h.13.mlp.fc1.bias
                                                           
[0196/0343] saving transformer.h.13.mlp.fc2.q_weight
                                                           
[0197/0343] saving transformer.h.13.mlp.fc2.q_scale
                                                           
[0198/0343] saving transformer.h.13.mlp.fc2.bias
                                                           
[0199/0343] saving transformer.h.14.ln.weight
                                                           
[0200/0343] saving transformer.h.14.ln.bias
                                                           
[0201/0343] saving transformer.h.14.mixer.Wqkv.q_weight
                                                           
[0202/0343] saving transformer.h.14.mixer.Wqkv.q_scale
                                                           
[0203/0343] saving transformer.h.14.mixer.Wqkv.bias
                                                           
[0204/0343] saving transformer.h.14.mixer.out_proj.q_weight
                                                           
[0205/0343] saving transformer.h.14.mixer.out_proj.q_scale
                                                           
[0206/0343] saving transformer.h.14.mixer.out_proj.bias
                                                           
[0207/0343] saving transformer.h.14.mlp.fc1.q_weight
                                                           
[0208/0343] saving transformer.h.14.mlp.fc1.q_scale
                                                           
[0209/0343] saving transformer.h.14.mlp.fc1.bias
                                                           
[0210/0343] saving transformer.h.14.mlp.fc2.q_weight
                                                           
[0211/0343] saving transformer.h.14.mlp.fc2.q_scale
                                                           
[0212/0343] saving transformer.h.14.mlp.fc2.bias
                                                           
[0213/0343] saving transformer.h.15.ln.weight
[0214/0343] saving transformer.h.15.ln.bias
[0215/0343] saving transformer.h.15.mixer.Wqkv.q_weight
[0216/0343] saving transformer.h.15.mixer.Wqkv.q_scale
[0217/0343] saving transformer.h.15.mixer.Wqkv.bias
[0218/0343] saving transformer.h.15.mixer.out_proj.q_weight
[0219/0343] saving transformer.h.15.mixer.out_proj.q_scale
[0220/0343] saving transformer.h.15.mixer.out_proj.bias
[0221/0343] saving transformer.h.15.mlp.fc1.q_weight
[0222/0343] saving transformer.h.15.mlp.fc1.q_scale
[0223/0343] saving transformer.h.15.mlp.fc1.bias
[0224/0343] saving transformer.h.15.mlp.fc2.q_weight
[0225/0343] saving transformer.h.15.mlp.fc2.q_scale
[0226/0343] saving transformer.h.15.mlp.fc2.bias
[0227/0343] saving transformer.h.16.ln.weight
[0228/0343] saving transformer.h.16.ln.bias
[0229/0343] saving transformer.h.16.mixer.Wqkv.q_weight
[0230/0343] saving transformer.h.16.mixer.Wqkv.q_scale
[0231/0343] saving transformer.h.16.mixer.Wqkv.bias
[0232/0343] saving transformer.h.16.mixer.out_proj.q_weight
[0233/0343] saving transformer.h.16.mixer.out_proj.q_scale
[0234/0343] saving transformer.h.16.mixer.out_proj.bias
[0235/0343] saving transformer.h.16.mlp.fc1.q_weight
[0236/0343] saving transformer.h.16.mlp.fc1.q_scale
[0237/0343] saving transformer.h.16.mlp.fc1.bias
[0238/0343] saving transformer.h.16.mlp.fc2.q_weight
[0239/0343] saving transformer.h.16.mlp.fc2.q_scale
[0240/0343] saving transformer.h.16.mlp.fc2.bias
[0241/0343] saving transformer.h.17.ln.weight
[0242/0343] saving transformer.h.17.ln.bias
[0243/0343] saving transformer.h.17.mixer.Wqkv.q_weight
[0244/0343] saving transformer.h.17.mixer.Wqkv.q_scale
[0245/0343] saving transformer.h.17.mixer.Wqkv.bias
[0246/0343] saving transformer.h.17.mixer.out_proj.q_weight
[0247/0343] saving transformer.h.17.mixer.out_proj.q_scale
[0248/0343] saving transformer.h.17.mixer.out_proj.bias
[0249/0343] saving transformer.h.17.mlp.fc1.q_weight
[0250/0343] saving transformer.h.17.mlp.fc1.q_scale
[0251/0343] saving transformer.h.17.mlp.fc1.bias
[0252/0343] saving transformer.h.17.mlp.fc2.q_weight
[0253/0343] saving transformer.h.17.mlp.fc2.q_scale
[0254/0343] saving transformer.h.17.mlp.fc2.bias
[0255/0343] saving transformer.h.18.ln.weight
[0256/0343] saving transformer.h.18.ln.bias
[0257/0343] saving transformer.h.18.mixer.Wqkv.q_weight
[0258/0343] saving transformer.h.18.mixer.Wqkv.q_scale
[0259/0343] saving transformer.h.18.mixer.Wqkv.bias
[0260/0343] saving transformer.h.18.mixer.out_proj.q_weight
[0261/0343] saving transformer.h.18.mixer.out_proj.q_scale
[0262/0343] saving transformer.h.18.mixer.out_proj.bias
[0263/0343] saving transformer.h.18.mlp.fc1.q_weight
[0264/0343] saving transformer.h.18.mlp.fc1.q_scale
[0265/0343] saving transformer.h.18.mlp.fc1.bias
[0266/0343] saving transformer.h.18.mlp.fc2.q_weight
[0267/0343] saving transformer.h.18.mlp.fc2.q_scale
[0268/0343] saving transformer.h.18.mlp.fc2.bias
[0269/0343] saving transformer.h.19.ln.weight
[0270/0343] saving transformer.h.19.ln.bias
[0271/0343] saving transformer.h.19.mixer.Wqkv.q_weight
[0272/0343] saving transformer.h.19.mixer.Wqkv.q_scale
[0273/0343] saving transformer.h.19.mixer.Wqkv.bias
[0274/0343] saving transformer.h.19.mixer.out_proj.q_weight
[0275/0343] saving transformer.h.19.mixer.out_proj.q_scale
[0276/0343] saving transformer.h.19.mixer.out_proj.bias
[0277/0343] saving transformer.h.19.mlp.fc1.q_weight
[0278/0343] saving transformer.h.19.mlp.fc1.q_scale
[0279/0343] saving transformer.h.19.mlp.fc1.bias
[0280/0343] saving transformer.h.19.mlp.fc2.q_weight
[0281/0343] saving transformer.h.19.mlp.fc2.q_scale
[0282/0343] saving transformer.h.19.mlp.fc2.bias
[0283/0343] saving transformer.h.20.ln.weight
[0284/0343] saving transformer.h.20.ln.bias
[0285/0343] saving transformer.h.20.mixer.Wqkv.q_weight
[0286/0343] saving transformer.h.20.mixer.Wqkv.q_scale
[0287/0343] saving transformer.h.20.mixer.Wqkv.bias
[0288/0343] saving transformer.h.20.mixer.out_proj.q_weight
[0289/0343] saving transformer.h.20.mixer.out_proj.q_scale
[0290/0343] saving transformer.h.20.mixer.out_proj.bias
[0291/0343] saving transformer.h.20.mlp.fc1.q_weight
[0292/0343] saving transformer.h.20.mlp.fc1.q_scale
[2023-12-28 23:33:30] INFO convert_weight.py:132: Saved to directory: /tmp/tmpxe445xtc
[0293/0343] saving transformer.h.20.mlp.fc1.bias
[0294/0343] saving transformer.h.20.mlp.fc2.q_weight
[0295/0343] saving transformer.h.20.mlp.fc2.q_scale
[0296/0343] saving transformer.h.20.mlp.fc2.bias
[0297/0343] saving transformer.h.21.ln.weight
[0298/0343] saving transformer.h.21.ln.bias
[0299/0343] saving transformer.h.21.mixer.Wqkv.q_weight
[0300/0343] saving transformer.h.21.mixer.Wqkv.q_scale
[0301/0343] saving transformer.h.21.mixer.Wqkv.bias
[0302/0343] saving transformer.h.21.mixer.out_proj.q_weight
[0303/0343] saving transformer.h.21.mixer.out_proj.q_scale
[0304/0343] saving transformer.h.21.mixer.out_proj.bias
[0305/0343] saving transformer.h.21.mlp.fc1.q_weight
[0306/0343] saving transformer.h.21.mlp.fc1.q_scale
[0307/0343] saving transformer.h.21.mlp.fc1.bias
[0308/0343] saving transformer.h.21.mlp.fc2.q_weight
[0309/0343] saving transformer.h.21.mlp.fc2.q_scale
[0310/0343] saving transformer.h.21.mlp.fc2.bias
[0311/0343] saving transformer.h.22.ln.weight
[0312/0343] saving transformer.h.22.ln.bias
[0313/0343] saving transformer.h.22.mixer.Wqkv.q_weight
[0314/0343] saving transformer.h.22.mixer.Wqkv.q_scale
[0315/0343] saving transformer.h.22.mixer.Wqkv.bias
[0316/0343] saving transformer.h.22.mixer.out_proj.q_weight
[0317/0343] saving transformer.h.22.mixer.out_proj.q_scale
[0318/0343] saving transformer.h.22.mixer.out_proj.bias
[0319/0343] saving transformer.h.22.mlp.fc1.q_weight
[0320/0343] saving transformer.h.22.mlp.fc1.q_scale
[0321/0343] saving transformer.h.22.mlp.fc1.bias
[0322/0343] saving transformer.h.22.mlp.fc2.q_weight
[0323/0343] saving transformer.h.22.mlp.fc2.q_scale
[0324/0343] saving transformer.h.22.mlp.fc2.bias
[0325/0343] saving transformer.h.23.ln.weight
[0326/0343] saving transformer.h.23.ln.bias
[0327/0343] saving transformer.h.23.mixer.Wqkv.q_weight
[0328/0343] saving transformer.h.23.mixer.Wqkv.q_scale
[0329/0343] saving transformer.h.23.mixer.Wqkv.bias
[0330/0343] saving transformer.h.23.mixer.out_proj.q_weight
[0331/0343] saving transformer.h.23.mixer.out_proj.q_scale
[0332/0343] saving transformer.h.23.mixer.out_proj.bias
[0333/0343] saving transformer.h.23.mlp.fc1.q_weight
[0334/0343] saving transformer.h.23.mlp.fc1.q_scale
[0335/0343] saving transformer.h.23.mlp.fc1.bias
[0336/0343] saving transformer.h.23.mlp.fc2.q_weight
[0337/0343] saving transformer.h.23.mlp.fc2.q_scale
[0338/0343] saving transformer.h.23.mlp.fc2.bias
[0339/0343] saving lm_head.ln.weight
[0340/0343] saving lm_head.ln.bias
[0341/0343] saving lm_head.linear.q_weight
[0342/0343] saving lm_head.linear.q_scale
[0343/0343] saving lm_head.linear.bias
All finished, 27 total shards committed, record saved to /tmp/tmpxe445xtc/ndarray-cache.json