/home/floriadmin/miniforge3/envs/mlc/bin/python -m mlc_llm gen_config ../dist/models/Qwen1.5-4B --quantization q8f32_1 --conv-template chatml --output /tmp/tmpvomo8uva
[2024-03-18 19:32:31] INFO auto_config.py:115: Found model configuration: ../dist/models/Qwen1.5-4B/config.json
[2024-03-18 19:32:31] INFO auto_config.py:153: Found model type: qwen2. Use `--model-type` to override.
[2024-03-18 19:32:31] INFO qwen2_model.py:46: context_window_size not found in config.json. Falling back to max_position_embeddings (32768)
[2024-03-18 19:32:31] INFO qwen2_model.py:60: prefill_chunk_size defaults to context_window_size (32768)
[2024-03-18 19:32:31] WARNING config.py:99: Warning: Cannot override max_batch_size, because QWen2Config does not have this field
[2024-03-18 19:32:31] INFO gen_config.py:133: [generation_config.json] Setting bos_token_id: 151643
[2024-03-18 19:32:31] INFO gen_config.py:133: [generation_config.json] Setting eos_token_id: 151643
[2024-03-18 19:32:31] INFO gen_config.py:147: Not found tokenizer config: ../dist/models/Qwen1.5-4B/tokenizer.model
[2024-03-18 19:32:31] INFO gen_config.py:145: Found tokenizer config: ../dist/models/Qwen1.5-4B/tokenizer.json. Copying to /tmp/tmpvomo8uva/tokenizer.json
[2024-03-18 19:32:31] INFO gen_config.py:145: Found tokenizer config: ../dist/models/Qwen1.5-4B/vocab.json. Copying to /tmp/tmpvomo8uva/vocab.json
[2024-03-18 19:32:31] INFO gen_config.py:145: Found tokenizer config: ../dist/models/Qwen1.5-4B/merges.txt. Copying to /tmp/tmpvomo8uva/merges.txt
[2024-03-18 19:32:31] INFO gen_config.py:147: Not found tokenizer config: ../dist/models/Qwen1.5-4B/added_tokens.json
[2024-03-18 19:32:31] INFO gen_config.py:145: Found tokenizer config: ../dist/models/Qwen1.5-4B/tokenizer_config.json. Copying to /tmp/tmpvomo8uva/tokenizer_config.json
[2024-03-18 19:32:31] INFO gen_config.py:75: [System default] Setting pad_token_id: 0
[2024-03-18 19:32:31] INFO gen_config.py:75: [System default] Setting temperature: 0.7
[2024-03-18 19:32:31] INFO gen_config.py:75: [System default] Setting presence_penalty: 0.0
[2024-03-18 19:32:31] INFO gen_config.py:75: [System default] Setting frequency_penalty: 0.0
[2024-03-18 19:32:31] INFO gen_config.py:75: [System default] Setting repetition_penalty: 1.0
[2024-03-18 19:32:31] INFO gen_config.py:75: [System default] Setting top_p: 0.95
[2024-03-18 19:32:31] INFO gen_config.py:75: [System default] Setting mean_gen_len: 128
[2024-03-18 19:32:31] INFO gen_config.py:75: [System default] Setting max_gen_len: 512
[2024-03-18 19:32:31] INFO gen_config.py:75: [System default] Setting shift_fill_factor: 0.3
[2024-03-18 19:32:31] INFO gen_config.py:198: Dumping configuration file to: /tmp/tmpvomo8uva/mlc-chat-config.json
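The `Setting` lines above can be collected into the generation defaults that end up in the dumped mlc-chat-config.json. A minimal sketch assembled from the logged values only (the real file contains more fields and may lay them out differently):

```python
# Hypothetical sketch of the generation settings gen_config logged above.
# Values are taken verbatim from the log; the actual mlc-chat-config.json
# written by MLC-LLM has additional fields (model type, conv template, etc.).
gen_settings = {
    "bos_token_id": 151643,       # from generation_config.json
    "eos_token_id": 151643,       # from generation_config.json
    "pad_token_id": 0,            # system default
    "temperature": 0.7,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
    "repetition_penalty": 1.0,
    "top_p": 0.95,
    "mean_gen_len": 128,
    "max_gen_len": 512,
    "shift_fill_factor": 0.3,
}
print(gen_settings)
```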
/home/floriadmin/miniforge3/envs/mlc/bin/python -m mlc_llm convert_weight ../dist/models/Qwen1.5-4B --quantization q8f32_1 --source-format auto --output /tmp/tmpvomo8uva
[2024-03-18 19:32:32] INFO auto_config.py:115: Found model configuration: ../dist/models/Qwen1.5-4B/config.json
[2024-03-18 19:32:33] INFO auto_device.py:76: Found device: cuda:0
[2024-03-18 19:32:33] INFO auto_device.py:76: Found device: cuda:1
[2024-03-18 19:32:33] INFO auto_device.py:76: Found device: cuda:2
[2024-03-18 19:32:33] INFO auto_device.py:76: Found device: cuda:3
[2024-03-18 19:32:33] INFO auto_device.py:76: Found device: cuda:4
[2024-03-18 19:32:33] INFO auto_device.py:76: Found device: cuda:5
[2024-03-18 19:32:33] INFO auto_device.py:76: Found device: cuda:6
[2024-03-18 19:32:33] INFO auto_device.py:76: Found device: cuda:7
[2024-03-18 19:32:33] INFO auto_device.py:76: Found device: cuda:8
[2024-03-18 19:32:33] INFO auto_device.py:76: Found device: cuda:9
[2024-03-18 19:32:34] INFO auto_device.py:85: Not found device: rocm:0
[2024-03-18 19:32:35] INFO auto_device.py:85: Not found device: metal:0
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:0
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:1
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:2
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:3
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:4
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:5
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:6
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:7
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:8
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:9
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:10
[2024-03-18 19:32:40] INFO auto_device.py:85: Not found device: opencl:0
[2024-03-18 19:32:40] INFO auto_device.py:33: Using device: cuda:0
[2024-03-18 19:32:40] INFO auto_weight.py:70: Finding weights in: ../dist/models/Qwen1.5-4B
[2024-03-18 19:32:40] INFO auto_weight.py:136: Not found Huggingface PyTorch
[2024-03-18 19:32:40] INFO auto_weight.py:143: Found source weight format: huggingface-safetensor. Source configuration: ../dist/models/Qwen1.5-4B/model.safetensors.index.json
[2024-03-18 19:32:40] INFO auto_weight.py:106: Using source weight configuration: ../dist/models/Qwen1.5-4B/model.safetensors.index.json. Use `--source` to override.
[2024-03-18 19:32:40] INFO auto_weight.py:110: Using source weight format: huggingface-safetensor. Use `--source-format` to override.
[2024-03-18 19:32:40] INFO auto_config.py:153: Found model type: qwen2. Use `--model-type` to override.
[2024-03-18 19:32:40] INFO qwen2_model.py:46: context_window_size not found in config.json. Falling back to max_position_embeddings (32768)
[2024-03-18 19:32:40] INFO qwen2_model.py:60: prefill_chunk_size defaults to context_window_size (32768)
Weight conversion with arguments:
  --config          ../dist/models/Qwen1.5-4B/config.json
  --quantization    GroupQuantize(name='q8f32_1', kind='group-quant', group_size=32, quantize_dtype='int8', storage_dtype='uint32', model_dtype='float32', linear_weight_layout='NK', quantize_embedding=True, quantize_final_fc=True, num_elem_per_storage=4, num_storage_per_group=8, max_int_value=127)
  --model-type      qwen2
  --device          cuda:0
  --source          ../dist/models/Qwen1.5-4B/model.safetensors.index.json
  --source-format   huggingface-safetensor
  --output          /tmp/tmpvomo8uva
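Given the GroupQuantize settings above (group_size=32, int8 values packed into uint32 storage, so num_elem_per_storage=4 and one float32 scale per 32-element group along axis=1), the q_weight/q_scale shapes reported in the conversion log follow directly from the original weight shapes. A minimal sketch of that arithmetic (not MLC-LLM code):

```python
# Shape arithmetic implied by the q8f32_1 GroupQuantize config above:
# int8 values packed 4-per-uint32 word, one float32 scale per 32 elements.
GROUP_SIZE = 32          # elements sharing one float32 scale
ELEMS_PER_STORAGE = 4    # int8 values packed into each uint32 word

def quantized_shapes(rows, cols):
    """Return (q_weight_shape, q_scale_shape) for a (rows, cols) float32
    weight quantized along axis=1, as in the log."""
    q_weight = (rows, cols // ELEMS_PER_STORAGE)  # uint32 storage words
    q_scale = (rows, cols // GROUP_SIZE)          # float32 group scales
    return q_weight, q_scale

# lm_head (151936, 2560) -> q_weight (151936, 640), q_scale (151936, 80)
print(quantized_shapes(151936, 2560))
# mlp.down_proj (2560, 6912) -> q_weight (2560, 1728), q_scale (2560, 216)
print(quantized_shapes(2560, 6912))
```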
Start storing to cache /tmp/tmpvomo8uva
[2024-03-18 19:32:43] INFO huggingface_loader.py:182: Loading HF parameters from: ../dist/models/Qwen1.5-4B/model-00002-of-00002.safetensors
[2024-03-18 19:32:51] INFO group_quantization.py:232: Compiling quantize function for key: ((151936, 2560), float32, cuda, axis=1, output_transpose=False)
[2024-03-18 19:32:52] INFO huggingface_loader.py:164: [Quantized] Parameter: "lm_head.q_weight", shape: (151936, 640), dtype: uint32
[2024-03-18 19:32:54] INFO huggingface_loader.py:164: [Quantized] Parameter: "lm_head.q_scale", shape: (151936, 80), dtype: float32
/home/floriadmin/miniforge3/envs/mlc/lib/python3.11/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/floriadmin/miniforge3/envs/mlc/lib/python3.11/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
[2024-03-18 19:32:54] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.20.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:32:55] INFO group_quantization.py:232: Compiling quantize function for key: ((2560, 6912), float32, cuda, axis=1, output_transpose=False)
[2024-03-18 19:32:55] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.20.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
[2024-03-18 19:32:55] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.20.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
[2024-03-18 19:32:56] INFO group_quantization.py:232: Compiling quantize function for key: ((13824, 2560), float32, cuda, axis=1, output_transpose=False)
[2024-03-18 19:32:56] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.20.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
[2024-03-18 19:32:56] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.20.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
[2024-03-18 19:32:56] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.20.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:32:56] INFO group_quantization.py:232: Compiling quantize function for key: ((2560, 2560), float32, cuda, axis=1, output_transpose=False)
[2024-03-18 19:32:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.20.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
[2024-03-18 19:32:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.20.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
[2024-03-18 19:32:57] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.21.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:32:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
[2024-03-18 19:32:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
[2024-03-18 19:32:58] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
[2024-03-18 19:32:58] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
[2024-03-18 19:32:58] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.21.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:32:58] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.21.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:32:58] INFO group_quantization.py:232: Compiling quantize function for key: ((7680, 2560), float32, cuda, axis=1, output_transpose=False)
[2024-03-18 19:32:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32
[2024-03-18 19:32:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32
[2024-03-18 19:32:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
[2024-03-18 19:32:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
[2024-03-18 19:32:59] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.22.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:32:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
[2024-03-18 19:32:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
[2024-03-18 19:33:00] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
[2024-03-18 19:33:00] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
[2024-03-18 19:33:00] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.22.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:33:00] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.22.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:33:00] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32
[2024-03-18 19:33:00] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32
[2024-03-18 19:33:00] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
[2024-03-18 19:33:00] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
[2024-03-18 19:33:00] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.23.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:33:01] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
[2024-03-18 19:33:01] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
[2024-03-18 19:33:01] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
[2024-03-18 19:33:01] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
[2024-03-18 19:33:01] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.23.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:33:01] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.23.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:33:02] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32
[2024-03-18 19:33:02] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32
[2024-03-18 19:33:02] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
[2024-03-18 19:33:02] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
[2024-03-18 19:33:02] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.24.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:33:02] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
[2024-03-18 19:33:02] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
[2024-03-18 19:33:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
[2024-03-18 19:33:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
 10%|██████████▎                                                                                 | 29/283 [00:20<00:45,  5.62it/s]
 11%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                                                 | 30/283 [00:20<01:06,  3.80it/s]
                                                                                                                                 
[2024-03-18 19:33:03] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.24.post_attention_layernorm.weight", shape: (2560,), dtype: float32

 11%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                                                 | 30/283 [00:20<01:06,  3.80it/s]
                                                                                                                                 
[2024-03-18 19:33:03] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.24.self_attn.c_attn.bias", shape: (7680,), dtype: float32

 11%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                                                 | 30/283 [00:20<01:06,  3.80it/s]
                                                                                                                                 
[2024-03-18 19:33:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32

 11%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                                                 | 30/283 [00:20<01:06,  3.80it/s]
                                                                                                                                 
[2024-03-18 19:33:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32

 11%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                                                 | 30/283 [00:20<01:06,  3.80it/s]
 12%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                                                | 33/283 [00:20<00:48,  5.13it/s]
                                                                                                                                 
[2024-03-18 19:33:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32

 12%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                                                | 33/283 [00:20<00:48,  5.13it/s]
                                                                                                                                 
[2024-03-18 19:33:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32

 12%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                                                | 33/283 [00:20<00:48,  5.13it/s]
 12%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                                | 34/283 [00:20<00:46,  5.34it/s]
                                                                                                                                 
[2024-03-18 19:33:03] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.25.input_layernorm.weight", shape: (2560,), dtype: float32

 12%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                                | 34/283 [00:20<00:46,  5.34it/s]
                                                                                                                                 
[2024-03-18 19:33:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32

 12%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                                | 34/283 [00:20<00:46,  5.34it/s]
                                                                                                                                 
[2024-03-18 19:33:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32

 12%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                                | 34/283 [00:20<00:46,  5.34it/s]
 13%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                                               | 36/283 [00:20<00:41,  5.94it/s]
                                                                                                                                 
[2024-03-18 19:33:04] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32

 13%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                                               | 36/283 [00:21<00:41,  5.94it/s]
                                                                                                                                 
[2024-03-18 19:33:04] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32

 13%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                                               | 36/283 [00:21<00:41,  5.94it/s]
 13%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                               | 37/283 [00:21<01:03,  3.86it/s]
                                                                                                                                 
[2024-03-18 19:33:04] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.25.post_attention_layernorm.weight", shape: (2560,), dtype: float32

 13%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                               | 37/283 [00:21<01:03,  3.86it/s]
                                                                                                                                 
[2024-03-18 19:33:04] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.25.self_attn.c_attn.bias", shape: (7680,), dtype: float32

 13%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                               | 37/283 [00:21<01:03,  3.86it/s]
                                                                                                                                 
[2024-03-18 19:33:04] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32

 13%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                               | 37/283 [00:21<01:03,  3.86it/s]
                                                                                                                                 
[2024-03-18 19:33:04] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32

 13%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                               | 37/283 [00:21<01:03,  3.86it/s]
 14%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                                                              | 40/283 [00:21<00:46,  5.18it/s]
                                                                                                                                 
[2024-03-18 19:33:05] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32

 14%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                                                              | 40/283 [00:21<00:46,  5.18it/s]
                                                                                                                                 
[2024-03-18 19:33:05] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32

 14%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                                                              | 40/283 [00:21<00:46,  5.18it/s]
 14%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                             | 41/283 [00:21<00:44,  5.40it/s]
                                                                                                                                 
[2024-03-18 19:33:05] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.26.input_layernorm.weight", shape: (2560,), dtype: float32

 14%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                             | 41/283 [00:21<00:44,  5.40it/s]
                                                                                                                                 
[2024-03-18 19:33:05] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32

 14%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                             | 41/283 [00:22<00:44,  5.40it/s]
                                                                                                                                 
[2024-03-18 19:33:05] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32

 14%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                             | 41/283 [00:22<00:44,  5.40it/s]
 15%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                                                             | 43/283 [00:22<00:39,  6.01it/s]
                                                                                                                                 
[2024-03-18 19:33:05] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32

 15%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                                                             | 43/283 [00:22<00:39,  6.01it/s]
                                                                                                                                 
[2024-03-18 19:33:05] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32

 15%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                                                             | 43/283 [00:22<00:39,  6.01it/s]
 16%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                            | 44/283 [00:22<01:00,  3.93it/s]
                                                                                                                                 
[2024-03-18 19:33:05] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.26.post_attention_layernorm.weight", shape: (2560,), dtype: float32

 16%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                            | 44/283 [00:22<01:00,  3.93it/s]
                                                                                                                                 
[2024-03-18 19:33:05] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.26.self_attn.c_attn.bias", shape: (7680,), dtype: float32

 16%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                            | 44/283 [00:22<01:00,  3.93it/s]
                                                                                                                                 
[2024-03-18 19:33:06] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32

 16%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                            | 44/283 [00:23<01:00,  3.93it/s]
                                                                                                                                 
[2024-03-18 19:33:06] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32

 16%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                            | 44/283 [00:23<01:00,  3.93it/s]
 17%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                                                            | 47/283 [00:23<00:44,  5.25it/s]
                                                                                                                                 
[2024-03-18 19:33:06] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32

 17%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                                                            | 47/283 [00:23<00:44,  5.25it/s]
                                                                                                                                 
[2024-03-18 19:33:06] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32

 17%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                                                            | 47/283 [00:23<00:44,  5.25it/s]
 17%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                           | 48/283 [00:23<00:43,  5.43it/s]
                                                                                                                                 
[2024-03-18 19:33:06] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.27.input_layernorm.weight", shape: (2560,), dtype: float32

 17%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                           | 48/283 [00:23<00:43,  5.43it/s]
                                                                                                                                 
[2024-03-18 19:33:06] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32

 17%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                           | 48/283 [00:23<00:43,  5.43it/s]
                                                                                                                                 
[2024-03-18 19:33:06] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32

 17%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                           | 48/283 [00:23<00:43,  5.43it/s]
 18%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                                                           | 50/283 [00:23<00:38,  6.01it/s]
                                                                                                                                 
[2024-03-18 19:33:07] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32

 18%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                                                           | 50/283 [00:23<00:38,  6.01it/s]
                                                                                                                                 
[2024-03-18 19:33:07] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32

 18%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                                                           | 50/283 [00:24<00:38,  6.01it/s]
 18%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                          | 51/283 [00:24<00:58,  3.95it/s]
                                                                                                                                 
[2024-03-18 19:33:07] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.27.post_attention_layernorm.weight", shape: (2560,), dtype: float32

 18%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                          | 51/283 [00:24<00:58,  3.95it/s]
                                                                                                                                 
[2024-03-18 19:33:07] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.27.self_attn.c_attn.bias", shape: (7680,), dtype: float32

 18%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                          | 51/283 [00:24<00:58,  3.95it/s]
                                                                                                                                 
[2024-03-18 19:33:07] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32

 18%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                          | 51/283 [00:24<00:58,  3.95it/s]
                                                                                                                                 
[2024-03-18 19:33:07] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32

 18%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                          | 51/283 [00:24<00:58,  3.95it/s]
 19%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                                         | 54/283 [00:24<00:43,  5.29it/s]
                                                                                                                                 
[2024-03-18 19:33:07] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32

 19%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                                         | 54/283 [00:24<00:43,  5.29it/s]
                                                                                                                                 
[2024-03-18 19:33:07] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32

 19%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                                         | 54/283 [00:24<00:43,  5.29it/s]
 19%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                                         | 55/283 [00:24<00:41,  5.49it/s]
                                                                                                                                 
[2024-03-18 19:33:07] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.28.input_layernorm.weight", shape: (2560,), dtype: float32

 19%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                                         | 55/283 [00:24<00:41,  5.49it/s]
                                                                                                                                 
[2024-03-18 19:33:08] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32

 19%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                                         | 55/283 [00:24<00:41,  5.49it/s]
                                                                                                                                 
[2024-03-18 19:33:08] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32

 19%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                                         | 55/283 [00:24<00:41,  5.49it/s]
 20%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                                        | 57/283 [00:24<00:37,  6.05it/s]
                                                                                                                                 
[2024-03-18 19:33:08] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32

 20%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                                        | 57/283 [00:25<00:37,  6.05it/s]
                                                                                                                                 
[2024-03-18 19:33:08] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32

 20%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                                        | 57/283 [00:25<00:37,  6.05it/s]
 20%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                                        | 58/283 [00:25<00:56,  3.96it/s]
                                                                                                                                 
[2024-03-18 19:33:08] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.28.post_attention_layernorm.weight", shape: (2560,), dtype: float32

 20%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                                        | 58/283 [00:25<00:56,  3.96it/s]
                                                                                                                                 
[2024-03-18 19:33:08] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.28.self_attn.c_attn.bias", shape: (7680,), dtype: float32

 20%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                                        | 58/283 [00:25<00:56,  3.96it/s]
                                                                                                                                 
[2024-03-18 19:33:08] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32

 20%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                                        | 58/283 [00:25<00:56,  3.96it/s]
                                                                                                                                 
[2024-03-18 19:33:09] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32

 20%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                                        | 58/283 [00:25<00:56,  3.96it/s]
 22%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                                       | 61/283 [00:25<00:41,  5.30it/s]
                                                                                                                                 
[2024-03-18 19:33:09] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32

 22%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                                       | 61/283 [00:25<00:41,  5.30it/s]
                                                                                                                                 
[2024-03-18 19:33:09] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32

 22%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                                       | 61/283 [00:25<00:41,  5.30it/s]
 22%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                       | 62/283 [00:25<00:40,  5.51it/s]
                                                                                                                                 
[2024-03-18 19:33:09] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.29.input_layernorm.weight", shape: (2560,), dtype: float32

 22%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                       | 62/283 [00:25<00:40,  5.51it/s]
                                                                                                                                 
[2024-03-18 19:33:09] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32

 22%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                       | 62/283 [00:26<00:40,  5.51it/s]
                                                                                                                                 
[2024-03-18 19:33:09] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32

 22%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                       | 62/283 [00:26<00:40,  5.51it/s]
 23%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                                      | 64/283 [00:26<00:36,  6.05it/s]
                                                                                                                                 
[2024-03-18 19:33:09] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32

 23%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                                      | 64/283 [00:26<00:36,  6.05it/s]
                                                                                                                                 
[2024-03-18 19:33:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32

 23%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                                      | 64/283 [00:26<00:36,  6.05it/s]
 23%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                      | 65/283 [00:26<00:55,  3.95it/s]
                                                                                                                                 
[2024-03-18 19:33:10] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.29.post_attention_layernorm.weight", shape: (2560,), dtype: float32
 23%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                                      | 65/283 [00:26<00:55,  3.95it/s]
[2024-03-18 19:33:10] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.29.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:33:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32
[2024-03-18 19:33:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32
 24%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                                                     | 68/283 [00:27<00:40,  5.29it/s]
[2024-03-18 19:33:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
[2024-03-18 19:33:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
 24%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                    | 69/283 [00:27<00:38,  5.49it/s]
[2024-03-18 19:33:10] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.30.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:33:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
[2024-03-18 19:33:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
 25%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                                                    | 71/283 [00:27<00:35,  6.06it/s]
[2024-03-18 19:33:11] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
[2024-03-18 19:33:11] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
 25%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                   | 72/283 [00:28<00:53,  3.93it/s]
[2024-03-18 19:33:11] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.30.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:33:11] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.30.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:33:11] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32
[2024-03-18 19:33:11] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32
 27%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                                                   | 75/283 [00:28<00:39,  5.26it/s]
[2024-03-18 19:33:11] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
[2024-03-18 19:33:11] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
 27%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                  | 76/283 [00:28<00:37,  5.45it/s]
[2024-03-18 19:33:11] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.31.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:33:12] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
[2024-03-18 19:33:12] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
 28%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                                                  | 78/283 [00:29<00:34,  6.01it/s]
[2024-03-18 19:33:12] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
[2024-03-18 19:33:12] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
 28%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                                 | 79/283 [00:29<00:52,  3.92it/s]
[2024-03-18 19:33:12] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.31.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:33:12] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.31.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:33:13] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32
[2024-03-18 19:33:13] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32
 29%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                                | 82/283 [00:29<00:38,  5.26it/s]
[2024-03-18 19:33:13] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
[2024-03-18 19:33:13] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
 29%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                                | 83/283 [00:30<00:36,  5.44it/s]
[2024-03-18 19:33:13] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.32.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:33:13] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
[2024-03-18 19:33:13] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
 30%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                               | 85/283 [00:30<00:33,  5.99it/s]
[2024-03-18 19:33:14] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
[2024-03-18 19:33:14] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
 30%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                               | 86/283 [00:31<00:51,  3.86it/s]
[2024-03-18 19:33:14] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.32.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:33:14] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.32.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:33:14] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32
[2024-03-18 19:33:14] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32
 31%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                              | 89/283 [00:31<00:37,  5.20it/s]
[2024-03-18 19:33:14] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
[2024-03-18 19:33:14] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
 32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                              | 90/283 [00:31<00:36,  5.35it/s]
[2024-03-18 19:33:14] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.33.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:33:14] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
[2024-03-18 19:33:14] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
 33%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                             | 92/283 [00:31<00:32,  5.96it/s]
[2024-03-18 19:33:15] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
[2024-03-18 19:33:15] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
 33%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                             | 93/283 [00:32<00:50,  3.78it/s]
[2024-03-18 19:33:15] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.33.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:33:15] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.33.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:33:15] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32
[2024-03-18 19:33:15] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32
 34%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                                            | 96/283 [00:32<00:36,  5.11it/s]
[2024-03-18 19:33:16] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
[2024-03-18 19:33:16] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
 34%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                           | 97/283 [00:32<00:34,  5.34it/s]
[2024-03-18 19:33:16] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.34.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:33:16] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
[2024-03-18 19:33:16] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
 35%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                                           | 99/283 [00:33<00:31,  5.86it/s]
[2024-03-18 19:33:16] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
[2024-03-18 19:33:17] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
 35%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                                          | 100/283 [00:33<00:47,  3.85it/s]
[2024-03-18 19:33:17] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.34.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:33:17] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.34.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:33:17] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32
[2024-03-18 19:33:17] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32
 36%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                                         | 103/283 [00:34<00:34,  5.17it/s]
[2024-03-18 19:33:17] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
[2024-03-18 19:33:17] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
 37%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                                         | 104/283 [00:34<00:33,  5.29it/s]
[2024-03-18 19:33:17] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.35.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:33:17] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
[2024-03-18 19:33:17] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
 37%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                        | 106/283 [00:34<00:29,  5.91it/s]
[2024-03-18 19:33:18] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
[2024-03-18 19:33:18] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
 38%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                                        | 107/283 [00:35<00:45,  3.86it/s]
[2024-03-18 19:33:18] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.35.post_attention_layernorm.weight", shape: (2560,), dtype: float32
                                                                                                                                 
[2024-03-18 19:33:18] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.35.self_attn.c_attn.bias", shape: (7680,), dtype: float32

 38%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                                        | 107/283 [00:35<00:45,  3.86it/s]
                                                                                                                                 
[2024-03-18 19:33:18] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32

 38%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                                        | 107/283 [00:35<00:45,  3.86it/s]
                                                                                                                                 
[2024-03-18 19:33:18] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32

 38%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                                        | 107/283 [00:35<00:45,  3.86it/s]
 39%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                       | 110/283 [00:35<00:33,  5.17it/s]
                                                                                                                                 
[2024-03-18 19:33:18] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32

 39%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                       | 110/283 [00:35<00:33,  5.17it/s]
                                                                                                                                 
[2024-03-18 19:33:18] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32

 39%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                       | 110/283 [00:35<00:33,  5.17it/s]
 39%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                      | 111/283 [00:35<00:32,  5.33it/s]
                                                                                                                                 
[2024-03-18 19:33:18] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.36.input_layernorm.weight", shape: (2560,), dtype: float32

 39%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                      | 111/283 [00:35<00:32,  5.33it/s]
                                                                                                                                 
[2024-03-18 19:33:19] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32

 39%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                      | 111/283 [00:35<00:32,  5.33it/s]
                                                                                                                                 
[2024-03-18 19:33:19] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32

 39%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                      | 111/283 [00:36<00:32,  5.33it/s]
 40%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                      | 113/283 [00:36<00:28,  5.94it/s]
                                                                                                                                 
[2024-03-18 19:33:19] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32

 40%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                      | 113/283 [00:36<00:28,  5.94it/s]
                                                                                                                                 
[2024-03-18 19:33:20] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32

 40%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                      | 113/283 [00:36<00:28,  5.94it/s]
 40%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                     | 114/283 [00:37<00:56,  2.99it/s]
                                                                                                                                 
[2024-03-18 19:33:20] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.36.post_attention_layernorm.weight", shape: (2560,), dtype: float32

 40%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                     | 114/283 [00:37<00:56,  2.99it/s]
                                                                                                                                 
[2024-03-18 19:33:20] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.36.self_attn.c_attn.bias", shape: (7680,), dtype: float32

 40%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                     | 114/283 [00:37<00:56,  2.99it/s]
                                                                                                                                 
[2024-03-18 19:33:20] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32

 40%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                     | 114/283 [00:37<00:56,  2.99it/s]
                                                                                                                                 
[2024-03-18 19:33:20] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32

 40%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                     | 114/283 [00:37<00:56,  2.99it/s]
 41%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                    | 117/283 [00:37<00:39,  4.23it/s]
                                                                                                                                 
[2024-03-18 19:33:20] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32

 41%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                    | 117/283 [00:37<00:39,  4.23it/s]
                                                                                                                                 
[2024-03-18 19:33:20] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32

 41%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                    | 117/283 [00:37<00:39,  4.23it/s]
 42%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                    | 118/283 [00:37<00:36,  4.51it/s]
                                                                                                                                 
[2024-03-18 19:33:20] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.37.input_layernorm.weight", shape: (2560,), dtype: float32

 42%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                    | 118/283 [00:37<00:36,  4.51it/s]
                                                                                                                                 
[2024-03-18 19:33:20] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32

 42%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                    | 118/283 [00:37<00:36,  4.51it/s]
                                                                                                                                 
[2024-03-18 19:33:21] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32

 42%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                    | 118/283 [00:37<00:36,  4.51it/s]
 42%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                   | 120/283 [00:37<00:31,  5.22it/s]
                                                                                                                                 
[2024-03-18 19:33:21] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32

 42%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                   | 120/283 [00:38<00:31,  5.22it/s]
                                                                                                                                 
[2024-03-18 19:33:21] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32

 42%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                   | 120/283 [00:38<00:31,  5.22it/s]
 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                   | 121/283 [00:38<00:44,  3.61it/s]
                                                                                                                                 
[2024-03-18 19:33:21] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.37.post_attention_layernorm.weight", shape: (2560,), dtype: float32

 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                   | 121/283 [00:38<00:44,  3.61it/s]
                                                                                                                                 
[2024-03-18 19:33:21] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.37.self_attn.c_attn.bias", shape: (7680,), dtype: float32

 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                   | 121/283 [00:38<00:44,  3.61it/s]
                                                                                                                                 
[2024-03-18 19:33:21] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32

 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                   | 121/283 [00:38<00:44,  3.61it/s]
                                                                                                                                 
[2024-03-18 19:33:21] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32

 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                   | 121/283 [00:38<00:44,  3.61it/s]
 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                  | 124/283 [00:38<00:32,  4.94it/s]
                                                                                                                                 
[2024-03-18 19:33:22] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32

 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                  | 124/283 [00:38<00:32,  4.94it/s]
                                                                                                                                 
[2024-03-18 19:33:22] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32

 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                  | 124/283 [00:38<00:32,  4.94it/s]
 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                                  | 125/283 [00:38<00:30,  5.18it/s]
                                                                                                                                 
[2024-03-18 19:33:22] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.38.input_layernorm.weight", shape: (2560,), dtype: float32

 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                                  | 125/283 [00:38<00:30,  5.18it/s]
                                                                                                                                 
[2024-03-18 19:33:22] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32

 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                                  | 125/283 [00:39<00:30,  5.18it/s]
                                                                                                                                 
[2024-03-18 19:33:22] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32

 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                                  | 125/283 [00:39<00:30,  5.18it/s]
 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                 | 127/283 [00:39<00:26,  5.82it/s]
                                                                                                                                 
[2024-03-18 19:33:23] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32

 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                 | 127/283 [00:39<00:26,  5.82it/s]
                                                                                                                                 
[2024-03-18 19:33:23] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32

 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                                 | 127/283 [00:40<00:26,  5.82it/s]
 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                 | 128/283 [00:40<00:48,  3.22it/s]
                                                                                                                                 
[2024-03-18 19:33:23] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.38.post_attention_layernorm.weight", shape: (2560,), dtype: float32

 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                 | 128/283 [00:40<00:48,  3.22it/s]
                                                                                                                                 
[2024-03-18 19:33:23] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.38.self_attn.c_attn.bias", shape: (7680,), dtype: float32

 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                 | 128/283 [00:40<00:48,  3.22it/s]
                                                                                                                                 
[2024-03-18 19:33:23] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32

 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                 | 128/283 [00:40<00:48,  3.22it/s]
                                                                                                                                 
[2024-03-18 19:33:23] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32

 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                 | 128/283 [00:40<00:48,  3.22it/s]
 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                | 131/283 [00:40<00:33,  4.53it/s]
                                                                                                                                 
[2024-03-18 19:33:23] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32

 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                | 131/283 [00:40<00:33,  4.53it/s]
                                                                                                                                 
[2024-03-18 19:33:23] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32

 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                | 131/283 [00:40<00:33,  4.53it/s]
 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                | 132/283 [00:40<00:31,  4.77it/s]
                                                                                                                                 
[2024-03-18 19:33:23] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.39.input_layernorm.weight", shape: (2560,), dtype: float32

 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                | 132/283 [00:40<00:31,  4.77it/s]
                                                                                                                                 
[2024-03-18 19:33:23] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32

 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                | 132/283 [00:40<00:31,  4.77it/s]
                                                                                                                                 
[2024-03-18 19:33:24] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32

 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                                | 132/283 [00:40<00:31,  4.77it/s]
 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                               | 134/283 [00:40<00:27,  5.47it/s]
                                                                                                                                 
[2024-03-18 19:33:24] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32

 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                               | 134/283 [00:41<00:27,  5.47it/s]
                                                                                                                                 
[2024-03-18 19:33:25] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32

 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                               | 134/283 [00:41<00:27,  5.47it/s]
 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                               | 135/283 [00:41<00:53,  2.79it/s]
                                                                                                                                 
[2024-03-18 19:33:25] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.39.post_attention_layernorm.weight", shape: (2560,), dtype: float32

 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                               | 135/283 [00:41<00:53,  2.79it/s]
                                                                                                                                 
[2024-03-18 19:33:25] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.39.self_attn.c_attn.bias", shape: (7680,), dtype: float32

 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                               | 135/283 [00:41<00:53,  2.79it/s]
                                                                                                                                 
[2024-03-18 19:33:25] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32

 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                               | 135/283 [00:42<00:53,  2.79it/s]
                                                                                                                                 
[2024-03-18 19:33:25] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32

 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                               | 135/283 [00:42<00:53,  2.79it/s]
 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                              | 138/283 [00:42<00:35,  4.04it/s]
                                                                                                                                 
[2024-03-18 19:33:25] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32

 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                              | 138/283 [00:42<00:35,  4.04it/s]
                                                                                                                                 
[2024-03-18 19:33:25] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32

 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                                              | 138/283 [00:42<00:35,  4.04it/s]
 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                             | 139/283 [00:42<00:33,  4.34it/s]
                                                                                                                                 
[2024-03-18 19:33:25] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.norm.weight", shape: (2560,), dtype: float32

 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                             | 139/283 [00:42<00:33,  4.34it/s]
                                                                                                                                 
[2024-03-18 19:33:25] INFO huggingface_loader.py:194: Unloading HF weight file: ../dist/models/Qwen1.5-4B/model-00002-of-00002.safetensors

 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                             | 139/283 [00:42<00:33,  4.34it/s]
                                                                                                                                 
[2024-03-18 19:33:26] INFO huggingface_loader.py:182: Loading HF parameters from: ../dist/models/Qwen1.5-4B/model-00001-of-00002.safetensors

 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                                             | 139/283 [00:42<00:33,  4.34it/s][19:33:33] /workspace/tvm/src/runtime/memory/pooled_allocator.h:65: Warning: PooledAllocator got InternalError during allocation: InternalError: Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: out of memory
[19:33:33] /workspace/tvm/src/runtime/memory/pooled_allocator.h:66: Warning: Trying to release all unused memory and reallocate...
terminate called after throwing an instance of 'tvm::runtime::InternalError'
  what():  [19:33:33] /workspace/tvm/include/tvm/runtime/packed_func.h:1346: unknown type = 0
Stack trace:
  0: _ZN3tvm7runtime6deta
  1: _ZN3tvm7runtime6memory13MemoryM
  2: _ZN3tvm7runtime18SimpleObjAllocator7HandlerINS0_
  3: tvm::runtime::relax_vm::VMAllocStorage(void*, tvm::runtime::ShapeTuple, long, DLDataType, tvm::runtime::String) [clone .cold]
  4: tvm::runtime::TypedPackedFunc<tvm::runtime::memory::Storage (void*, tvm::runtime::ShapeTuple, long, DLDataType, tvm::runtime::String)>::AssignTypedLambda<tvm::runtime::memory::Storage (*)(void*, tvm::runtime::ShapeTuple, long, DLDataType, tvm::runtime::String)>(tvm::runtime::memory::Storage (*)(void*, tvm::runtime::ShapeTuple, long, DLDataType, tvm::runtime::String), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
  5: _ZN3tvm7runtime13PackedFun
  6: tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction)
  7: tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop()
  8: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long, std::vector<tvm::runtime::TVMRetValue, std::allocator<tvm::runtime::TVMRetValue> > const&)
  9: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::relax_vm::VirtualMachineImpl::GetClosureInternal(tvm::runtime::String const&, bool)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  10: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)