/home/floriadmin/miniforge3/envs/mlc/bin/python -m mlc_llm gen_config ../dist/models/Qwen1.5-4B --quantization q8f32_1 --conv-template chatml --output /tmp/tmpvomo8uva
[2024-03-18 19:32:31] INFO auto_config.py:115: Found model configuration: ../dist/models/Qwen1.5-4B/config.json
[2024-03-18 19:32:31] INFO auto_config.py:153: Found model type: qwen2. Use `--model-type` to override.
[2024-03-18 19:32:31] INFO qwen2_model.py:46: context_window_size not found in config.json. Falling back to max_position_embeddings (32768)
[2024-03-18 19:32:31] INFO qwen2_model.py:60: prefill_chunk_size defaults to context_window_size (32768)
[2024-03-18 19:32:31] WARNING config.py:99: Warning: Cannot override max_batch_size, because QWen2Config does not have this field
[2024-03-18 19:32:31] INFO gen_config.py:133: [generation_config.json] Setting bos_token_id: 151643
[2024-03-18 19:32:31] INFO gen_config.py:133: [generation_config.json] Setting eos_token_id: 151643
[2024-03-18 19:32:31] INFO gen_config.py:147: Not found tokenizer config: ../dist/models/Qwen1.5-4B/tokenizer.model
[2024-03-18 19:32:31] INFO gen_config.py:145: Found tokenizer config: ../dist/models/Qwen1.5-4B/tokenizer.json. Copying to /tmp/tmpvomo8uva/tokenizer.json
[2024-03-18 19:32:31] INFO gen_config.py:145: Found tokenizer config: ../dist/models/Qwen1.5-4B/vocab.json. Copying to /tmp/tmpvomo8uva/vocab.json
[2024-03-18 19:32:31] INFO gen_config.py:145: Found tokenizer config: ../dist/models/Qwen1.5-4B/merges.txt. Copying to /tmp/tmpvomo8uva/merges.txt
[2024-03-18 19:32:31] INFO gen_config.py:147: Not found tokenizer config: ../dist/models/Qwen1.5-4B/added_tokens.json
[2024-03-18 19:32:31] INFO gen_config.py:145: Found tokenizer config: ../dist/models/Qwen1.5-4B/tokenizer_config.json. Copying to /tmp/tmpvomo8uva/tokenizer_config.json
[2024-03-18 19:32:31] INFO gen_config.py:75: [System default] Setting pad_token_id: 0
[2024-03-18 19:32:31] INFO gen_config.py:75: [System default] Setting temperature: 0.7
[2024-03-18 19:32:31] INFO gen_config.py:75: [System default] Setting presence_penalty: 0.0
[2024-03-18 19:32:31] INFO gen_config.py:75: [System default] Setting frequency_penalty: 0.0
[2024-03-18 19:32:31] INFO gen_config.py:75: [System default] Setting repetition_penalty: 1.0
[2024-03-18 19:32:31] INFO gen_config.py:75: [System default] Setting top_p: 0.95
[2024-03-18 19:32:31] INFO gen_config.py:75: [System default] Setting mean_gen_len: 128
[2024-03-18 19:32:31] INFO gen_config.py:75: [System default] Setting max_gen_len: 512
[2024-03-18 19:32:31] INFO gen_config.py:75: [System default] Setting shift_fill_factor: 0.3
[2024-03-18 19:32:31] INFO gen_config.py:198: Dumping configuration file to: /tmp/tmpvomo8uva/mlc-chat-config.json
/home/floriadmin/miniforge3/envs/mlc/bin/python -m mlc_llm convert_weight ../dist/models/Qwen1.5-4B --quantization q8f32_1 --source-format auto --output /tmp/tmpvomo8uva
[2024-03-18 19:32:32] INFO auto_config.py:115: Found model configuration: ../dist/models/Qwen1.5-4B/config.json
[2024-03-18 19:32:33] INFO auto_device.py:76: Found device: cuda:0
[2024-03-18 19:32:33] INFO auto_device.py:76: Found device: cuda:1
[2024-03-18 19:32:33] INFO auto_device.py:76: Found device: cuda:2
[2024-03-18 19:32:33] INFO auto_device.py:76: Found device: cuda:3
[2024-03-18 19:32:33] INFO auto_device.py:76: Found device: cuda:4
[2024-03-18 19:32:33] INFO auto_device.py:76: Found device: cuda:5
[2024-03-18 19:32:33] INFO auto_device.py:76: Found device: cuda:6
[2024-03-18 19:32:33] INFO auto_device.py:76: Found device: cuda:7
[2024-03-18 19:32:33] INFO auto_device.py:76: Found device: cuda:8
[2024-03-18 19:32:33] INFO auto_device.py:76: Found device: cuda:9
[2024-03-18 19:32:34] INFO auto_device.py:85: Not found device: rocm:0
[2024-03-18 19:32:35] INFO auto_device.py:85: Not found device: metal:0
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:0
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:1
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:2
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:3
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:4
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:5
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:6
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:7
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:8
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:9
[2024-03-18 19:32:39] INFO auto_device.py:76: Found device: vulkan:10
[2024-03-18 19:32:40] INFO auto_device.py:85: Not found device: opencl:0
[2024-03-18 19:32:40] INFO auto_device.py:33: Using device: cuda:0
[2024-03-18 19:32:40] INFO auto_weight.py:70: Finding weights in: ../dist/models/Qwen1.5-4B
[2024-03-18 19:32:40] INFO auto_weight.py:136: Not found Huggingface PyTorch
[2024-03-18 19:32:40] INFO auto_weight.py:143: Found source weight format: huggingface-safetensor. Source configuration: ../dist/models/Qwen1.5-4B/model.safetensors.index.json
[2024-03-18 19:32:40] INFO auto_weight.py:106: Using source weight configuration: ../dist/models/Qwen1.5-4B/model.safetensors.index.json. Use `--source` to override.
[2024-03-18 19:32:40] INFO auto_weight.py:110: Using source weight format: huggingface-safetensor. Use `--source-format` to override.
[2024-03-18 19:32:40] INFO auto_config.py:153: Found model type: qwen2. Use `--model-type` to override.
[2024-03-18 19:32:40] INFO qwen2_model.py:46: context_window_size not found in config.json. Falling back to max_position_embeddings (32768)
[2024-03-18 19:32:40] INFO qwen2_model.py:60: prefill_chunk_size defaults to context_window_size (32768)
Weight conversion with arguments:
  --config ../dist/models/Qwen1.5-4B/config.json
  --quantization GroupQuantize(name='q8f32_1', kind='group-quant', group_size=32, quantize_dtype='int8', storage_dtype='uint32', model_dtype='float32', linear_weight_layout='NK', quantize_embedding=True, quantize_final_fc=True, num_elem_per_storage=4, num_storage_per_group=8, max_int_value=127)
  --model-type qwen2
  --device cuda:0
  --source ../dist/models/Qwen1.5-4B/model.safetensors.index.json
  --source-format huggingface-safetensor
  --output /tmp/tmpvomo8uva
Start storing to cache /tmp/tmpvomo8uva
0%| | 0/283 [00:00 type is zero.
setattr(self, word, getattr(machar, word).flat[0])
/home/floriadmin/miniforge3/envs/mlc/lib/python3.11/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero.
return self._float_to_str(self.smallest_subnormal)
0%|▎ | 1/283 [00:11<54:47, 11.66s/it]
[2024-03-18 19:32:54] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.20.input_layernorm.weight", shape: (2560,), dtype: float32
0%|▎ | 1/283 [00:11<54:47, 11.66s/it]
[2024-03-18 19:32:55] INFO group_quantization.py:232: Compiling quantize function for key: ((2560, 6912), float32, cuda, axis=1, output_transpose=False)
0%|▎ | 1/283 [00:11<54:47, 11.66s/it]
[2024-03-18 19:32:55] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.20.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
0%|▎ | 1/283 [00:12<54:47, 11.66s/it]
[2024-03-18 19:32:55] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.20.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
0%|▎ | 1/283 [00:12<54:47, 11.66s/it]
1%|▉ | 3/283 [00:12<15:28, 3.32s/it]
[2024-03-18 19:32:56] INFO group_quantization.py:232: Compiling quantize function for key: ((13824, 2560), float32, cuda, axis=1, output_transpose=False)
1%|▉ | 3/283 [00:12<15:28, 3.32s/it]
[2024-03-18 19:32:56] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.20.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
1%|▉ | 3/283 [00:13<15:28, 3.32s/it]
[2024-03-18 19:32:56] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.20.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
1%|▉ | 3/283 [00:13<15:28, 3.32s/it]
1%|█▎ | 4/283 [00:13<11:59, 2.58s/it]
[2024-03-18 19:32:56] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.20.post_attention_layernorm.weight", shape: (2560,), dtype: float32
1%|█▎ | 4/283 [00:13<11:59, 2.58s/it]
[2024-03-18 19:32:56] INFO group_quantization.py:232: Compiling quantize function for key: ((2560, 2560), float32, cuda, axis=1, output_transpose=False)
1%|█▎ | 4/283 [00:13<11:59, 2.58s/it]
[2024-03-18 19:32:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.20.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
1%|█▎ | 4/283 [00:14<11:59, 2.58s/it]
[2024-03-18 19:32:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.20.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
1%|█▎ | 4/283 [00:14<11:59, 2.58s/it]
2%|█▉ | 6/283 [00:14<06:55, 1.50s/it]
[2024-03-18 19:32:57] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.21.input_layernorm.weight", shape: (2560,), dtype: float32
2%|█▉ | 6/283 [00:14<06:55, 1.50s/it]
[2024-03-18 19:32:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
2%|█▉ | 6/283 [00:14<06:55, 1.50s/it]
[2024-03-18 19:32:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
2%|█▉ | 6/283 [00:14<06:55, 1.50s/it]
3%|██▌ | 8/283 [00:14<04:17, 1.07it/s]
[2024-03-18 19:32:58] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
3%|██▌ | 8/283 [00:15<04:17, 1.07it/s]
[2024-03-18 19:32:58] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
3%|██▌ | 8/283 [00:15<04:17, 1.07it/s]
3%|██▉ | 9/283 [00:15<03:57, 1.15it/s]
[2024-03-18 19:32:58] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.21.post_attention_layernorm.weight", shape: (2560,), dtype: float32
3%|██▉ | 9/283 [00:15<03:57, 1.15it/s]
[2024-03-18 19:32:58] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.21.self_attn.c_attn.bias", shape: (7680,), dtype: float32
3%|██▉ | 9/283 [00:15<03:57, 1.15it/s]
[2024-03-18 19:32:58] INFO group_quantization.py:232: Compiling quantize function for key: ((7680, 2560), float32, cuda, axis=1, output_transpose=False)
3%|██▉ | 9/283 [00:15<03:57, 1.15it/s]
[2024-03-18 19:32:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32
3%|██▉ | 9/283 [00:16<03:57, 1.15it/s]
[2024-03-18 19:32:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32
3%|██▉ | 9/283 [00:16<03:57, 1.15it/s]
4%|███▊ | 12/283 [00:16<02:34, 1.75it/s]
[2024-03-18 19:32:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
4%|███▊ | 12/283 [00:16<02:34, 1.75it/s]
[2024-03-18 19:32:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
4%|███▊ | 12/283 [00:16<02:34, 1.75it/s]
5%|████▏ | 13/283 [00:16<02:12, 2.04it/s]
[2024-03-18 19:32:59] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.22.input_layernorm.weight", shape: (2560,), dtype: float32
5%|████▏ | 13/283 [00:16<02:12, 2.04it/s]
[2024-03-18 19:32:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
5%|████▏ | 13/283 [00:16<02:12, 2.04it/s]
[2024-03-18 19:32:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
5%|████▏ | 13/283 [00:16<02:12, 2.04it/s]
5%|████▊ | 15/283 [00:16<01:37, 2.74it/s]
[2024-03-18 19:33:00] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
5%|████▊ | 15/283 [00:17<01:37, 2.74it/s]
[2024-03-18 19:33:00] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
5%|████▊ | 15/283 [00:17<01:37, 2.74it/s]
6%|█████▏ | 16/283 [00:17<01:52, 2.36it/s]
[2024-03-18 19:33:00] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.22.post_attention_layernorm.weight", shape: (2560,), dtype: float32
6%|█████▏ | 16/283 [00:17<01:52, 2.36it/s]
[2024-03-18 19:33:00] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.22.self_attn.c_attn.bias", shape: (7680,), dtype: float32
6%|█████▏ | 16/283 [00:17<01:52, 2.36it/s]
[2024-03-18 19:33:00] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32
6%|█████▏ | 16/283 [00:17<01:52, 2.36it/s]
[2024-03-18 19:33:00] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32
6%|█████▏ | 16/283 [00:17<01:52, 2.36it/s]
7%|██████ | 19/283 [00:17<01:13, 3.58it/s]
[2024-03-18 19:33:00] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
7%|██████ | 19/283 [00:17<01:13, 3.58it/s]
[2024-03-18 19:33:00] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
7%|██████ | 19/283 [00:17<01:13, 3.58it/s]
7%|██████▍ | 20/283 [00:17<01:07, 3.89it/s]
[2024-03-18 19:33:00] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.23.input_layernorm.weight", shape: (2560,), dtype: float32
7%|██████▍ | 20/283 [00:17<01:07, 3.89it/s]
[2024-03-18 19:33:01] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
7%|██████▍ | 20/283 [00:17<01:07, 3.89it/s]
[2024-03-18 19:33:01] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
7%|██████▍ | 20/283 [00:18<01:07, 3.89it/s]
8%|███████ | 22/283 [00:18<00:56, 4.62it/s]
[2024-03-18 19:33:01] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
8%|███████ | 22/283 [00:18<00:56, 4.62it/s]
[2024-03-18 19:33:01] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
8%|███████ | 22/283 [00:18<00:56, 4.62it/s]
8%|███████▍ | 23/283 [00:18<01:16, 3.39it/s]
[2024-03-18 19:33:01] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.23.post_attention_layernorm.weight", shape: (2560,), dtype: float32
8%|███████▍ | 23/283 [00:18<01:16, 3.39it/s]
[2024-03-18 19:33:01] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.23.self_attn.c_attn.bias", shape: (7680,), dtype: float32
8%|███████▍ | 23/283 [00:18<01:16, 3.39it/s]
[2024-03-18 19:33:02] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32
8%|███████▍ | 23/283 [00:18<01:16, 3.39it/s]
[2024-03-18 19:33:02] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32
8%|███████▍ | 23/283 [00:18<01:16, 3.39it/s]
9%|████████▎ | 26/283 [00:18<00:54, 4.73it/s]
[2024-03-18 19:33:02] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
9%|████████▎ | 26/283 [00:19<00:54, 4.73it/s]
[2024-03-18 19:33:02] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
9%|████████▎ | 26/283 [00:19<00:54, 4.73it/s]
10%|████████▋ | 27/283 [00:19<00:51, 4.98it/s]
[2024-03-18 19:33:02] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.24.input_layernorm.weight", shape: (2560,), dtype: float32
10%|████████▋ | 27/283 [00:19<00:51, 4.98it/s]
[2024-03-18 19:33:02] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
10%|████████▋ | 27/283 [00:19<00:51, 4.98it/s]
[2024-03-18 19:33:02] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
10%|████████▋ | 27/283 [00:19<00:51, 4.98it/s]
10%|█████████▎ | 29/283 [00:19<00:45, 5.62it/s]
[2024-03-18 19:33:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
10%|█████████▎ | 29/283 [00:19<00:45, 5.62it/s]
[2024-03-18 19:33:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
10%|█████████▎ | 29/283 [00:20<00:45, 5.62it/s]
11%|█████████▋ | 30/283 [00:20<01:06, 3.80it/s]
[2024-03-18 19:33:03] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.24.post_attention_layernorm.weight", shape: (2560,), dtype: float32
11%|█████████▋ | 30/283 [00:20<01:06, 3.80it/s]
[2024-03-18 19:33:03] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.24.self_attn.c_attn.bias", shape: (7680,), dtype: float32
11%|█████████▋ | 30/283 [00:20<01:06, 3.80it/s]
[2024-03-18 19:33:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32
11%|█████████▋ | 30/283 [00:20<01:06, 3.80it/s]
[2024-03-18 19:33:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32
11%|█████████▋ | 30/283 [00:20<01:06, 3.80it/s]
12%|██████████▌ | 33/283 [00:20<00:48, 5.13it/s]
[2024-03-18 19:33:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
12%|██████████▌ | 33/283 [00:20<00:48, 5.13it/s]
[2024-03-18 19:33:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
12%|██████████▌ | 33/283 [00:20<00:48, 5.13it/s]
12%|██████████▉ | 34/283 [00:20<00:46, 5.34it/s]
[2024-03-18 19:33:03] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.25.input_layernorm.weight", shape: (2560,), dtype: float32
12%|██████████▉ | 34/283 [00:20<00:46, 5.34it/s]
[2024-03-18 19:33:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
12%|██████████▉ | 34/283 [00:20<00:46, 5.34it/s]
[2024-03-18 19:33:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
12%|██████████▉ | 34/283 [00:20<00:46, 5.34it/s]
13%|███████████▌ | 36/283 [00:20<00:41, 5.94it/s]
[2024-03-18 19:33:04] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
13%|███████████▌ | 36/283 [00:21<00:41, 5.94it/s]
[2024-03-18 19:33:04] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
13%|███████████▌ | 36/283 [00:21<00:41, 5.94it/s]
13%|███████████▉ | 37/283 [00:21<01:03, 3.86it/s]
[2024-03-18 19:33:04] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.25.post_attention_layernorm.weight", shape: (2560,), dtype: float32
13%|███████████▉ | 37/283 [00:21<01:03, 3.86it/s]
[2024-03-18 19:33:04] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.25.self_attn.c_attn.bias", shape: (7680,), dtype: float32
13%|███████████▉ | 37/283 [00:21<01:03, 3.86it/s]
[2024-03-18 19:33:04] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32
13%|███████████▉ | 37/283 [00:21<01:03, 3.86it/s]
[2024-03-18 19:33:04] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32
13%|███████████▉ | 37/283 [00:21<01:03, 3.86it/s]
14%|████████████▊ | 40/283 [00:21<00:46, 5.18it/s]
[2024-03-18 19:33:05] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
14%|████████████▊ | 40/283 [00:21<00:46, 5.18it/s]
[2024-03-18 19:33:05] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
14%|████████████▊ | 40/283 [00:21<00:46, 5.18it/s]
14%|█████████████▏ | 41/283 [00:21<00:44, 5.40it/s]
[2024-03-18 19:33:05] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.26.input_layernorm.weight", shape: (2560,), dtype: float32
14%|█████████████▏ | 41/283 [00:21<00:44, 5.40it/s]
[2024-03-18 19:33:05] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
14%|█████████████▏ | 41/283 [00:22<00:44, 5.40it/s]
[2024-03-18 19:33:05] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
14%|█████████████▏ | 41/283 [00:22<00:44, 5.40it/s]
15%|█████████████▊ | 43/283 [00:22<00:39, 6.01it/s]
[2024-03-18 19:33:05] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
15%|█████████████▊ | 43/283 [00:22<00:39, 6.01it/s]
[2024-03-18 19:33:05] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
15%|█████████████▊ | 43/283 [00:22<00:39, 6.01it/s]
16%|██████████████▏ | 44/283 [00:22<01:00, 3.93it/s]
[2024-03-18 19:33:05] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.26.post_attention_layernorm.weight", shape: (2560,), dtype: float32
16%|██████████████▏ | 44/283 [00:22<01:00, 3.93it/s]
[2024-03-18 19:33:05] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.26.self_attn.c_attn.bias", shape: (7680,), dtype: float32
16%|██████████████▏ | 44/283 [00:22<01:00, 3.93it/s]
[2024-03-18 19:33:06] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32
16%|██████████████▏ | 44/283 [00:23<01:00, 3.93it/s]
[2024-03-18 19:33:06] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32
16%|██████████████▏ | 44/283 [00:23<01:00, 3.93it/s]
17%|███████████████ | 47/283 [00:23<00:44, 5.25it/s]
[2024-03-18 19:33:06] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
17%|███████████████ | 47/283 [00:23<00:44, 5.25it/s]
[2024-03-18 19:33:06] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
17%|███████████████ | 47/283 [00:23<00:44, 5.25it/s]
17%|███████████████▍ | 48/283 [00:23<00:43, 5.43it/s]
[2024-03-18 19:33:06] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.27.input_layernorm.weight", shape: (2560,), dtype: float32
17%|███████████████▍ | 48/283 [00:23<00:43, 5.43it/s]
[2024-03-18 19:33:06] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
17%|███████████████▍ | 48/283 [00:23<00:43, 5.43it/s]
[2024-03-18 19:33:06] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
17%|███████████████▍ | 48/283 [00:23<00:43, 5.43it/s]
18%|████████████████ | 50/283 [00:23<00:38, 6.01it/s]
[2024-03-18 19:33:07] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
18%|████████████████ | 50/283 [00:23<00:38, 6.01it/s]
[2024-03-18 19:33:07] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
18%|████████████████ | 50/283 [00:24<00:38, 6.01it/s]
18%|████████████████▍ | 51/283 [00:24<00:58, 3.95it/s]
[2024-03-18 19:33:07] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.27.post_attention_layernorm.weight", shape: (2560,), dtype: float32
18%|████████████████▍ | 51/283 [00:24<00:58, 3.95it/s]
[2024-03-18 19:33:07] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.27.self_attn.c_attn.bias", shape: (7680,), dtype: float32
18%|████████████████▍ | 51/283 [00:24<00:58, 3.95it/s]
[2024-03-18 19:33:07] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32
18%|████████████████▍ | 51/283 [00:24<00:58, 3.95it/s]
[2024-03-18 19:33:07] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32
18%|████████████████▍ | 51/283 [00:24<00:58, 3.95it/s]
19%|█████████████████▎ | 54/283 [00:24<00:43, 5.29it/s]
[2024-03-18 19:33:07] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
19%|█████████████████▎ | 54/283 [00:24<00:43, 5.29it/s]
[2024-03-18 19:33:07] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
19%|█████████████████▎ | 54/283 [00:24<00:43, 5.29it/s]
19%|█████████████████▋ | 55/283 [00:24<00:41, 5.49it/s]
[2024-03-18 19:33:07] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.28.input_layernorm.weight", shape: (2560,), dtype: float32
19%|█████████████████▋ | 55/283 [00:24<00:41, 5.49it/s]
[2024-03-18 19:33:08] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
19%|█████████████████▋ | 55/283 [00:24<00:41, 5.49it/s]
[2024-03-18 19:33:08] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
19%|█████████████████▋ | 55/283 [00:24<00:41, 5.49it/s]
20%|██████████████████▎ | 57/283 [00:24<00:37, 6.05it/s]
[2024-03-18 19:33:08] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
20%|██████████████████▎ | 57/283 [00:25<00:37, 6.05it/s]
[2024-03-18 19:33:08] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
20%|██████████████████▎ | 57/283 [00:25<00:37, 6.05it/s]
20%|██████████████████▋ | 58/283 [00:25<00:56, 3.96it/s]
[2024-03-18 19:33:08] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.28.post_attention_layernorm.weight", shape: (2560,), dtype: float32
20%|██████████████████▋ | 58/283 [00:25<00:56, 3.96it/s]
[2024-03-18 19:33:08] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.28.self_attn.c_attn.bias", shape: (7680,), dtype: float32
20%|██████████████████▋ | 58/283 [00:25<00:56, 3.96it/s]
[2024-03-18 19:33:08] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32
20%|██████████████████▋ | 58/283 [00:25<00:56, 3.96it/s]
[2024-03-18 19:33:09] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32
20%|██████████████████▋ | 58/283 [00:25<00:56, 3.96it/s]
22%|███████████████████▌ | 61/283 [00:25<00:41, 5.30it/s]
[2024-03-18 19:33:09] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
22%|███████████████████▌ | 61/283 [00:25<00:41, 5.30it/s]
[2024-03-18 19:33:09] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
22%|███████████████████▌ | 61/283 [00:25<00:41, 5.30it/s]
22%|███████████████████▉ | 62/283 [00:25<00:40, 5.51it/s]
[2024-03-18 19:33:09] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.29.input_layernorm.weight", shape: (2560,), dtype: float32
22%|███████████████████▉ | 62/283 [00:25<00:40, 5.51it/s]
[2024-03-18 19:33:09] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
22%|███████████████████▉ | 62/283 [00:26<00:40, 5.51it/s]
[2024-03-18 19:33:09] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
22%|███████████████████▉ | 62/283 [00:26<00:40, 5.51it/s]
23%|████████████████████▌ | 64/283 [00:26<00:36, 6.05it/s]
[2024-03-18 19:33:09] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
23%|████████████████████▌ | 64/283 [00:26<00:36, 6.05it/s]
[2024-03-18 19:33:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
23%|████████████████████▌ | 64/283 [00:26<00:36, 6.05it/s]
23%|████████████████████▉ | 65/283 [00:26<00:55, 3.95it/s]
[2024-03-18 19:33:10] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.29.post_attention_layernorm.weight", shape: (2560,), dtype: float32
23%|████████████████████▉ | 65/283 [00:26<00:55, 3.95it/s]
[2024-03-18 19:33:10] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.29.self_attn.c_attn.bias", shape: (7680,), dtype: float32
23%|████████████████████▉ | 65/283 [00:26<00:55, 3.95it/s]
[2024-03-18 19:33:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32
23%|████████████████████▉ | 65/283 [00:27<00:55, 3.95it/s]
[2024-03-18 19:33:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32
23%|████████████████████▉ | 65/283 [00:27<00:55, 3.95it/s]
24%|█████████████████████▊ | 68/283 [00:27<00:40, 5.29it/s]
[2024-03-18 19:33:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
24%|█████████████████████▊ | 68/283 [00:27<00:40, 5.29it/s]
[2024-03-18 19:33:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
24%|█████████████████████▊ | 68/283 [00:27<00:40, 5.29it/s]
24%|██████████████████████▏ | 69/283 [00:27<00:38, 5.49it/s]
[2024-03-18 19:33:10] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.30.input_layernorm.weight", shape: (2560,), dtype: float32
24%|██████████████████████▏ | 69/283 [00:27<00:38, 5.49it/s]
[2024-03-18 19:33:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
24%|██████████████████████▏ | 69/283 [00:27<00:38, 5.49it/s]
[2024-03-18 19:33:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
24%|██████████████████████▏ | 69/283 [00:27<00:38, 5.49it/s]
25%|██████████████████████▊ | 71/283 [00:27<00:35, 6.06it/s]
[2024-03-18 19:33:11] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
25%|██████████████████████▊ | 71/283 [00:28<00:35, 6.06it/s]
[2024-03-18 19:33:11] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
25%|██████████████████████▊ | 71/283 [00:28<00:35, 6.06it/s]
25%|███████████████████████▏ | 72/283 [00:28<00:53, 3.93it/s]
[2024-03-18 19:33:11] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.30.post_attention_layernorm.weight", shape: (2560,), dtype: float32
25%|███████████████████████▏ | 72/283 [00:28<00:53, 3.93it/s]
[2024-03-18 19:33:11] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.30.self_attn.c_attn.bias", shape: (7680,), dtype: float32
25%|███████████████████████▏ | 72/283 [00:28<00:53, 3.93it/s]
[2024-03-18 19:33:11] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32
25%|███████████████████████▏ | 72/283 [00:28<00:53, 3.93it/s]
[2024-03-18 19:33:11] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32
25%|███████████████████████▏ | 72/283 [00:28<00:53, 3.93it/s]
27%|████████████████████████ | 75/283 [00:28<00:39, 5.26it/s]
[2024-03-18 19:33:11] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32
27%|████████████████████████ | 75/283 [00:28<00:39, 5.26it/s]
[2024-03-18 19:33:11] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32
27%|████████████████████████ | 75/283 [00:28<00:39, 5.26it/s]
27%|████████████████████████▍ | 76/283 [00:28<00:37, 5.45it/s]
[2024-03-18 19:33:11] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.31.input_layernorm.weight", shape: (2560,), dtype: float32
27%|████████████████████████▍ | 76/283 [00:28<00:37, 5.45it/s]
[2024-03-18 19:33:12] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32
27%|████████████████████████▍ | 76/283 [00:28<00:37, 5.45it/s]
[2024-03-18 19:33:12] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32
27%|████████████████████████▍ | 76/283 [00:29<00:37, 5.45it/s]
28%|█████████████████████████ | 78/283 [00:29<00:34, 6.01it/s]
[2024-03-18 19:33:12] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32
28%|█████████████████████████ | 78/283 [00:29<00:34, 6.01it/s]
[2024-03-18 19:33:12] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32
28%|█████████████████████████ | 78/283 [00:29<00:34, 6.01it/s]
28%|█████████████████████████▍ | 79/283 [00:29<00:52, 3.92it/s]
[2024-03-18 19:33:12] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.31.post_attention_layernorm.weight", shape: (2560,), dtype: float32
28%|█████████████████████████▍ | 79/283 [00:29<00:52, 3.92it/s]
[2024-03-18 19:33:12] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.31.self_attn.c_attn.bias", shape: (7680,), dtype: float32
28%|█████████████████████████▍ | 79/283 [00:29<00:52,
3.92it/s] [2024-03-18 19:33:13] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32 28%|█████████████████████████▍ | 79/283 [00:29<00:52, 3.92it/s] [2024-03-18 19:33:13] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32 28%|█████████████████████████▍ | 79/283 [00:29<00:52, 3.92it/s] 29%|██████████████████████████▎ | 82/283 [00:29<00:38, 5.26it/s] [2024-03-18 19:33:13] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32 29%|██████████████████████████▎ | 82/283 [00:30<00:38, 5.26it/s] [2024-03-18 19:33:13] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32 29%|██████████████████████████▎ | 82/283 [00:30<00:38, 5.26it/s] 29%|██████████████████████████▋ | 83/283 [00:30<00:36, 5.44it/s] [2024-03-18 19:33:13] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.32.input_layernorm.weight", shape: (2560,), dtype: float32 29%|██████████████████████████▋ | 83/283 [00:30<00:36, 5.44it/s] [2024-03-18 19:33:13] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32 29%|██████████████████████████▋ | 83/283 [00:30<00:36, 5.44it/s] [2024-03-18 19:33:13] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32 29%|██████████████████████████▋ | 83/283 [00:30<00:36, 5.44it/s] 30%|███████████████████████████▎ | 85/283 [00:30<00:33, 5.99it/s] [2024-03-18 19:33:14] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32 30%|███████████████████████████▎ | 85/283 [00:30<00:33, 5.99it/s] [2024-03-18 19:33:14] INFO 
huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32 30%|███████████████████████████▎ | 85/283 [00:31<00:33, 5.99it/s] 30%|███████████████████████████▋ | 86/283 [00:31<00:51, 3.86it/s] [2024-03-18 19:33:14] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.32.post_attention_layernorm.weight", shape: (2560,), dtype: float32 30%|███████████████████████████▋ | 86/283 [00:31<00:51, 3.86it/s] [2024-03-18 19:33:14] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.32.self_attn.c_attn.bias", shape: (7680,), dtype: float32 30%|███████████████████████████▋ | 86/283 [00:31<00:51, 3.86it/s] [2024-03-18 19:33:14] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32 30%|███████████████████████████▋ | 86/283 [00:31<00:51, 3.86it/s] [2024-03-18 19:33:14] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32 30%|███████████████████████████▋ | 86/283 [00:31<00:51, 3.86it/s] 31%|████████████████████████████▌ | 89/283 [00:31<00:37, 5.20it/s] [2024-03-18 19:33:14] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32 31%|████████████████████████████▌ | 89/283 [00:31<00:37, 5.20it/s] [2024-03-18 19:33:14] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32 31%|████████████████████████████▌ | 89/283 [00:31<00:37, 5.20it/s] 32%|████████████████████████████▉ | 90/283 [00:31<00:36, 5.35it/s] [2024-03-18 19:33:14] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.33.input_layernorm.weight", shape: (2560,), dtype: float32 32%|████████████████████████████▉ | 90/283 [00:31<00:36, 5.35it/s] [2024-03-18 19:33:14] INFO 
huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32 32%|████████████████████████████▉ | 90/283 [00:31<00:36, 5.35it/s] [2024-03-18 19:33:14] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32 32%|████████████████████████████▉ | 90/283 [00:31<00:36, 5.35it/s] 33%|█████████████████████████████▌ | 92/283 [00:31<00:32, 5.96it/s] [2024-03-18 19:33:15] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32 33%|█████████████████████████████▌ | 92/283 [00:32<00:32, 5.96it/s] [2024-03-18 19:33:15] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32 33%|█████████████████████████████▌ | 92/283 [00:32<00:32, 5.96it/s] 33%|█████████████████████████████▉ | 93/283 [00:32<00:50, 3.78it/s] [2024-03-18 19:33:15] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.33.post_attention_layernorm.weight", shape: (2560,), dtype: float32 33%|█████████████████████████████▉ | 93/283 [00:32<00:50, 3.78it/s] [2024-03-18 19:33:15] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.33.self_attn.c_attn.bias", shape: (7680,), dtype: float32 33%|█████████████████████████████▉ | 93/283 [00:32<00:50, 3.78it/s] [2024-03-18 19:33:15] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32 33%|█████████████████████████████▉ | 93/283 [00:32<00:50, 3.78it/s] [2024-03-18 19:33:15] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32 33%|█████████████████████████████▉ | 93/283 [00:32<00:50, 3.78it/s] 34%|██████████████████████████████▊ | 96/283 [00:32<00:36, 5.11it/s] [2024-03-18 19:33:16] INFO 
huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32 34%|██████████████████████████████▊ | 96/283 [00:32<00:36, 5.11it/s] [2024-03-18 19:33:16] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32 34%|██████████████████████████████▊ | 96/283 [00:32<00:36, 5.11it/s] 34%|███████████████████████████████▏ | 97/283 [00:32<00:34, 5.34it/s] [2024-03-18 19:33:16] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.34.input_layernorm.weight", shape: (2560,), dtype: float32 34%|███████████████████████████████▏ | 97/283 [00:32<00:34, 5.34it/s] [2024-03-18 19:33:16] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32 34%|███████████████████████████████▏ | 97/283 [00:33<00:34, 5.34it/s] [2024-03-18 19:33:16] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32 34%|███████████████████████████████▏ | 97/283 [00:33<00:34, 5.34it/s] 35%|███████████████████████████████▊ | 99/283 [00:33<00:31, 5.86it/s] [2024-03-18 19:33:16] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32 35%|███████████████████████████████▊ | 99/283 [00:33<00:31, 5.86it/s] [2024-03-18 19:33:17] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32 35%|███████████████████████████████▊ | 99/283 [00:33<00:31, 5.86it/s] 35%|███████████████████████████████▊ | 100/283 [00:33<00:47, 3.85it/s] [2024-03-18 19:33:17] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.34.post_attention_layernorm.weight", shape: (2560,), dtype: float32 35%|███████████████████████████████▊ | 100/283 [00:33<00:47, 3.85it/s] 
[2024-03-18 19:33:17] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.34.self_attn.c_attn.bias", shape: (7680,), dtype: float32 35%|███████████████████████████████▊ | 100/283 [00:33<00:47, 3.85it/s] [2024-03-18 19:33:17] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32 35%|███████████████████████████████▊ | 100/283 [00:34<00:47, 3.85it/s] [2024-03-18 19:33:17] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32 35%|███████████████████████████████▊ | 100/283 [00:34<00:47, 3.85it/s] 36%|████████████████████████████████▊ | 103/283 [00:34<00:34, 5.17it/s] [2024-03-18 19:33:17] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32 36%|████████████████████████████████▊ | 103/283 [00:34<00:34, 5.17it/s] [2024-03-18 19:33:17] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32 36%|████████████████████████████████▊ | 103/283 [00:34<00:34, 5.17it/s] 37%|█████████████████████████████████ | 104/283 [00:34<00:33, 5.29it/s] [2024-03-18 19:33:17] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.35.input_layernorm.weight", shape: (2560,), dtype: float32 37%|█████████████████████████████████ | 104/283 [00:34<00:33, 5.29it/s] [2024-03-18 19:33:17] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32 37%|█████████████████████████████████ | 104/283 [00:34<00:33, 5.29it/s] [2024-03-18 19:33:17] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32 37%|█████████████████████████████████ | 104/283 [00:34<00:33, 5.29it/s] 37%|█████████████████████████████████▋ | 
106/283 [00:34<00:29, 5.91it/s] [2024-03-18 19:33:18] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32 37%|█████████████████████████████████▋ | 106/283 [00:35<00:29, 5.91it/s] [2024-03-18 19:33:18] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32 37%|█████████████████████████████████▋ | 106/283 [00:35<00:29, 5.91it/s] 38%|██████████████████████████████████ | 107/283 [00:35<00:45, 3.86it/s] [2024-03-18 19:33:18] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.35.post_attention_layernorm.weight", shape: (2560,), dtype: float32 38%|██████████████████████████████████ | 107/283 [00:35<00:45, 3.86it/s] [2024-03-18 19:33:18] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.35.self_attn.c_attn.bias", shape: (7680,), dtype: float32 38%|██████████████████████████████████ | 107/283 [00:35<00:45, 3.86it/s] [2024-03-18 19:33:18] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32 38%|██████████████████████████████████ | 107/283 [00:35<00:45, 3.86it/s] [2024-03-18 19:33:18] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32 38%|██████████████████████████████████ | 107/283 [00:35<00:45, 3.86it/s] 39%|██████████████████████████████████▉ | 110/283 [00:35<00:33, 5.17it/s] [2024-03-18 19:33:18] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32 39%|██████████████████████████████████▉ | 110/283 [00:35<00:33, 5.17it/s] [2024-03-18 19:33:18] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32 39%|██████████████████████████████████▉ | 110/283 
[00:35<00:33, 5.17it/s] 39%|███████████████████████████████████▎ | 111/283 [00:35<00:32, 5.33it/s] [2024-03-18 19:33:18] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.36.input_layernorm.weight", shape: (2560,), dtype: float32 39%|███████████████████████████████████▎ | 111/283 [00:35<00:32, 5.33it/s] [2024-03-18 19:33:19] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32 39%|███████████████████████████████████▎ | 111/283 [00:35<00:32, 5.33it/s] [2024-03-18 19:33:19] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32 39%|███████████████████████████████████▎ | 111/283 [00:36<00:32, 5.33it/s] 40%|███████████████████████████████████▉ | 113/283 [00:36<00:28, 5.94it/s] [2024-03-18 19:33:19] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32 40%|███████████████████████████████████▉ | 113/283 [00:36<00:28, 5.94it/s] [2024-03-18 19:33:20] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32 40%|███████████████████████████████████▉ | 113/283 [00:36<00:28, 5.94it/s] 40%|████████████████████████████████████▎ | 114/283 [00:37<00:56, 2.99it/s] [2024-03-18 19:33:20] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.36.post_attention_layernorm.weight", shape: (2560,), dtype: float32 40%|████████████████████████████████████▎ | 114/283 [00:37<00:56, 2.99it/s] [2024-03-18 19:33:20] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.36.self_attn.c_attn.bias", shape: (7680,), dtype: float32 40%|████████████████████████████████████▎ | 114/283 [00:37<00:56, 2.99it/s] [2024-03-18 19:33:20] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.self_attn.c_attn.q_weight", shape: 
(7680, 640), dtype: uint32 40%|████████████████████████████████████▎ | 114/283 [00:37<00:56, 2.99it/s] [2024-03-18 19:33:20] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32 40%|████████████████████████████████████▎ | 114/283 [00:37<00:56, 2.99it/s] 41%|█████████████████████████████████████▏ | 117/283 [00:37<00:39, 4.23it/s] [2024-03-18 19:33:20] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32 41%|█████████████████████████████████████▏ | 117/283 [00:37<00:39, 4.23it/s] [2024-03-18 19:33:20] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32 41%|█████████████████████████████████████▏ | 117/283 [00:37<00:39, 4.23it/s] 42%|█████████████████████████████████████▌ | 118/283 [00:37<00:36, 4.51it/s] [2024-03-18 19:33:20] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.37.input_layernorm.weight", shape: (2560,), dtype: float32 42%|█████████████████████████████████████▌ | 118/283 [00:37<00:36, 4.51it/s] [2024-03-18 19:33:20] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32 42%|█████████████████████████████████████▌ | 118/283 [00:37<00:36, 4.51it/s] [2024-03-18 19:33:21] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32 42%|█████████████████████████████████████▌ | 118/283 [00:37<00:36, 4.51it/s] 42%|██████████████████████████████████████▏ | 120/283 [00:37<00:31, 5.22it/s] [2024-03-18 19:33:21] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32 42%|██████████████████████████████████████▏ | 120/283 [00:38<00:31, 5.22it/s] [2024-03-18 19:33:21] INFO 
huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32 42%|██████████████████████████████████████▏ | 120/283 [00:38<00:31, 5.22it/s] 43%|██████████████████████████████████████▍ | 121/283 [00:38<00:44, 3.61it/s] [2024-03-18 19:33:21] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.37.post_attention_layernorm.weight", shape: (2560,), dtype: float32 43%|██████████████████████████████████████▍ | 121/283 [00:38<00:44, 3.61it/s] [2024-03-18 19:33:21] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.37.self_attn.c_attn.bias", shape: (7680,), dtype: float32 43%|██████████████████████████████████████▍ | 121/283 [00:38<00:44, 3.61it/s] [2024-03-18 19:33:21] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32 43%|██████████████████████████████████████▍ | 121/283 [00:38<00:44, 3.61it/s] [2024-03-18 19:33:21] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32 43%|██████████████████████████████████████▍ | 121/283 [00:38<00:44, 3.61it/s] 44%|███████████████████████████████████████▍ | 124/283 [00:38<00:32, 4.94it/s] [2024-03-18 19:33:22] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32 44%|███████████████████████████████████████▍ | 124/283 [00:38<00:32, 4.94it/s] [2024-03-18 19:33:22] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32 44%|███████████████████████████████████████▍ | 124/283 [00:38<00:32, 4.94it/s] 44%|███████████████████████████████████████▊ | 125/283 [00:38<00:30, 5.18it/s] [2024-03-18 19:33:22] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.38.input_layernorm.weight", shape: (2560,), dtype: 
float32 44%|███████████████████████████████████████▊ | 125/283 [00:38<00:30, 5.18it/s] [2024-03-18 19:33:22] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32 44%|███████████████████████████████████████▊ | 125/283 [00:39<00:30, 5.18it/s] [2024-03-18 19:33:22] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32 44%|███████████████████████████████████████▊ | 125/283 [00:39<00:30, 5.18it/s] 45%|████████████████████████████████████████▍ | 127/283 [00:39<00:26, 5.82it/s] [2024-03-18 19:33:23] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32 45%|████████████████████████████████████████▍ | 127/283 [00:39<00:26, 5.82it/s] [2024-03-18 19:33:23] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32 45%|████████████████████████████████████████▍ | 127/283 [00:40<00:26, 5.82it/s] 45%|████████████████████████████████████████▋ | 128/283 [00:40<00:48, 3.22it/s] [2024-03-18 19:33:23] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.38.post_attention_layernorm.weight", shape: (2560,), dtype: float32 45%|████████████████████████████████████████▋ | 128/283 [00:40<00:48, 3.22it/s] [2024-03-18 19:33:23] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.38.self_attn.c_attn.bias", shape: (7680,), dtype: float32 45%|████████████████████████████████████████▋ | 128/283 [00:40<00:48, 3.22it/s] [2024-03-18 19:33:23] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32 45%|████████████████████████████████████████▋ | 128/283 [00:40<00:48, 3.22it/s] [2024-03-18 19:33:23] INFO huggingface_loader.py:164: [Quantized] Parameter: 
"model.layers.38.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32 45%|████████████████████████████████████████▋ | 128/283 [00:40<00:48, 3.22it/s] 46%|█████████████████████████████████████████▋ | 131/283 [00:40<00:33, 4.53it/s] [2024-03-18 19:33:23] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32 46%|█████████████████████████████████████████▋ | 131/283 [00:40<00:33, 4.53it/s] [2024-03-18 19:33:23] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32 46%|█████████████████████████████████████████▋ | 131/283 [00:40<00:33, 4.53it/s] 47%|█████████████████████████████████████████▉ | 132/283 [00:40<00:31, 4.77it/s] [2024-03-18 19:33:23] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.39.input_layernorm.weight", shape: (2560,), dtype: float32 47%|█████████████████████████████████████████▉ | 132/283 [00:40<00:31, 4.77it/s] [2024-03-18 19:33:23] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.mlp.down_proj.q_weight", shape: (2560, 1728), dtype: uint32 47%|█████████████████████████████████████████▉ | 132/283 [00:40<00:31, 4.77it/s] [2024-03-18 19:33:24] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.mlp.down_proj.q_scale", shape: (2560, 216), dtype: float32 47%|█████████████████████████████████████████▉ | 132/283 [00:40<00:31, 4.77it/s] 47%|██████████████████████████████████████████▌ | 134/283 [00:40<00:27, 5.47it/s] [2024-03-18 19:33:24] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.mlp.gate_up_proj.q_weight", shape: (13824, 640), dtype: uint32 47%|██████████████████████████████████████████▌ | 134/283 [00:41<00:27, 5.47it/s] [2024-03-18 19:33:25] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.mlp.gate_up_proj.q_scale", shape: (13824, 80), dtype: float32 
47%|██████████████████████████████████████████▌ | 134/283 [00:41<00:27, 5.47it/s] 48%|██████████████████████████████████████████▉ | 135/283 [00:41<00:53, 2.79it/s] [2024-03-18 19:33:25] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.39.post_attention_layernorm.weight", shape: (2560,), dtype: float32 48%|██████████████████████████████████████████▉ | 135/283 [00:41<00:53, 2.79it/s] [2024-03-18 19:33:25] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.39.self_attn.c_attn.bias", shape: (7680,), dtype: float32 48%|██████████████████████████████████████████▉ | 135/283 [00:41<00:53, 2.79it/s] [2024-03-18 19:33:25] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.self_attn.c_attn.q_weight", shape: (7680, 640), dtype: uint32 48%|██████████████████████████████████████████▉ | 135/283 [00:42<00:53, 2.79it/s] [2024-03-18 19:33:25] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.self_attn.c_attn.q_scale", shape: (7680, 80), dtype: float32 48%|██████████████████████████████████████████▉ | 135/283 [00:42<00:53, 2.79it/s] 49%|███████████████████████████████████████████▉ | 138/283 [00:42<00:35, 4.04it/s] [2024-03-18 19:33:25] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.self_attn.o_proj.q_weight", shape: (2560, 640), dtype: uint32 49%|███████████████████████████████████████████▉ | 138/283 [00:42<00:35, 4.04it/s] [2024-03-18 19:33:25] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.self_attn.o_proj.q_scale", shape: (2560, 80), dtype: float32 49%|███████████████████████████████████████████▉ | 138/283 [00:42<00:35, 4.04it/s] 49%|████████████████████████████████████████████▏ | 139/283 [00:42<00:33, 4.34it/s] [2024-03-18 19:33:25] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.norm.weight", shape: (2560,), dtype: float32 49%|████████████████████████████████████████████▏ | 139/283 [00:42<00:33, 4.34it/s] [2024-03-18 
19:33:25] INFO huggingface_loader.py:194: Unloading HF weight file: ../dist/models/Qwen1.5-4B/model-00002-of-00002.safetensors
49%|████████████████████████████████████████████▏ | 139/283 [00:42<00:33, 4.34it/s]
[2024-03-18 19:33:26] INFO huggingface_loader.py:182: Loading HF parameters from: ../dist/models/Qwen1.5-4B/model-00001-of-00002.safetensors
49%|████████████████████████████████████████████▏ | 139/283 [00:42<00:33, 4.34it/s]
[19:33:33] /workspace/tvm/src/runtime/memory/pooled_allocator.h:65: Warning: PooledAllocator got InternalError during allocation: InternalError: Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: out of memory
[19:33:33] /workspace/tvm/src/runtime/memory/pooled_allocator.h:66: Warning: Trying to release all unused memory and reallocate...
terminate called after throwing an instance of 'tvm::runtime::InternalError'
what(): [19:33:33] /workspace/tvm/include/tvm/runtime/packed_func.h:1346: unknown type = 0
Stack trace:
0: _ZN3tvm7runtime6deta
1: _ZN3tvm7runtime6memory13MemoryM
2: _ZN3tvm7runtime18SimpleObjAllocator7HandlerINS0_
3: tvm::runtime::relax_vm::VMAllocStorage(void*, tvm::runtime::ShapeTuple, long, DLDataType, tvm::runtime::String) [clone .cold]
4: tvm::runtime::TypedPackedFunc::AssignTypedLambda(tvm::runtime::memory::Storage (*)(void*, tvm::runtime::ShapeTuple, long, DLDataType, tvm::runtime::String), std::__cxx11::basic_string, std::allocator >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
5: _ZN3tvm7runtime13PackedFun
6: tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction)
7: tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop()
8: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long, std::vector > const&)
9: tvm::runtime::PackedFuncObj::Extractor >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
10: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)