|
/home/cfruan/.conda/envs/mlc-source-311/bin/python -m mlc_llm gen_config /models/Meta-Llama-3-8B-Instruct --quantization q0f32 --conv-template llama-3 --output /tmp/tmpq8el2iww --context-window-size 8192 --prefill-chunk-size 1024

[2024-04-18 15:59:56] INFO auto_config.py:115: Found model configuration: /models/Meta-Llama-3-8B-Instruct/config.json

[2024-04-18 15:59:56] INFO auto_config.py:153: Found model type: llama. Use `--model-type` to override.

[2024-04-18 15:59:56] INFO llama_model.py:52: context_window_size not found in config.json. Falling back to max_position_embeddings (8192)

[2024-04-18 15:59:56] INFO llama_model.py:72: prefill_chunk_size defaults to context_window_size (8192)

[2024-04-18 15:59:56] INFO config.py:106: Overriding context_window_size from 8192 to 8192

[2024-04-18 15:59:56] INFO config.py:106: Overriding prefill_chunk_size from 8192 to 1024

[2024-04-18 15:59:56] INFO config.py:106: Overriding max_batch_size from 1 to 80

[2024-04-18 15:59:56] INFO gen_config.py:187: [generation_config.json] Setting bos_token_id: 128000

[2024-04-18 15:59:56] INFO gen_config.py:187: [generation_config.json] Setting eos_token_id: 128001

[2024-04-18 15:59:56] INFO gen_config.py:201: Not found tokenizer config: /models/Meta-Llama-3-8B-Instruct/tokenizer.model

[2024-04-18 15:59:56] INFO gen_config.py:199: Found tokenizer config: /models/Meta-Llama-3-8B-Instruct/tokenizer.json. Copying to /tmp/tmpq8el2iww/tokenizer.json

[2024-04-18 15:59:56] INFO gen_config.py:201: Not found tokenizer config: /models/Meta-Llama-3-8B-Instruct/vocab.json

[2024-04-18 15:59:56] INFO gen_config.py:201: Not found tokenizer config: /models/Meta-Llama-3-8B-Instruct/merges.txt

[2024-04-18 15:59:56] INFO gen_config.py:201: Not found tokenizer config: /models/Meta-Llama-3-8B-Instruct/added_tokens.json

[2024-04-18 15:59:56] INFO gen_config.py:199: Found tokenizer config: /models/Meta-Llama-3-8B-Instruct/tokenizer_config.json. Copying to /tmp/tmpq8el2iww/tokenizer_config.json

[2024-04-18 15:59:56] INFO gen_config.py:76: [System default] Setting pad_token_id: 0

[2024-04-18 15:59:56] INFO gen_config.py:76: [System default] Setting temperature: 0.7

[2024-04-18 15:59:56] INFO gen_config.py:76: [System default] Setting presence_penalty: 0.0

[2024-04-18 15:59:56] INFO gen_config.py:76: [System default] Setting frequency_penalty: 0.0

[2024-04-18 15:59:56] INFO gen_config.py:76: [System default] Setting repetition_penalty: 1.0

[2024-04-18 15:59:56] INFO gen_config.py:76: [System default] Setting top_p: 0.95

[2024-04-18 15:59:56] INFO gen_config.py:76: [System default] Setting mean_gen_len: 128

[2024-04-18 15:59:56] INFO gen_config.py:76: [System default] Setting max_gen_len: 512

[2024-04-18 15:59:56] INFO gen_config.py:76: [System default] Setting shift_fill_factor: 0.3

[2024-04-18 15:59:56] INFO gen_config.py:263: Dumping configuration file to: /tmp/tmpq8el2iww/mlc-chat-config.json
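
For reference, the dumped mlc-chat-config.json can be sanity-checked with a few lines of Python. This is only a sketch: it assumes the field names shown in the log above appear as top-level keys in the JSON file, and that the output directory is whatever was passed to --output (a temporary path in this run).

import json
from pathlib import Path

# Output directory passed to `mlc_llm gen_config --output ...` (temporary in the log above).
out_dir = Path("/tmp/tmpq8el2iww")
cfg = json.loads((out_dir / "mlc-chat-config.json").read_text())

# Values expected from the log: llama-3 conversation template, 8192-token context,
# 1024-token prefill chunks, temperature 0.7, top_p 0.95. `.get()` is used because
# the exact key layout may differ between mlc_llm versions.
for key in ("conv_template", "context_window_size", "prefill_chunk_size", "temperature", "top_p"):
    print(key, "=", cfg.get(key))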
|
/home/cfruan/.conda/envs/mlc-source-311/bin/python -m mlc_llm convert_weight /models/Meta-Llama-3-8B-Instruct --quantization q0f32 --source-format auto --output /tmp/tmpq8el2iww

[2024-04-18 15:59:57] INFO auto_config.py:115: Found model configuration: /models/Meta-Llama-3-8B-Instruct/config.json

[2024-04-18 15:59:58] INFO auto_device.py:76: Found device: cuda:0

[2024-04-18 15:59:58] INFO auto_device.py:76: Found device: cuda:1

[2024-04-18 15:59:59] INFO auto_device.py:85: Not found device: rocm:0

[2024-04-18 16:00:00] INFO auto_device.py:85: Not found device: metal:0

[2024-04-18 16:00:01] INFO auto_device.py:76: Found device: vulkan:0

[2024-04-18 16:00:01] INFO auto_device.py:76: Found device: vulkan:1

[2024-04-18 16:00:01] INFO auto_device.py:76: Found device: vulkan:2

[2024-04-18 16:00:02] INFO auto_device.py:85: Not found device: opencl:0

[2024-04-18 16:00:02] INFO auto_device.py:33: Using device: cuda:0

[2024-04-18 16:00:02] INFO auto_weight.py:70: Finding weights in: /models/Meta-Llama-3-8B-Instruct

[2024-04-18 16:00:02] INFO auto_weight.py:136: Not found Huggingface PyTorch

[2024-04-18 16:00:02] INFO auto_weight.py:143: Found source weight format: huggingface-safetensor. Source configuration: /models/Meta-Llama-3-8B-Instruct/model.safetensors.index.json

[2024-04-18 16:00:02] INFO auto_weight.py:106: Using source weight configuration: /models/Meta-Llama-3-8B-Instruct/model.safetensors.index.json. Use `--source` to override.

[2024-04-18 16:00:02] INFO auto_weight.py:110: Using source weight format: huggingface-safetensor. Use `--source-format` to override.

[2024-04-18 16:00:02] INFO auto_config.py:153: Found model type: llama. Use `--model-type` to override.

[2024-04-18 16:00:02] INFO llama_model.py:52: context_window_size not found in config.json. Falling back to max_position_embeddings (8192)

[2024-04-18 16:00:02] INFO llama_model.py:72: prefill_chunk_size defaults to context_window_size (8192)

Weight conversion with arguments:

--config /models/Meta-Llama-3-8B-Instruct/config.json

--quantization NoQuantize(name='q0f32', kind='no-quant', model_dtype='float32')

--model-type llama

--device cuda:0

--source /models/Meta-Llama-3-8B-Instruct/model.safetensors.index.json

--source-format huggingface-safetensor

--output /tmp/tmpq8el2iww
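
The --source argument above points at the standard Hugging Face shard index, and the shard-to-tensor mapping in that file is what drives the loading order seen below. A minimal sketch for inspecting it, assuming the usual index layout with a top-level "weight_map" (plain Python, not part of mlc_llm):

import json
from collections import defaultdict

# Path taken from the --source argument above.
with open("/models/Meta-Llama-3-8B-Instruct/model.safetensors.index.json") as f:
    index = json.load(f)

# Standard HF index layout (assumed): {"metadata": {...}, "weight_map": {tensor_name: shard_file}}
shards = defaultdict(list)
for tensor_name, shard_file in index["weight_map"].items():
    shards[shard_file].append(tensor_name)

for shard_file in sorted(shards):
    print(f"{shard_file}: {len(shards[shard_file])} tensors")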
|
Start storing to cache /tmp/tmpq8el2iww
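
The progress total of 195 below is consistent with the fused parameter layout logged by the loader: 32 decoder layers with 6 tensors each after QKV and gate/up fusion, plus the token embedding, lm_head, and the final norm. A quick arithmetic check (not mlc_llm code):

# Per-layer tensors after fusion, as they appear in the log below:
# qkv_proj, o_proj, gate_up_proj, down_proj, input_layernorm, post_attention_layernorm.
num_layers, per_layer = 32, 6
extras = 3  # model.embed_tokens, lm_head, model.norm
print(num_layers * per_layer + extras)  # 195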
|
[2024-04-18 16:00:06] INFO huggingface_loader.py:184: Loading HF parameters from: /models/Meta-Llama-3-8B-Instruct/model-00004-of-00004.safetensors

[2024-04-18 16:00:16] INFO huggingface_loader.py:174: [Not quantized] Parameter: "lm_head.weight", shape: (128256, 4096), dtype: float32

/home/cfruan/.conda/envs/mlc-source-311/lib/python3.11/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])

/home/cfruan/.conda/envs/mlc-source-311/lib/python3.11/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)

[2024-04-18 16:00:32] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.31.input_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:00:32] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.31.mlp.down_proj.weight", shape: (4096, 14336), dtype: float32

[2024-04-18 16:00:33] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.31.post_attention_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:00:33] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.norm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:00:33] INFO huggingface_loader.py:196: Unloading HF weight file: /models/Meta-Llama-3-8B-Instruct/model-00004-of-00004.safetensors

[2024-04-18 16:00:33] INFO huggingface_loader.py:184: Loading HF parameters from: /models/Meta-Llama-3-8B-Instruct/model-00001-of-00004.safetensors

[2024-04-18 16:00:45] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.embed_tokens.weight", shape: (128256, 4096), dtype: float32

[2024-04-18 16:01:06] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.0.input_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:06] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.0.mlp.down_proj.weight", shape: (4096, 14336), dtype: float32

[2024-04-18 16:01:07] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.0.mlp.gate_up_proj.weight", shape: (28672, 4096), dtype: float32

[2024-04-18 16:01:08] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.0.post_attention_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:09] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.0.self_attn.qkv_proj.weight", shape: (6144, 4096), dtype: float32

[2024-04-18 16:01:09] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.0.self_attn.o_proj.weight", shape: (4096, 4096), dtype: float32
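
The fused shapes above repeat for every layer: qkv_proj is (6144, 4096) because the query, key and value projections are packed into one matrix, and gate_up_proj is (28672, 4096) because the MLP gate and up projections are packed together. A rough check, assuming the commonly cited Llama-3-8B dimensions (32 query heads, 8 KV heads, head dim 128, FFN size 14336); the head counts are inferred, not printed by the tool:

hidden_size = 4096
num_q_heads, num_kv_heads, head_dim = 32, 8, 128  # assumed Llama-3-8B attention layout
ffn_size = 14336                                  # matches the down_proj shape (4096, 14336)

qkv_rows = (num_q_heads + 2 * num_kv_heads) * head_dim  # Q rows + K rows + V rows
gate_up_rows = 2 * ffn_size                             # gate rows + up rows
print(qkv_rows, gate_up_rows)  # 6144 28672
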
[2024-04-18 16:01:09] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.1.input_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:09] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.1.mlp.down_proj.weight", shape: (4096, 14336), dtype: float32

[2024-04-18 16:01:10] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.1.mlp.gate_up_proj.weight", shape: (28672, 4096), dtype: float32

[2024-04-18 16:01:11] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.1.post_attention_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:11] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.1.self_attn.qkv_proj.weight", shape: (6144, 4096), dtype: float32

[2024-04-18 16:01:12] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.1.self_attn.o_proj.weight", shape: (4096, 4096), dtype: float32

[2024-04-18 16:01:12] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.2.input_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:12] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.2.mlp.down_proj.weight", shape: (4096, 14336), dtype: float32

[2024-04-18 16:01:13] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.2.mlp.gate_up_proj.weight", shape: (28672, 4096), dtype: float32

[2024-04-18 16:01:14] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.2.post_attention_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:14] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.2.self_attn.qkv_proj.weight", shape: (6144, 4096), dtype: float32

[2024-04-18 16:01:14] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.2.self_attn.o_proj.weight", shape: (4096, 4096), dtype: float32

[2024-04-18 16:01:15] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.3.input_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:15] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.3.mlp.down_proj.weight", shape: (4096, 14336), dtype: float32

[2024-04-18 16:01:15] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.3.mlp.gate_up_proj.weight", shape: (28672, 4096), dtype: float32

[2024-04-18 16:01:17] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.3.post_attention_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:17] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.3.self_attn.qkv_proj.weight", shape: (6144, 4096), dtype: float32

[2024-04-18 16:01:17] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.3.self_attn.o_proj.weight", shape: (4096, 4096), dtype: float32

[2024-04-18 16:01:17] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.4.input_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:17] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.4.mlp.down_proj.weight", shape: (4096, 14336), dtype: float32

[2024-04-18 16:01:18] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.4.mlp.gate_up_proj.weight", shape: (28672, 4096), dtype: float32

[2024-04-18 16:01:19] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.4.post_attention_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:20] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.4.self_attn.qkv_proj.weight", shape: (6144, 4096), dtype: float32

[2024-04-18 16:01:20] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.4.self_attn.o_proj.weight", shape: (4096, 4096), dtype: float32

[2024-04-18 16:01:20] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.5.input_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:20] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.5.mlp.down_proj.weight", shape: (4096, 14336), dtype: float32

[2024-04-18 16:01:21] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.5.mlp.gate_up_proj.weight", shape: (28672, 4096), dtype: float32

[2024-04-18 16:01:22] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.5.post_attention_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:22] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.5.self_attn.qkv_proj.weight", shape: (6144, 4096), dtype: float32

[2024-04-18 16:01:23] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.5.self_attn.o_proj.weight", shape: (4096, 4096), dtype: float32

[2024-04-18 16:01:23] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.6.input_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:23] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.6.mlp.down_proj.weight", shape: (4096, 14336), dtype: float32

[2024-04-18 16:01:24] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.6.mlp.gate_up_proj.weight", shape: (28672, 4096), dtype: float32

[2024-04-18 16:01:25] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.6.post_attention_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:25] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.6.self_attn.qkv_proj.weight", shape: (6144, 4096), dtype: float32

[2024-04-18 16:01:25] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.6.self_attn.o_proj.weight", shape: (4096, 4096), dtype: float32

[2024-04-18 16:01:25] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.7.input_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:25] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.7.mlp.down_proj.weight", shape: (4096, 14336), dtype: float32

[2024-04-18 16:01:26] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.7.mlp.gate_up_proj.weight", shape: (28672, 4096), dtype: float32

[2024-04-18 16:01:27] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.7.post_attention_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:27] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.7.self_attn.qkv_proj.weight", shape: (6144, 4096), dtype: float32

[2024-04-18 16:01:28] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.7.self_attn.o_proj.weight", shape: (4096, 4096), dtype: float32

[2024-04-18 16:01:28] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.8.input_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:28] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.8.mlp.down_proj.weight", shape: (4096, 14336), dtype: float32

[2024-04-18 16:01:29] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.8.mlp.gate_up_proj.weight", shape: (28672, 4096), dtype: float32

[2024-04-18 16:01:30] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.8.post_attention_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:30] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.8.self_attn.qkv_proj.weight", shape: (6144, 4096), dtype: float32

[2024-04-18 16:01:30] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.8.self_attn.o_proj.weight", shape: (4096, 4096), dtype: float32

[2024-04-18 16:01:31] INFO huggingface_loader.py:196: Unloading HF weight file: /models/Meta-Llama-3-8B-Instruct/model-00001-of-00004.safetensors

[2024-04-18 16:01:31] INFO huggingface_loader.py:184: Loading HF parameters from: /models/Meta-Llama-3-8B-Instruct/model-00002-of-00004.safetensors

[2024-04-18 16:01:42] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.10.input_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:42] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.10.mlp.down_proj.weight", shape: (4096, 14336), dtype: float32

[2024-04-18 16:01:43] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.10.mlp.gate_up_proj.weight", shape: (28672, 4096), dtype: float32

[2024-04-18 16:01:44] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.10.post_attention_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:44] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.10.self_attn.qkv_proj.weight", shape: (6144, 4096), dtype: float32

[2024-04-18 16:01:45] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.10.self_attn.o_proj.weight", shape: (4096, 4096), dtype: float32

[2024-04-18 16:01:45] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.11.input_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:45] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.11.mlp.down_proj.weight", shape: (4096, 14336), dtype: float32

[2024-04-18 16:01:46] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.11.mlp.gate_up_proj.weight", shape: (28672, 4096), dtype: float32

[2024-04-18 16:01:47] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.11.post_attention_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:48] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.11.self_attn.qkv_proj.weight", shape: (6144, 4096), dtype: float32

[2024-04-18 16:01:48] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.11.self_attn.o_proj.weight", shape: (4096, 4096), dtype: float32

[2024-04-18 16:01:48] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.12.input_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:48] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.12.mlp.down_proj.weight", shape: (4096, 14336), dtype: float32

[2024-04-18 16:01:49] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.12.mlp.gate_up_proj.weight", shape: (28672, 4096), dtype: float32

[2024-04-18 16:01:52] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.12.post_attention_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:52] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.12.self_attn.qkv_proj.weight", shape: (6144, 4096), dtype: float32

[2024-04-18 16:01:52] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.12.self_attn.o_proj.weight", shape: (4096, 4096), dtype: float32

[2024-04-18 16:01:52] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.13.input_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:52] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.13.mlp.down_proj.weight", shape: (4096, 14336), dtype: float32

[2024-04-18 16:01:53] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.13.mlp.gate_up_proj.weight", shape: (28672, 4096), dtype: float32

[2024-04-18 16:01:56] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.13.post_attention_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:56] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.13.self_attn.qkv_proj.weight", shape: (6144, 4096), dtype: float32

[2024-04-18 16:01:57] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.13.self_attn.o_proj.weight", shape: (4096, 4096), dtype: float32

[2024-04-18 16:01:57] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.14.input_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:01:57] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.14.mlp.down_proj.weight", shape: (4096, 14336), dtype: float32

[2024-04-18 16:01:58] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.14.mlp.gate_up_proj.weight", shape: (28672, 4096), dtype: float32

[2024-04-18 16:02:01] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.14.post_attention_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:02:01] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.14.self_attn.qkv_proj.weight", shape: (6144, 4096), dtype: float32

[2024-04-18 16:02:02] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.14.self_attn.o_proj.weight", shape: (4096, 4096), dtype: float32

[2024-04-18 16:02:02] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.15.input_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:02:02] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.15.mlp.down_proj.weight", shape: (4096, 14336), dtype: float32

[2024-04-18 16:02:03] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.15.mlp.gate_up_proj.weight", shape: (28672, 4096), dtype: float32

[2024-04-18 16:02:06] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.15.post_attention_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:02:06] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.15.self_attn.qkv_proj.weight", shape: (6144, 4096), dtype: float32

[2024-04-18 16:02:06] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.15.self_attn.o_proj.weight", shape: (4096, 4096), dtype: float32

[2024-04-18 16:02:07] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.16.input_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:02:07] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.16.mlp.down_proj.weight", shape: (4096, 14336), dtype: float32

[2024-04-18 16:02:08] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.16.mlp.gate_up_proj.weight", shape: (28672, 4096), dtype: float32

[2024-04-18 16:02:11] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.16.post_attention_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:02:11] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.16.self_attn.qkv_proj.weight", shape: (6144, 4096), dtype: float32

[2024-04-18 16:02:11] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.16.self_attn.o_proj.weight", shape: (4096, 4096), dtype: float32

[2024-04-18 16:02:11] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.17.input_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:02:12] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.17.mlp.down_proj.weight", shape: (4096, 14336), dtype: float32

[2024-04-18 16:02:13] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.17.mlp.gate_up_proj.weight", shape: (28672, 4096), dtype: float32

[2024-04-18 16:02:16] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.17.post_attention_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:02:16] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.17.self_attn.qkv_proj.weight", shape: (6144, 4096), dtype: float32

[2024-04-18 16:02:16] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.17.self_attn.o_proj.weight", shape: (4096, 4096), dtype: float32

[2024-04-18 16:02:17] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.18.input_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:02:17] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.18.mlp.down_proj.weight", shape: (4096, 14336), dtype: float32

[2024-04-18 16:02:18] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.18.mlp.gate_up_proj.weight", shape: (28672, 4096), dtype: float32

[2024-04-18 16:02:21] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.18.post_attention_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:02:21] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.18.self_attn.qkv_proj.weight", shape: (6144, 4096), dtype: float32

[2024-04-18 16:02:21] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.18.self_attn.o_proj.weight", shape: (4096, 4096), dtype: float32

[2024-04-18 16:02:21] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.19.input_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:02:21] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.19.mlp.down_proj.weight", shape: (4096, 14336), dtype: float32

[2024-04-18 16:02:23] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.19.mlp.gate_up_proj.weight", shape: (28672, 4096), dtype: float32

[2024-04-18 16:02:26] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.19.post_attention_layernorm.weight", shape: (4096,), dtype: float32

[2024-04-18 16:02:26] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.19.self_attn.qkv_proj.weight", shape: (6144, 4096), dtype: float32

[2024-04-18 16:02:26] INFO huggingface_loader.py:174: [Not quantized] Parameter: "model.layers.19.self_attn.o_proj.weight", shape: (4096, 4096), dtype: float32

[2024-04-18 16:02:26] INFO huggingface_loader.py:184: Loading HF parameters from: /models/Meta-Llama-3-8B-Instruct/model-00003-of-00004.safetensors

[2024-04-18 16:02:38] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.20.mlp.gate_up_proj.weight[0m", shape: (28672, 4096), dtype: float32 |
|
62%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 120/195 [02:31<00:51, 1.46it/s]
62%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 121/195 [02:34<04:56, 4.01s/it]
[2024-04-18 16:02:41] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.20.self_attn.qkv_proj.weight[0m", shape: (6144, 4096), dtype: float32 |
|
62%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 121/195 [02:35<04:56, 4.01s/it]
63%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 122/195 [02:35<03:59, 3.28s/it]
[2024-04-18 16:02:42] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.20.self_attn.o_proj.weight[0m", shape: (4096, 4096), dtype: float32 |
|
63%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 122/195 [02:36<03:59, 3.28s/it]
63%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 123/195 [02:36<03:02, 2.53s/it]
[2024-04-18 16:02:43] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.9.input_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
63%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 123/195 [02:36<03:02, 2.53s/it]
[2024-04-18 16:02:43] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.9.mlp.down_proj.weight[0m", shape: (4096, 14336), dtype: float32 |
|
63%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 123/195 [02:36<03:02, 2.53s/it]
64%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 125/195 [02:38<02:05, 1.79s/it]
[2024-04-18 16:02:45] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.9.mlp.gate_up_proj.weight[0m", shape: (28672, 4096), dtype: float32 |
|
64%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 125/195 [02:38<02:05, 1.79s/it]
65%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 126/195 [02:41<02:33, 2.22s/it]
[2024-04-18 16:02:48] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.9.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
65%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 126/195 [02:41<02:33, 2.22s/it]
[2024-04-18 16:02:48] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.9.self_attn.qkv_proj.weight[0m", shape: (6144, 4096), dtype: float32 |
|
65%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 126/195 [02:41<02:33, 2.22s/it]
66%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 128/195 [02:42<01:38, 1.47s/it]
[2024-04-18 16:02:49] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.9.self_attn.o_proj.weight[0m", shape: (4096, 4096), dtype: float32 |
|
66%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 128/195 [02:42<01:38, 1.47s/it]
66%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 129/195 [02:42<01:18, 1.19s/it]
[2024-04-18 16:02:49] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.20.input_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
66%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 129/195 [02:42<01:18, 1.19s/it]
[2024-04-18 16:02:49] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.20.mlp.down_proj.weight[0m", shape: (4096, 14336), dtype: float32 |
|
66%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 129/195 [02:42<01:18, 1.19s/it]
67%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 131/195 [02:44<01:06, 1.04s/it]
[2024-04-18 16:02:51] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.20.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
67%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 131/195 [02:44<01:06, 1.04s/it]
[2024-04-18 16:02:51] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.21.input_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
67%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 131/195 [02:44<01:06, 1.04s/it]
[2024-04-18 16:02:51] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.21.mlp.down_proj.weight[0m", shape: (4096, 14336), dtype: float32 |
|
67%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 131/195 [02:44<01:06, 1.04s/it]
69%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 134/195 [02:45<00:47, 1.28it/s]
[2024-04-18 16:02:53] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.21.mlp.gate_up_proj.weight[0m", shape: (28672, 4096), dtype: float32 |
|
69%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 134/195 [02:46<00:47, 1.28it/s]
69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 135/195 [02:49<01:15, 1.26s/it]
[2024-04-18 16:02:55] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.21.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 135/195 [02:49<01:15, 1.26s/it]
[2024-04-18 16:02:55] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.21.self_attn.qkv_proj.weight[0m", shape: (6144, 4096), dtype: float32 |
|
69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 135/195 [02:49<01:15, 1.26s/it]
70%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 137/195 [02:49<00:52, 1.11it/s]
[2024-04-18 16:02:56] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.21.self_attn.o_proj.weight[0m", shape: (4096, 4096), dtype: float32 |
|
70%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 137/195 [02:49<00:52, 1.11it/s]
71%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 138/195 [02:49<00:43, 1.30it/s]
[2024-04-18 16:02:56] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.22.input_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
71%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 138/195 [02:49<00:43, 1.30it/s]
[2024-04-18 16:02:56] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.22.mlp.down_proj.weight[0m", shape: (4096, 14336), dtype: float32 |
|
71%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 138/195 [02:49<00:43, 1.30it/s]
72%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 140/195 [02:50<00:38, 1.42it/s]
[2024-04-18 16:02:58] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.22.mlp.gate_up_proj.weight[0m", shape: (28672, 4096), dtype: float32 |
|
72%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 140/195 [02:51<00:38, 1.42it/s]
72%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 141/195 [02:54<01:09, 1.29s/it]
[2024-04-18 16:03:01] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.22.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
72%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 141/195 [02:54<01:09, 1.29s/it]
[2024-04-18 16:03:01] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.22.self_attn.qkv_proj.weight[0m", shape: (6144, 4096), dtype: float32 |
|
72%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 141/195 [02:54<01:09, 1.29s/it]
73%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 143/195 [02:54<00:46, 1.12it/s]
[2024-04-18 16:03:01] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.22.self_attn.o_proj.weight[0m", shape: (4096, 4096), dtype: float32 |
|
73%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 143/195 [02:54<00:46, 1.12it/s]
74%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 144/195 [02:55<00:38, 1.33it/s]
[2024-04-18 16:03:01] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.23.input_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
74%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 144/195 [02:55<00:38, 1.33it/s]
[2024-04-18 16:03:01] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.23.mlp.down_proj.weight[0m", shape: (4096, 14336), dtype: float32 |
|
74%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 144/195 [02:55<00:38, 1.33it/s]
75%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 146/195 [02:56<00:34, 1.41it/s]
[2024-04-18 16:03:03] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.23.mlp.gate_up_proj.weight[0m", shape: (28672, 4096), dtype: float32 |
|
75%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 146/195 [02:57<00:34, 1.41it/s]
75%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 147/195 [02:59<01:03, 1.32s/it]
[2024-04-18 16:03:06] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.23.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
75%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 147/195 [02:59<01:03, 1.32s/it]
[2024-04-18 16:03:06] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.23.self_attn.qkv_proj.weight[0m", shape: (6144, 4096), dtype: float32 |
|
75%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 147/195 [03:00<01:03, 1.32s/it]
76%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 149/195 [03:00<00:41, 1.12it/s]
[2024-04-18 16:03:07] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.23.self_attn.o_proj.weight[0m", shape: (4096, 4096), dtype: float32 |
|
[2024-04-18 16:03:07] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.24.input_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
[2024-04-18 16:03:07] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.24.mlp.down_proj.weight[0m", shape: (4096, 14336), dtype: float32 |
|
[2024-04-18 16:03:09] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.24.mlp.gate_up_proj.weight[0m", shape: (28672, 4096), dtype: float32 |
|
[2024-04-18 16:03:11] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.24.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
[2024-04-18 16:03:12] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.24.self_attn.qkv_proj.weight[0m", shape: (6144, 4096), dtype: float32 |
|
[2024-04-18 16:03:12] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.24.self_attn.o_proj.weight[0m", shape: (4096, 4096), dtype: float32 |
|
[2024-04-18 16:03:12] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.25.input_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
[2024-04-18 16:03:12] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.25.mlp.down_proj.weight[0m", shape: (4096, 14336), dtype: float32 |
|
[2024-04-18 16:03:14] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.25.mlp.gate_up_proj.weight[0m", shape: (28672, 4096), dtype: float32 |
|
[2024-04-18 16:03:16] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.25.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
[2024-04-18 16:03:17] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.25.self_attn.qkv_proj.weight[0m", shape: (6144, 4096), dtype: float32 |
|
[2024-04-18 16:03:17] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.25.self_attn.o_proj.weight[0m", shape: (4096, 4096), dtype: float32 |
|
[2024-04-18 16:03:17] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.26.input_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
[2024-04-18 16:03:17] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.26.mlp.down_proj.weight[0m", shape: (4096, 14336), dtype: float32 |
|
[2024-04-18 16:03:19] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.26.mlp.gate_up_proj.weight[0m", shape: (28672, 4096), dtype: float32 |
|
[2024-04-18 16:03:22] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.26.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
[2024-04-18 16:03:22] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.26.self_attn.qkv_proj.weight[0m", shape: (6144, 4096), dtype: float32 |
|
[2024-04-18 16:03:22] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.26.self_attn.o_proj.weight[0m", shape: (4096, 4096), dtype: float32 |
|
[2024-04-18 16:03:22] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.27.input_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
[2024-04-18 16:03:22] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.27.mlp.down_proj.weight[0m", shape: (4096, 14336), dtype: float32 |
|
[2024-04-18 16:03:24] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.27.mlp.gate_up_proj.weight[0m", shape: (28672, 4096), dtype: float32 |
|
[2024-04-18 16:03:26] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.27.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
[2024-04-18 16:03:26] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.27.self_attn.qkv_proj.weight[0m", shape: (6144, 4096), dtype: float32 |
|
[2024-04-18 16:03:27] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.27.self_attn.o_proj.weight[0m", shape: (4096, 4096), dtype: float32 |
|
[2024-04-18 16:03:27] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.28.input_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
[2024-04-18 16:03:27] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.28.mlp.down_proj.weight[0m", shape: (4096, 14336), dtype: float32 |
|
[2024-04-18 16:03:29] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.28.mlp.gate_up_proj.weight[0m", shape: (28672, 4096), dtype: float32 |
|
[2024-04-18 16:03:32] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.28.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
[2024-04-18 16:03:32] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.28.self_attn.qkv_proj.weight[0m", shape: (6144, 4096), dtype: float32 |
|
[2024-04-18 16:03:32] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.28.self_attn.o_proj.weight[0m", shape: (4096, 4096), dtype: float32 |
|
[2024-04-18 16:03:32] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.29.input_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
[2024-04-18 16:03:32] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.29.mlp.down_proj.weight[0m", shape: (4096, 14336), dtype: float32 |
|
[2024-04-18 16:03:34] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.29.mlp.gate_up_proj.weight[0m", shape: (28672, 4096), dtype: float32 |
|
[2024-04-18 16:03:36] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.29.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
[2024-04-18 16:03:36] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.29.self_attn.qkv_proj.weight[0m", shape: (6144, 4096), dtype: float32 |
|
[2024-04-18 16:03:37] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.29.self_attn.o_proj.weight[0m", shape: (4096, 4096), dtype: float32 |
|
[2024-04-18 16:03:37] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.30.input_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
[2024-04-18 16:03:37] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.30.mlp.down_proj.weight[0m", shape: (4096, 14336), dtype: float32 |
|
[2024-04-18 16:03:39] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.30.mlp.gate_up_proj.weight[0m", shape: (28672, 4096), dtype: float32 |
|
[2024-04-18 16:03:41] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.30.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float32 |
|
[2024-04-18 16:03:41] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.30.self_attn.qkv_proj.weight[0m", shape: (6144, 4096), dtype: float32 |
|
[2024-04-18 16:03:42] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.30.self_attn.o_proj.weight[0m", shape: (4096, 4096), dtype: float32 |
|
[2024-04-18 16:03:42] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.31.mlp.gate_up_proj.weight[0m", shape: (28672, 4096), dtype: float32 |
|
[2024-04-18 16:03:45] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.31.self_attn.qkv_proj.weight[0m", shape: (6144, 4096), dtype: float32 |
|
[2024-04-18 16:03:46] INFO huggingface_loader.py:174: [Not quantized] Parameter: "[1mmodel.layers.31.self_attn.o_proj.weight[0m", shape: (4096, 4096), dtype: float32 |
|
100%|██████████| 195/195 [03:39<00:00, 1.13s/it] |
|
[2024-04-18 16:03:46] INFO huggingface_loader.py:196: Unloading HF weight file: /models/Meta-Llama-3-8B-Instruct/model-00002-of-00004.safetensors |
|
[2024-04-18 16:03:46] INFO huggingface_loader.py:196: Unloading HF weight file: /models/Meta-Llama-3-8B-Instruct/model-00003-of-00004.safetensors |
|
[2024-04-18 16:03:47] INFO stats.py:76: [92mTime usage[0m: HF loading: 36.734 sec; Pre-quantization mapping: 24.043 sec; Quantization: 0.000 sec |
|
[2024-04-18 16:03:47] INFO stats.py:90: [92mRAM usage[0m: Peak RAM: 18.469 GB. Total bytes loaded from disk: 29.915 GB |
|
[2024-04-18 16:03:47] INFO convert_weight.py:156: [92mParameter size[0m after quantization: 29.915 GB |
|
[2024-04-18 16:03:47] INFO convert_weight.py:161: [92mTotal parameters[0m: 8,030,261,248 |
|
[2024-04-18 16:03:47] INFO convert_weight.py:162: [92mBits per parameter[0m: 32.000 |
|
[2024-04-18 16:03:47] INFO convert_weight.py:167: Saved to directory: [1m/tmp/tmpq8el2iww[0m |
|
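The summary above is internally consistent: q0f32 leaves the weights in float32, so each of the 8,030,261,248 parameters occupies 4 bytes and the reported sizes are GiB. A minimal sanity check of that arithmetic (plain Python, not part of the MLC tooling; the figures are copied from the log above):

    # Cross-check the convert_weight summary (assumes sizes are reported in GiB).
    total_params = 8_030_261_248      # "Total parameters" reported above
    bytes_per_param = 4               # q0f32: unquantized float32 weights

    total_bytes = total_params * bytes_per_param
    print(f"Parameter size:     {total_bytes / 2**30:.3f} GB")          # -> 29.915 GB
    print(f"Bits per parameter: {total_bytes * 8 / total_params:.3f}")  # -> 32.000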
All finished, 131 total shards committed, record saved to /tmp/tmpq8el2iww/ndarray-cache.json |
|
Also saved a bf16 record to /tmp/tmpq8el2iww/ndarray-cache-b16.json |
|
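The committed shards can be inspected directly from the record file named above. A rough sketch, assuming the ndarray-cache.json written here keeps the usual MLC/TVM layout of a top-level "records" list with one entry per weight shard (each carrying an "nbytes" count and the parameter records stored in that shard); the exact field names may differ between versions:

    import json

    # Inspect the shard record emitted above. The path comes from this run's log;
    # the keys below are an assumed schema, so adjust them to your MLC version.
    with open("/tmp/tmpq8el2iww/ndarray-cache.json") as f:
        cache = json.load(f)

    shards = cache.get("records", [])
    print("shards committed:", len(shards))                                    # expect 131
    print("parameters      :", sum(len(s.get("records", [])) for s in shards))
    print("GB on disk      :", round(sum(s.get("nbytes", 0) for s in shards) / 2**30, 3))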
|