INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino
WARNING:nncf:NNCF provides best results with torch==2.2.*, while current torch version is 2.1.0. If you encounter issues, consider switching to torch==2.2.*
Args(model_id='meta-llama/Llama-2-7b-hf', torch_dtype='float16', device='cuda', compress_weights_mode='int8_asym', up=0.3, gate=0.3, down=0.5, batch_size=4, num_calibration_samples=64, eval_task='wikitext', eval_limit=None, save_folder='./models/Llama-2-7b-hf/int8_asym_up30+down50/')
Loading checkpoint shards: 0%| | 0/2 [00:00<…]
Token indices sequence length is longer than the specified maximum sequence length for this model (… > 4096). Running this sequence through the model will result in indexing errors
100%|██████████| 62/62 [03:18<00:00, 3.20s/it]
Torch evaluation result: {'results': {'wikitext': {'word_perplexity': 9.010890639006227, 'byte_perplexity': 1.5085035483088687, 'bits_per_byte': 0.5931180900024063}}, 'versions': {'wikitext': 1}, 'config': {'model': None, 'model_args': None, 'num_fewshot': 0, 'batch_size': 1, 'batch_sizes': [], 'device': 'cuda', 'no_cache': True, 'limit': None, 'bootstrap_iters': 100000, 'description_dict': None}}
Automatic task detection to: text-generation-with-past.
Using framework PyTorch: 2.1.0
Overriding 1 configuration item(s)
	- use_cache -> True
WARNING:nncf:You are setting `forward` on an NNCF-processed model object. NNCF relies on custom-wrapping the `forward` call in order to function properly. Arbitrary adjustments to the forward function on an NNCFNetwork object have undefined behavior. If you need to replace the underlying forward function of the original model so that NNCF should be using that instead of the original forward function that NNCF saved during the compressed model creation, you can do this by calling: model.nncf.set_original_unbound_forward(fn) if `fn` has an unbound 0-th `self` argument, or with model.nncf.temporary_bound_original_forward(fn): ... if `fn` already had 0-th `self` argument bound or never had it in the first place.
The cos_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the `LlamaAttention` class
The sin_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the `LlamaAttention` class
/nvme2/yujiepan/tools/miniconda3/envs/llm-sparse-training-autoawq/lib/python3.10/site-packages/optimum/exporters/openvino/model_patcher.py:340: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if sequence_length != 1:
/nvme2/yujiepan/tools/miniconda3/envs/llm-sparse-training-autoawq/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:382: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
WARNING:nncf:You are setting `forward` on an NNCF-processed model object. NNCF relies on custom-wrapping the `forward` call in order to function properly. Arbitrary adjustments to the forward function on an NNCFNetwork object have undefined behavior. If you need to replace the underlying forward function of the original model so that NNCF should be using that instead of the original forward function that NNCF saved during the compressed model creation, you can do this by calling: model.nncf.set_original_unbound_forward(fn) if `fn` has an unbound 0-th `self` argument, or with model.nncf.temporary_bound_original_forward(fn): ... if `fn` already had 0-th `self` argument bound or never had it in the first place.
Provided model does not contain state. It may lead to sub-optimal performance. Please reexport model with updated OpenVINO version >= 2023.3.0 calling the `from_pretrained` method with original model and `export=True` parameter
Compiling the model to CPU ...
[{'generated_text': 'Hello, I am an AI chatbot 🤖, how can I help you today?\n февруа'}]
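
For reference, the run above roughly corresponds to the following flow: load the FP16 checkpoint, compress the weights with NNCF in INT8 asymmetric mode, evaluate wikitext perplexity with lm-evaluation-harness, export to OpenVINO, and run a short generation as a smoke test. The sketch below is not the original script: the per-projection sparsity settings in Args (up=0.3, gate=0.3, down=0.5), the calibration/evaluation code, and the exact export call live in custom tooling that is not shown here, and the prompt in the final step is illustrative only.

```python
import nncf
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from optimum.intel import OVModelForCausalLM

model_id = "meta-llama/Llama-2-7b-hf"
save_folder = "./models/Llama-2-7b-hf/int8_asym_up30+down50/"

# 1) Load the FP16 checkpoint (Args: torch_dtype='float16', device='cuda').
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 2) Weight-only INT8 asymmetric compression (Args: compress_weights_mode='int8_asym').
#    compress_weights wraps the model as an NNCFNetwork, which is why the export step
#    later warns about setting `forward` on an NNCF-processed model object.
model = nncf.compress_weights(model, mode=nncf.CompressWeightsMode.INT8_ASYM)

# 3) The custom sparsification of up_proj/gate_proj/down_proj (up=0.3, gate=0.3, down=0.5),
#    the wikitext run with lm-evaluation-harness, and the OpenVINO export that writes the
#    IR into `save_folder` are omitted here -- they belong to the author's own tooling.

# 4) Reload the exported IR with optimum-intel and reproduce the final generation check
#    ("Compiling the model to CPU ..." comes from this step). Prompt is hypothetical.
ov_model = OVModelForCausalLM.from_pretrained(save_folder)
ov_tokenizer = AutoTokenizer.from_pretrained(save_folder)
pipe = pipeline("text-generation", model=ov_model, tokenizer=ov_tokenizer)
print(pipe("Hello, I am an AI chatbot", max_new_tokens=20))
```

As the log's own warning notes, re-exporting with OpenVINO >= 2023.3.0 via `from_pretrained` on the original model with `export=True` would produce a stateful model and avoid the "Provided model does not contain state" message.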