diff --git "a/nncf_output.log" "b/nncf_output.log"
new file mode 100644
--- /dev/null
+++ "b/nncf_output.log"
@@ -0,0 +1,1987 @@
+INFO:nncf:Ignored adding weight sparsifier for operation: OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/NNCFEmbedding[embed_tokens]/embedding_0
+INFO:nncf:Ignored adding weight sparsifier for operation: OPTForCausalLM/NNCFLinear[lm_head]/linear_0
+INFO:nncf:Not adding activation input quantizer for operation: 3 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/NNCFEmbedding[embed_tokens]/embedding_0
+INFO:nncf:Not adding activation input quantizer for operation: 6 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/OPTLearnedPositionalEmbedding[embed_positions]/long_0
+INFO:nncf:Not adding activation input quantizer for operation: 7 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/OPTLearnedPositionalEmbedding[embed_positions]/cumsum_0
+INFO:nncf:Not adding activation input quantizer for operation: 8 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/OPTLearnedPositionalEmbedding[embed_positions]/type_as_0
+INFO:nncf:Not adding activation input quantizer for operation: 9 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/OPTLearnedPositionalEmbedding[embed_positions]/__mul___0
+INFO:nncf:Not adding activation input quantizer for operation: 10 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/OPTLearnedPositionalEmbedding[embed_positions]/long_1
+INFO:nncf:Not adding activation input quantizer for operation: 11 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/OPTLearnedPositionalEmbedding[embed_positions]/__sub___0
+INFO:nncf:Not adding activation input quantizer for operation: 12 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/OPTLearnedPositionalEmbedding[embed_positions]/__getitem___0
+INFO:nncf:Not adding activation input quantizer for operation: 13 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/OPTLearnedPositionalEmbedding[embed_positions]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 14 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/OPTLearnedPositionalEmbedding[embed_positions]/embedding_0
+INFO:nncf:Not adding activation input quantizer for operation: 16 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 36 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[0]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 47 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[0]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 48 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[0]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 54 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[0]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 56 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[0]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 76 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[1]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 87 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[1]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 88 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[1]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 94 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[1]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 96 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[1]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 116 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[2]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 127 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[2]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 128 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[2]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 134 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[2]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 136 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[2]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 156 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[3]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 167 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[3]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 168 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[3]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 174 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[3]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 176 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[3]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 196 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[4]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 207 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[4]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 208 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[4]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 214 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[4]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 216 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[4]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 236 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[5]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 247 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[5]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 248 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[5]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 254 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[5]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 256 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[5]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 276 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[6]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 287 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[6]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 288 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[6]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 294 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[6]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 296 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[6]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 316 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[7]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 327 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[7]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 328 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[7]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 334 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[7]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 336 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[7]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 356 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[8]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 367 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[8]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 368 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[8]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 374 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[8]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 376 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[8]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 396 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[9]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 407 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[9]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 408 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[9]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 414 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[9]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 416 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[9]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 436 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[10]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 447 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[10]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 448 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[10]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 454 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[10]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 456 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[10]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 476 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[11]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 487 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[11]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 488 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[11]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 494 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[11]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 496 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[11]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 516 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[12]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 527 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[12]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 528 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[12]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 534 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[12]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 536 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[12]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 556 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[13]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 567 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[13]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 568 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[13]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 574 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[13]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 576 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[13]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 596 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[14]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 607 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[14]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 608 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[14]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 614 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[14]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 616 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[14]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 636 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[15]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 647 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[15]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 648 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[15]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 654 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[15]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 656 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[15]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 676 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[16]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 687 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[16]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 688 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[16]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 694 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[16]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 696 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[16]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 716 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[17]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 727 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[17]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 728 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[17]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 734 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[17]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 736 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[17]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 756 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[18]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 767 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[18]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 768 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[18]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 774 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[18]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 776 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[18]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 796 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[19]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 807 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[19]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 808 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[19]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 814 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[19]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 816 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[19]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 836 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[20]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 847 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[20]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 848 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[20]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 854 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[20]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 856 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[20]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 876 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[21]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 887 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[21]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 888 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[21]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 894 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[21]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 896 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[21]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 916 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[22]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 927 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[22]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 928 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[22]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 934 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[22]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 936 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[22]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 956 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[23]/OPTAttention[self_attn]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 967 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[23]/__add___0
+INFO:nncf:Not adding activation input quantizer for operation: 968 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[23]/NNCFLayerNorm[self_attn_layer_norm]/layer_norm_0
+INFO:nncf:Not adding activation input quantizer for operation: 974 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[23]/__add___1
+INFO:nncf:Not adding activation input quantizer for operation: 976 OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/ModuleList[layers]/OPTDecoderLayer[23]/NNCFLayerNorm[final_layer_norm]/layer_norm_0
+INFO:nncf:Collecting tensor statistics |████████████████| 1 / 1
+INFO:nncf:Compiling and loading torch extension: quantized_functions_cpu...
+INFO:nncf:Finished loading torch extension: quantized_functions_cpu
+INFO:nncf:Statistics of the sparsified model:
+Epoch 0 |+-----------------------------------------+-------+
+Epoch 0 || Statistic's name                        | Value |
+Epoch 0 |+=========================================+=======+
+Epoch 0 || Sparsity level of the whole model       | 0.722 |
+Epoch 0 |+-----------------------------------------+-------+
+Epoch 0 || Sparsity level of all sparsified layers | 0.850 |
+Epoch 0 |+-----------------------------------------+-------+
+Epoch 0 |
+Epoch 0 |Statistics by sparsified layers:
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || Layer's name         | Weight's shape | Sparsity level | Weight's percentage |
+Epoch 0 |+======================+================+================+=====================+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024]   | 0.669          | 0.346               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[2]/OPTAttentio |                |                |                     |
+Epoch 0 || n[self_attn]/NNCFLin |                |                |                     |
+Epoch 0 || ear[v_proj]/linear_0 |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024]   | 0.680          | 0.346               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[2]/OPTAttentio |                |                |                     |
+Epoch 0 || n[self_attn]/NNCFLin |                |                |                     |
+Epoch 0 || ear[out_proj]/linear |                |                |                     |
+Epoch 0 || _0                   |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024]   | 0.942          | 1.384               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[2]/NNCFLinear[ |                |                |                     |
+Epoch 0 || fc1]/linear_0        |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096]   | 0.945          | 1.384               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[2]/NNCFLinear[ |                |                |                     |
+Epoch 0 || fc2]/linear_0        |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024]   | 0.658          | 0.346               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[3]/OPTAttentio |                |                |                     |
+Epoch 0 || n[self_attn]/NNCFLin |                |                |                     |
+Epoch 0 || ear[q_proj]/linear_0 |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024]   | 0.658          | 0.346               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[3]/OPTAttentio |                |                |                     |
+Epoch 0 || n[self_attn]/NNCFLin |                |                |                     |
+Epoch 0 || ear[k_proj]/linear_0 |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024]   | 0.664          | 0.346               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[3]/OPTAttentio |                |                |                     |
+Epoch 0 || n[self_attn]/NNCFLin |                |                |                     |
+Epoch 0 || ear[v_proj]/linear_0 |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 512]    | 0.465          | 0.173               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/NNCFLinea |                |                |                     |
+Epoch 0 || r[project_in]/linear |                |                |                     |
+Epoch 0 || _0                   |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024]   | 0.676          | 0.346               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[3]/OPTAttentio |                |                |                     |
+Epoch 0 || n[self_attn]/NNCFLin |                |                |                     |
+Epoch 0 || ear[out_proj]/linear |                |                |                     |
+Epoch 0 || _0                   |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024]   | 0.601          | 0.346               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[0]/OPTAttentio |                |                |                     |
+Epoch 0 || n[self_attn]/NNCFLin |                |                |                     |
+Epoch 0 || ear[q_proj]/linear_0 |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024]   | 0.940          | 1.384               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[3]/NNCFLinear[ |                |                |                     |
+Epoch 0 || fc1]/linear_0        |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096]   | 0.946          | 1.384               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[3]/NNCFLinear[ |                |                |                     |
+Epoch 0 || fc2]/linear_0        |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024]   | 0.659          | 0.346               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[4]/OPTAttentio |                |                |                     |
+Epoch 0 || n[self_attn]/NNCFLin |                |                |                     |
+Epoch 0 || ear[q_proj]/linear_0 |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024]   | 0.660          | 0.346               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[4]/OPTAttentio |                |                |                     |
+Epoch 0 || n[self_attn]/NNCFLin |                |                |                     |
+Epoch 0 || ear[k_proj]/linear_0 |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024]   | 0.670          | 0.346               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[4]/OPTAttentio |                |                |                     |
+Epoch 0 || n[self_attn]/NNCFLin |                |                |                     |
+Epoch 0 || ear[v_proj]/linear_0 |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024]   | 0.631          | 0.346               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[0]/OPTAttentio |                |                |                     |
+Epoch 0 || n[self_attn]/NNCFLin |                |                |                     |
+Epoch 0 || ear[k_proj]/linear_0 |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024]   | 0.680          | 0.346               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[4]/OPTAttentio |                |                |                     |
+Epoch 0 || n[self_attn]/NNCFLin |                |                |                     |
+Epoch 0 || ear[out_proj]/linear |                |                |                     |
+Epoch 0 || _0                   |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024]   | 0.939          | 1.384               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[4]/NNCFLinear[ |                |                |                     |
+Epoch 0 || fc1]/linear_0        |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096]   | 0.946          | 1.384               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[4]/NNCFLinear[ |                |                |                     |
+Epoch 0 || fc2]/linear_0        |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024]   | 0.675          | 0.346               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[5]/OPTAttentio |                |                |                     |
+Epoch 0 || n[self_attn]/NNCFLin |                |                |                     |
+Epoch 0 || ear[q_proj]/linear_0 |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024]   | 0.673          | 0.346               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[5]/OPTAttentio |                |                |                     |
+Epoch 0 || n[self_attn]/NNCFLin |                |                |                     |
+Epoch 0 || ear[k_proj]/linear_0 |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024]   | 0.671          | 0.346               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[5]/OPTAttentio |                |                |                     |
+Epoch 0 || n[self_attn]/NNCFLin |                |                |                     |
+Epoch 0 || ear[v_proj]/linear_0 |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024]   | 0.670          | 0.346               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder |                |                |                     |
+Epoch 0 || Layer[0]/OPTAttentio |                |                |                     |
+Epoch 0 || n[self_attn]/NNCFLin |                |                |                     |
+Epoch 0 || ear[v_proj]/linear_0 |                |                |                     |
+Epoch 0 |+----------------------+----------------+----------------+---------------------+
+Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024]   | 0.683          | 0.346               |
+Epoch 0 || del[model]/OPTDecode |                |                |                     |
+Epoch 0 || r[decoder]/ModuleLis |                |                |                     |
+Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[5]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[out_proj]/linear | | | | +Epoch 0 || _0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.939 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[5]/NNCFLinear[ | | | | +Epoch 0 || fc1]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.944 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[5]/NNCFLinear[ | | | | +Epoch 0 || fc2]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.680 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[6]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[q_proj]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.687 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[6]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[k_proj]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.678 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || 
Layer[6]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[v_proj]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.685 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[6]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[out_proj]/linear | | | | +Epoch 0 || _0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.939 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[6]/NNCFLinear[ | | | | +Epoch 0 || fc1]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.943 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[6]/NNCFLinear[ | | | | +Epoch 0 || fc2]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.676 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[7]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[q_proj]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.681 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[7]/OPTAttentio | | | | +Epoch 0 || 
n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[k_proj]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.678 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[7]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[v_proj]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.685 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[7]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[out_proj]/linear | | | | +Epoch 0 || _0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.942 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[7]/NNCFLinear[ | | | | +Epoch 0 || fc1]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.941 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[7]/NNCFLinear[ | | | | +Epoch 0 || fc2]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.671 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[8]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || 
ear[q_proj]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.678 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[8]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[k_proj]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.679 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[8]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[v_proj]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.683 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[8]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[out_proj]/linear | | | | +Epoch 0 || _0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.938 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[8]/NNCFLinear[ | | | | +Epoch 0 || fc1]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.943 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[8]/NNCFLinear[ | | | | +Epoch 0 || fc2]/linear_0 | | | | +Epoch 0 
|+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.681 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[9]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[q_proj]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.681 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[9]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[k_proj]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.679 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[9]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[v_proj]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.684 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[9]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[out_proj]/linear | | | | +Epoch 0 || _0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.940 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[9]/NNCFLinear[ | | | | +Epoch 0 || fc1]/linear_0 | | | | +Epoch 0 
|+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.944 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[9]/NNCFLinear[ | | | | +Epoch 0 || fc2]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.666 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[10]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[q_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.667 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[10]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[k_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.671 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[10]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[v_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.678 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[10]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || 
near[out_proj]/linea | | | | +Epoch 0 || r_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.719 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[0]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[out_proj]/linear | | | | +Epoch 0 || _0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.940 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[10]/NNCFLinear | | | | +Epoch 0 || [fc1]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.944 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[10]/NNCFLinear | | | | +Epoch 0 || [fc2]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.660 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[11]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[q_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.664 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[11]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || 
near[k_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.665 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[11]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[v_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.672 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[11]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[out_proj]/linea | | | | +Epoch 0 || r_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.940 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[11]/NNCFLinear | | | | +Epoch 0 || [fc1]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.944 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[11]/NNCFLinear | | | | +Epoch 0 || [fc2]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.664 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[12]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || 
near[q_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.665 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[12]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[k_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.943 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[0]/NNCFLinear[ | | | | +Epoch 0 || fc1]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.660 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[12]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[v_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.943 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[0]/NNCFLinear[ | | | | +Epoch 0 || fc2]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.668 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[12]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || 
near[out_proj]/linea | | | | +Epoch 0 || r_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.940 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[12]/NNCFLinear | | | | +Epoch 0 || [fc1]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.945 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[12]/NNCFLinear | | | | +Epoch 0 || [fc2]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.662 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[13]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[q_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.664 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[13]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[k_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.662 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[13]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || 
near[v_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.670 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[13]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[out_proj]/linea | | | | +Epoch 0 || r_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.663 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[1]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[q_proj]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.940 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[13]/NNCFLinear | | | | +Epoch 0 || [fc1]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.945 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[13]/NNCFLinear | | | | +Epoch 0 || [fc2]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.659 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[14]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[q_proj]/linear_ | | | | 
+Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.660 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[14]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[k_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.656 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[14]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[v_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.669 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[1]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[k_proj]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.666 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[14]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[out_proj]/linea | | | | +Epoch 0 || r_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.940 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || 
Layer[14]/NNCFLinear | | | | +Epoch 0 || [fc1]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.946 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[14]/NNCFLinear | | | | +Epoch 0 || [fc2]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.660 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[15]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[q_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.661 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[15]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[k_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.659 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[15]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[v_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.675 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || 
Layer[1]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[v_proj]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.675 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[15]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[out_proj]/linea | | | | +Epoch 0 || r_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.940 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[15]/NNCFLinear | | | | +Epoch 0 || [fc1]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.946 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[15]/NNCFLinear | | | | +Epoch 0 || [fc2]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.660 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[16]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[q_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.660 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[16]/OPTAttenti | | | | 
+Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[k_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.655 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[16]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[v_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.671 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[16]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[out_proj]/linea | | | | +Epoch 0 || r_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.940 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[16]/NNCFLinear | | | | +Epoch 0 || [fc1]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.946 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[16]/NNCFLinear | | | | +Epoch 0 || [fc2]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.661 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[17]/OPTAttenti | | | | +Epoch 0 
|| on[self_attn]/NNCFLi | | | | +Epoch 0 || near[q_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.661 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[17]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[k_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.658 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[17]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[v_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.673 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[17]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[out_proj]/linea | | | | +Epoch 0 || r_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.940 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[17]/NNCFLinear | | | | +Epoch 0 || [fc1]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.946 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder 
| | | | +Epoch 0 || Layer[17]/NNCFLinear | | | | +Epoch 0 || [fc2]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.664 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[18]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[q_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.664 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[18]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[k_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.652 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[18]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[v_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.667 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[18]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[out_proj]/linea | | | | +Epoch 0 || r_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.941 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || 
r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[18]/NNCFLinear | | | | +Epoch 0 || [fc1]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.946 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[18]/NNCFLinear | | | | +Epoch 0 || [fc2]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.669 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[19]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[q_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.668 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[19]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[k_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.666 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[19]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[v_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.678 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || 
r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[19]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[out_proj]/linea | | | | +Epoch 0 || r_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.941 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[19]/NNCFLinear | | | | +Epoch 0 || [fc1]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.945 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[19]/NNCFLinear | | | | +Epoch 0 || [fc2]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.677 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[20]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[q_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.679 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[20]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[k_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.671 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || 
r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[20]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[v_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.683 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[20]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[out_proj]/linea | | | | +Epoch 0 || r_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.683 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[1]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[out_proj]/linear | | | | +Epoch 0 || _0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.941 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[20]/NNCFLinear | | | | +Epoch 0 || [fc1]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.944 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[20]/NNCFLinear | | | | +Epoch 0 || [fc2]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.677 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || 
r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[21]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[q_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.677 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[21]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[k_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.673 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[21]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[v_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.681 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[21]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[out_proj]/linea | | | | +Epoch 0 || r_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.940 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[21]/NNCFLinear | | | | +Epoch 0 || [fc1]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.944 | 
1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[21]/NNCFLinear | | | | +Epoch 0 || [fc2]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.670 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[22]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[q_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.689 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[22]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[k_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.943 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[1]/NNCFLinear[ | | | | +Epoch 0 || fc1]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.670 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[22]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[v_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.945 | 1.384 | 
+Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[1]/NNCFLinear[ | | | | +Epoch 0 || fc2]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.687 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[22]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[out_proj]/linea | | | | +Epoch 0 || r_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.943 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[22]/NNCFLinear | | | | +Epoch 0 || [fc1]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.945 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[22]/NNCFLinear | | | | +Epoch 0 || [fc2]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.668 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[23]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[q_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.672 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | 
| +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[23]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[k_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.664 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[23]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[v_proj]/linear_ | | | | +Epoch 0 || 0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.684 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[23]/OPTAttenti | | | | +Epoch 0 || on[self_attn]/NNCFLi | | | | +Epoch 0 || near[out_proj]/linea | | | | +Epoch 0 || r_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.655 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[2]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[q_proj]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [4096, 1024] | 0.944 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[23]/NNCFLinear | | | | +Epoch 0 || [fc1]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 4096] | 0.945 | 1.384 | +Epoch 0 || del[model]/OPTDecode | | | | 
+Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[23]/NNCFLinear | | | | +Epoch 0 || [fc2]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [512, 1024] | 0.316 | 0.173 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/NNCFLinea | | | | +Epoch 0 || r[project_out]/linea | | | | +Epoch 0 || r_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 || OPTForCausalLM/OPTMo | [1024, 1024] | 0.658 | 0.346 | +Epoch 0 || del[model]/OPTDecode | | | | +Epoch 0 || r[decoder]/ModuleLis | | | | +Epoch 0 || t[layers]/OPTDecoder | | | | +Epoch 0 || Layer[2]/OPTAttentio | | | | +Epoch 0 || n[self_attn]/NNCFLin | | | | +Epoch 0 || ear[k_proj]/linear_0 | | | | +Epoch 0 |+----------------------+----------------+----------------+---------------------+ +Epoch 0 | +Epoch 0 |Statistics of the magnitude sparsity algorithm: +Epoch 0 |+----------------------------------------------------------------------+-------+ +Epoch 0 || Statistic's name | Value | +Epoch 0 |+======================================================================+=======+ +Epoch 0 || A target level of the sparsity for the algorithm for the current | None | +Epoch 0 || epoch | | +Epoch 0 |+----------------------------------------------------------------------+-------+ +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || Layer's name | Sparsity threshold | +Epoch 0 |+=========================================================+====================+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[2]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || 
OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[2]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[2]/NNCFLinear[fc1]/linea | | +Epoch 0 || r_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[2]/NNCFLinear[fc2]/linea | | +Epoch 0 || r_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[3]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[3]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[3]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/NNCF | 0.001 | +Epoch 0 || Linear[project_in]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || 
leList[layers]/OPTDecoderLayer[3]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[0]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[3]/NNCFLinear[fc1]/linea | | +Epoch 0 || r_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[3]/NNCFLinear[fc2]/linea | | +Epoch 0 || r_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[4]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[4]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[4]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || 
leList[layers]/OPTDecoderLayer[0]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[4]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[4]/NNCFLinear[fc1]/linea | | +Epoch 0 || r_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[4]/NNCFLinear[fc2]/linea | | +Epoch 0 || r_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[5]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[5]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[5]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || 
leList[layers]/OPTDecoderLayer[0]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[5]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[5]/NNCFLinear[fc1]/linea | | +Epoch 0 || r_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[5]/NNCFLinear[fc2]/linea | | +Epoch 0 || r_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[6]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[6]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[6]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || 
leList[layers]/OPTDecoderLayer[6]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[6]/NNCFLinear[fc1]/linea | | +Epoch 0 || r_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[6]/NNCFLinear[fc2]/linea | | +Epoch 0 || r_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[7]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[7]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[7]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[7]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || 
leList[layers]/OPTDecoderLayer[7]/NNCFLinear[fc1]/linea | | +Epoch 0 || r_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[7]/NNCFLinear[fc2]/linea | | +Epoch 0 || r_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[8]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[8]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[8]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[8]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[8]/NNCFLinear[fc1]/linea | | +Epoch 0 || r_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[8]/NNCFLinear[fc2]/linea | | 
+Epoch 0 || r_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[9]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[9]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[9]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[9]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[9]/NNCFLinear[fc1]/linea | | +Epoch 0 || r_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[9]/NNCFLinear[fc2]/linea | | +Epoch 0 || r_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[10]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 
|+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[10]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[10]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[10]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[0]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[10]/NNCFLinear[fc1]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[10]/NNCFLinear[fc2]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[11]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 
|+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[11]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[11]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[11]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[11]/NNCFLinear[fc1]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[11]/NNCFLinear[fc2]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[12]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[12]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 
|+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[0]/NNCFLinear[fc1]/linea | | +Epoch 0 || r_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[12]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[0]/NNCFLinear[fc2]/linea | | +Epoch 0 || r_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[12]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[12]/NNCFLinear[fc1]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[12]/NNCFLinear[fc2]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[13]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ 
+Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[13]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[13]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[13]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[1]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[13]/NNCFLinear[fc1]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[13]/NNCFLinear[fc2]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[14]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || 
OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[14]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[14]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[1]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[14]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[14]/NNCFLinear[fc1]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[14]/NNCFLinear[fc2]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[15]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || 
OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[15]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[15]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[1]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[15]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[15]/NNCFLinear[fc1]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[15]/NNCFLinear[fc2]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[16]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || 
OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[16]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[16]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[16]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[16]/NNCFLinear[fc1]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[16]/NNCFLinear[fc2]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[17]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[17]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || 
OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[17]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[17]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[17]/NNCFLinear[fc1]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[17]/NNCFLinear[fc2]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[18]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[18]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[18]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || 
OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[18]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[18]/NNCFLinear[fc1]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[18]/NNCFLinear[fc2]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[19]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[19]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[19]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[19]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || 
OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[19]/NNCFLinear[fc1]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[19]/NNCFLinear[fc2]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[20]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[20]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[20]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[20]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[1]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || 
OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[20]/NNCFLinear[fc1]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[20]/NNCFLinear[fc2]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[21]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[21]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[21]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[21]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[21]/NNCFLinear[fc1]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu 
| 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[21]/NNCFLinear[fc2]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[22]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[22]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[1]/NNCFLinear[fc1]/linea | | +Epoch 0 || r_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[22]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[1]/NNCFLinear[fc2]/linea | | +Epoch 0 || r_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[22]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || 
leList[layers]/OPTDecoderLayer[22]/NNCFLinear[fc1]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[22]/NNCFLinear[fc2]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[23]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[23]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[23]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[v_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[23]/OPTAttention[self_at | | +Epoch 0 || tn]/NNCFLinear[out_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[2]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[q_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || 
leList[layers]/OPTDecoderLayer[23]/NNCFLinear[fc1]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[23]/NNCFLinear[fc2]/line | | +Epoch 0 || ar_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/NNCF | 0.001 | +Epoch 0 || Linear[project_out]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 || OPTForCausalLM/OPTModel[model]/OPTDecoder[decoder]/Modu | 0.001 | +Epoch 0 || leList[layers]/OPTDecoderLayer[2]/OPTAttention[self_att | | +Epoch 0 || n]/NNCFLinear[k_proj]/linear_0 | | +Epoch 0 |+---------------------------------------------------------+--------------------+ +Epoch 0 | +Epoch 0 |Statistics of the quantization algorithm: +Epoch 0 |+--------------------------------+-------+ +Epoch 0 || Statistic's name | Value | +Epoch 0 |+================================+=======+ +Epoch 0 || Ratio of enabled quantizations | 100 | +Epoch 0 |+--------------------------------+-------+ +Epoch 0 | +Epoch 0 |Statistics of the quantization share: +Epoch 0 |+----------------------------------+----------------------+ +Epoch 0 || Statistic's name | Value | +Epoch 0 |+==================================+======================+ +Epoch 0 || Symmetric WQs / All placed WQs | 100.00 % (147 / 147) | +Epoch 0 |+----------------------------------+----------------------+ +Epoch 0 || Asymmetric WQs / All placed WQs | 0.00 % (0 / 147) | +Epoch 0 |+----------------------------------+----------------------+ +Epoch 0 || Signed WQs / All placed WQs | 100.00 % (147 / 147) | +Epoch 0 |+----------------------------------+----------------------+ +Epoch 0 || Unsigned WQs / All placed WQs | 0.00 % (0 / 147) | +Epoch 0 
|+----------------------------------+----------------------+ +Epoch 0 || Per-tensor WQs / All placed WQs | 0.00 % (0 / 147) | +Epoch 0 |+----------------------------------+----------------------+ +Epoch 0 || Per-channel WQs / All placed WQs | 100.00 % (147 / 147) | +Epoch 0 |+----------------------------------+----------------------+ +Epoch 0 || Placed WQs / Potential WQs | 75.00 % (147 / 196) | +Epoch 0 |+----------------------------------+----------------------+ +Epoch 0 || Symmetric AQs / All placed AQs | 100.00 % (243 / 243) | +Epoch 0 |+----------------------------------+----------------------+ +Epoch 0 || Asymmetric AQs / All placed AQs | 0.00 % (0 / 243) | +Epoch 0 |+----------------------------------+----------------------+ +Epoch 0 || Signed AQs / All placed AQs | 80.25 % (195 / 243) | +Epoch 0 |+----------------------------------+----------------------+ +Epoch 0 || Unsigned AQs / All placed AQs | 19.75 % (48 / 243) | +Epoch 0 |+----------------------------------+----------------------+ +Epoch 0 || Per-tensor AQs / All placed AQs | 100.00 % (243 / 243) | +Epoch 0 |+----------------------------------+----------------------+ +Epoch 0 || Per-channel AQs / All placed AQs | 0.00 % (0 / 243) | +Epoch 0 |+----------------------------------+----------------------+ +Epoch 0 | +Epoch 0 |Statistics of the bitwidth distribution: +Epoch 0 |+--------------+---------------------+--------------------+--------------------+ +Epoch 0 || Num bits (N) | N-bits WQs / Placed | N-bits AQs / | N-bits Qs / Placed | +Epoch 0 || | WQs | Placed AQs | Qs | +Epoch 0 |+==============+=====================+====================+====================+ +Epoch 0 || 8 | 100.00 % (147 / | 100.00 % (243 / | 100.00 % (390 / | +Epoch 0 || | 147) | 243) | 390) | +Epoch 0 |+--------------+---------------------+--------------------+--------------------+ +WARNING:nncf:You are setting `forward` on an NNCF-processed model object. 
+NNCF relies on custom-wrapping the `forward` call in order to function properly. +Arbitrary adjustments to the forward function on an NNCFNetwork object have undefined behaviour. +If you need to replace the underlying forward function of the original model so that NNCF should be using that instead of the original forward function that NNCF saved during the compressed model creation, you can do this by calling: +model.nncf.set_original_unbound_forward(fn) +if `fn` has an unbound 0-th `self` argument, or +with model.nncf.temporary_bound_original_forward(fn): ... +if `fn` already had 0-th `self` argument bound or never had it in the first place. +WARNING:nncf:You are setting `forward` on an NNCF-processed model object. +NNCF relies on custom-wrapping the `forward` call in order to function properly. +Arbitrary adjustments to the forward function on an NNCFNetwork object have undefined behaviour. +If you need to replace the underlying forward function of the original model so that NNCF should be using that instead of the original forward function that NNCF saved during the compressed model creation, you can do this by calling: +model.nncf.set_original_unbound_forward(fn) +if `fn` has an unbound 0-th `self` argument, or +with model.nncf.temporary_bound_original_forward(fn): ... +if `fn` already had 0-th `self` argument bound or never had it in the first place.