manticore-13b-4bit-128g / manticore-13b-4bit-128g.log
Initial model conversion
34d2958 verified
Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards:  33%|███▎      | 1/3 [00:58<01:56, 58.23s/it]
Loading checkpoint shards:  67%|██████▋   | 2/3 [02:10<01:06, 66.21s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [02:47<00:00, 53.23s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [02:47<00:00, 55.94s/it]
Found cached dataset json (~/.cache/huggingface/datasets/allenai___json/allenai--c4-6fbe877195f42de5/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Found cached dataset json (~/.cache/huggingface/datasets/allenai___json/allenai--c4-efc3d4f4606f44bd/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Token indices sequence length is longer than the specified maximum sequence length for this model (3908 > 2048). Running this sequence through the model will result in indexing errors
Starting ...
Ready.
Quantizing layer 1/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 699.085 | - | - | 2.969 |
| self_attn.v_proj | 20.162 | - | - | 1.529 |
| self_attn.q_proj | 663.384 | - | - | 1.536 |
| self_attn.o_proj | 2.897 | - | - | 2.057 |
| mlp.up_proj | 126.477 | - | - | 2.155 |
| mlp.gate_proj | 136.255 | - | - | 1.492 |
| mlp.down_proj | 7.972 | - | - | 5.673 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 2/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 833.716 | - | - | 2.145 |
| self_attn.v_proj | 84.632 | - | - | 1.437 |
| self_attn.q_proj | 792.299 | - | - | 1.425 |
| self_attn.o_proj | 17.699 | - | - | 2.037 |
| mlp.up_proj | 866.981 | - | - | 2.223 |
| mlp.gate_proj | 984.281 | - | - | 1.536 |
| mlp.down_proj | 69.406 | - | - | 6.099 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 3/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 2007.076 | - | - | 2.197 |
| self_attn.v_proj | 316.141 | - | - | 1.417 |
| self_attn.q_proj | 1877.389 | - | - | 1.552 |
| self_attn.o_proj | 48.965 | - | - | 2.256 |
| mlp.up_proj | 2437.944 | - | - | 2.213 |
| mlp.gate_proj | 2980.109 | - | - | 1.556 |
| mlp.down_proj | 323.876 | - | - | 6.116 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 4/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 7247.779 | - | - | 2.183 |
| self_attn.v_proj | 2281.976 | - | - | 1.459 |
| self_attn.q_proj | 7044.623 | - | - | 1.492 |
| self_attn.o_proj | 95.820 | - | - | 2.235 |
| mlp.up_proj | 4585.885 | - | - | 2.378 |
| mlp.gate_proj | 5477.426 | - | - | 1.624 |
| mlp.down_proj | 336.858 | - | - | 6.299 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 5/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 9244.811 | - | - | 2.210 |
| self_attn.v_proj | 3243.931 | - | - | 1.544 |
| self_attn.q_proj | 9085.997 | - | - | 1.505 |
| self_attn.o_proj | 135.217 | - | - | 2.111 |
| mlp.up_proj | 6404.132 | - | - | 2.189 |
| mlp.gate_proj | 7805.458 | - | - | 1.572 |
| mlp.down_proj | 558.483 | - | - | 5.889 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 6/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 12590.303 | - | - | 2.208 |
| self_attn.v_proj | 4789.499 | - | - | 1.485 |
| self_attn.q_proj | 12458.196 | - | - | 1.677 |
| self_attn.o_proj | 185.082 | - | - | 2.269 |
| mlp.up_proj | 8282.830 | - | - | 2.188 |
| mlp.gate_proj | 9948.463 | - | - | 1.533 |
| mlp.down_proj | 799.254 | - | - | 5.824 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 7/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 12815.720 | - | - | 2.281 |
| self_attn.v_proj | 4819.657 | - | - | 1.550 |
| self_attn.q_proj | 12782.796 | - | - | 1.601 |
| self_attn.o_proj | 443.461 | - | - | 2.010 |
| mlp.up_proj | 9536.821 | - | - | 2.267 |
| mlp.gate_proj | 11615.631 | - | - | 1.582 |
| mlp.down_proj | 1365.899 | - | - | 5.818 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 8/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 14306.873 | - | - | 2.319 |
| self_attn.v_proj | 6024.040 | - | - | 1.529 |
| self_attn.q_proj | 14025.603 | - | - | 1.475 |
| self_attn.o_proj | 533.999 | - | - | 2.126 |
| mlp.up_proj | 11076.211 | - | - | 2.303 |
| mlp.gate_proj | 12810.490 | - | - | 1.554 |
| mlp.down_proj | 1382.088 | - | - | 5.916 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 9/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 18277.871 | - | - | 2.216 |
| self_attn.v_proj | 8258.400 | - | - | 1.561 |
| self_attn.q_proj | 18689.828 | - | - | 1.627 |
| self_attn.o_proj | 638.868 | - | - | 2.238 |
| mlp.up_proj | 12678.387 | - | - | 2.339 |
| mlp.gate_proj | 14003.211 | - | - | 1.568 |
| mlp.down_proj | 1604.783 | - | - | 5.831 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 10/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 17910.699 | - | - | 2.110 |
| self_attn.v_proj | 7896.578 | - | - | 1.476 |
| self_attn.q_proj | 17414.641 | - | - | 1.455 |
| self_attn.o_proj | 821.772 | - | - | 2.139 |
| mlp.up_proj | 14038.678 | - | - | 2.091 |
| mlp.gate_proj | 15241.850 | - | - | 1.474 |
| mlp.down_proj | 1920.510 | - | - | 6.219 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 11/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 18869.732 | - | - | 2.180 |
| self_attn.v_proj | 9237.131 | - | - | 1.484 |
| self_attn.q_proj | 18370.957 | - | - | 1.477 |
| self_attn.o_proj | 969.532 | - | - | 2.147 |
| mlp.up_proj | 15312.948 | - | - | 2.398 |
| mlp.gate_proj | 16015.623 | - | - | 1.711 |
| mlp.down_proj | 2238.236 | - | - | 5.736 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 12/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 20102.830 | - | - | 2.140 |
| self_attn.v_proj | 9838.231 | - | - | 1.473 |
| self_attn.q_proj | 19395.590 | - | - | 1.452 |
| self_attn.o_proj | 1331.936 | - | - | 2.024 |
| mlp.up_proj | 16841.980 | - | - | 2.165 |
| mlp.gate_proj | 17118.094 | - | - | 1.480 |
| mlp.down_proj | 2681.943 | - | - | 5.950 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 13/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 20447.057 | - | - | 2.293 |
| self_attn.v_proj | 10192.447 | - | - | 1.541 |
| self_attn.q_proj | 19843.822 | - | - | 1.540 |
| self_attn.o_proj | 1744.482 | - | - | 2.103 |
| mlp.up_proj | 17869.541 | - | - | 2.092 |
| mlp.gate_proj | 17824.297 | - | - | 1.521 |
| mlp.down_proj | 3193.839 | - | - | 6.185 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 14/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 22471.352 | - | - | 2.283 |
| self_attn.v_proj | 13197.844 | - | - | 1.526 |
| self_attn.q_proj | 21965.322 | - | - | 1.511 |
| self_attn.o_proj | 1889.137 | - | - | 2.187 |
| mlp.up_proj | 20171.234 | - | - | 2.470 |
| mlp.gate_proj | 20078.652 | - | - | 1.753 |
| mlp.down_proj | 3986.132 | - | - | 5.910 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 15/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 21548.637 | - | - | 2.269 |
| self_attn.v_proj | 13209.446 | - | - | 1.533 |
| self_attn.q_proj | 20824.828 | - | - | 1.576 |
| self_attn.o_proj | 1764.429 | - | - | 2.162 |
| mlp.up_proj | 22525.393 | - | - | 2.246 |
| mlp.gate_proj | 22575.895 | - | - | 1.552 |
| mlp.down_proj | 4570.991 | - | - | 5.823 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 16/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 24878.395 | - | - | 2.239 |
| self_attn.v_proj | 15705.982 | - | - | 1.527 |
| self_attn.q_proj | 24505.125 | - | - | 1.517 |
| self_attn.o_proj | 1836.744 | - | - | 2.169 |
| mlp.up_proj | 24725.775 | - | - | 2.263 |
| mlp.gate_proj | 24939.625 | - | - | 1.629 |
| mlp.down_proj | 5293.017 | - | - | 6.001 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 17/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 24504.688 | - | - | 2.240 |
| self_attn.v_proj | 16056.951 | - | - | 1.547 |
| self_attn.q_proj | 23832.211 | - | - | 1.547 |
| self_attn.o_proj | 1557.532 | - | - | 2.148 |
| mlp.up_proj | 27068.410 | - | - | 2.245 |
| mlp.gate_proj | 27383.363 | - | - | 1.551 |
| mlp.down_proj | 5846.998 | - | - | 5.885 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 18/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 23499.668 | - | - | 2.245 |
| self_attn.v_proj | 14901.561 | - | - | 1.506 |
| self_attn.q_proj | 22453.777 | - | - | 1.531 |
| self_attn.o_proj | 1982.065 | - | - | 2.138 |
| mlp.up_proj | 28787.129 | - | - | 2.233 |
| mlp.gate_proj | 29317.820 | - | - | 1.525 |
| mlp.down_proj | 6634.873 | - | - | 5.954 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 19/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 25263.570 | - | - | 2.261 |
| self_attn.v_proj | 17757.848 | - | - | 1.533 |
| self_attn.q_proj | 24840.973 | - | - | 1.529 |
| self_attn.o_proj | 2037.134 | - | - | 2.174 |
| mlp.up_proj | 30709.844 | - | - | 2.243 |
| mlp.gate_proj | 31749.461 | - | - | 1.566 |
| mlp.down_proj | 7825.893 | - | - | 5.977 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 20/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 25034.102 | - | - | 2.202 |
| self_attn.v_proj | 18874.211 | - | - | 1.487 |
| self_attn.q_proj | 24577.367 | - | - | 1.486 |
| self_attn.o_proj | 2441.195 | - | - | 2.198 |
| mlp.up_proj | 33281.070 | - | - | 2.217 |
| mlp.gate_proj | 34847.070 | - | - | 1.547 |
| mlp.down_proj | 9326.249 | - | - | 5.867 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 21/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 24243.559 | - | - | 2.254 |
| self_attn.v_proj | 18996.426 | - | - | 1.544 |
| self_attn.q_proj | 24238.959 | - | - | 1.530 |
| self_attn.o_proj | 2421.684 | - | - | 2.162 |
| mlp.up_proj | 35619.750 | - | - | 2.249 |
| mlp.gate_proj | 37851.703 | - | - | 1.558 |
| mlp.down_proj | 10490.234 | - | - | 5.957 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 22/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 24204.869 | - | - | 2.233 |
| self_attn.v_proj | 20761.924 | - | - | 1.548 |
| self_attn.q_proj | 24269.518 | - | - | 1.488 |
| self_attn.o_proj | 2956.973 | - | - | 2.183 |
| mlp.up_proj | 37259.047 | - | - | 2.210 |
| mlp.gate_proj | 39854.340 | - | - | 1.527 |
| mlp.down_proj | 11912.477 | - | - | 5.834 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 23/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 23936.551 | - | - | 2.221 |
| self_attn.v_proj | 20511.947 | - | - | 1.475 |
| self_attn.q_proj | 23747.645 | - | - | 1.482 |
| self_attn.o_proj | 2550.344 | - | - | 2.146 |
| mlp.up_proj | 38672.172 | - | - | 2.227 |
| mlp.gate_proj | 42040.203 | - | - | 1.510 |
| mlp.down_proj | 13073.925 | - | - | 5.963 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 24/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 26160.334 | - | - | 2.235 |
| self_attn.v_proj | 24639.689 | - | - | 1.486 |
| self_attn.q_proj | 26502.482 | - | - | 1.483 |
| self_attn.o_proj | 2817.627 | - | - | 2.133 |
| mlp.up_proj | 40697.367 | - | - | 2.237 |
| mlp.gate_proj | 44920.602 | - | - | 1.545 |
| mlp.down_proj | 14389.604 | - | - | 5.868 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 25/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 25879.900 | - | - | 2.257 |
| self_attn.v_proj | 24740.775 | - | - | 1.511 |
| self_attn.q_proj | 26419.555 | - | - | 1.506 |
| self_attn.o_proj | 2836.316 | - | - | 2.129 |
| mlp.up_proj | 42787.164 | - | - | 2.210 |
| mlp.gate_proj | 47598.883 | - | - | 1.560 |
| mlp.down_proj | 15096.459 | - | - | 5.970 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 26/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 25009.330 | - | - | 2.197 |
| self_attn.v_proj | 25032.744 | - | - | 1.477 |
| self_attn.q_proj | 25241.531 | - | - | 1.479 |
| self_attn.o_proj | 2970.640 | - | - | 2.127 |
| mlp.up_proj | 44925.082 | - | - | 2.202 |
| mlp.gate_proj | 50018.508 | - | - | 1.564 |
| mlp.down_proj | 16100.578 | - | - | 5.851 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 27/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 28136.842 | - | - | 2.203 |
| self_attn.v_proj | 28984.514 | - | - | 1.529 |
| self_attn.q_proj | 28480.695 | - | - | 1.521 |
| self_attn.o_proj | 2583.247 | - | - | 2.163 |
| mlp.up_proj | 47577.098 | - | - | 2.224 |
| mlp.gate_proj | 52985.324 | - | - | 1.524 |
| mlp.down_proj | 16923.477 | - | - | 5.920 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 28/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 27372.766 | - | - | 2.218 |
| self_attn.v_proj | 29232.785 | - | - | 1.498 |
| self_attn.q_proj | 27755.496 | - | - | 1.516 |
| self_attn.o_proj | 2929.728 | - | - | 2.153 |
| mlp.up_proj | 49762.422 | - | - | 2.210 |
| mlp.gate_proj | 55543.320 | - | - | 1.547 |
| mlp.down_proj | 17614.914 | - | - | 5.478 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 29/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 28571.195 | - | - | 2.112 |
| self_attn.v_proj | 28252.555 | - | - | 1.441 |
| self_attn.q_proj | 29134.234 | - | - | 1.584 |
| self_attn.o_proj | 2582.378 | - | - | 2.154 |
| mlp.up_proj | 51553.789 | - | - | 2.268 |
| mlp.gate_proj | 57860.793 | - | - | 1.526 |
| mlp.down_proj | 18117.818 | - | - | 5.944 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 30/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 28818.846 | - | - | 2.235 |
| self_attn.v_proj | 29999.676 | - | - | 1.507 |
| self_attn.q_proj | 29018.770 | - | - | 1.513 |
| self_attn.o_proj | 2406.745 | - | - | 2.136 |
| mlp.up_proj | 54442.094 | - | - | 2.222 |
| mlp.gate_proj | 60799.031 | - | - | 1.561 |
| mlp.down_proj | 18606.734 | - | - | 5.950 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 31/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 31552.672 | - | - | 2.238 |
| self_attn.v_proj | 32085.520 | - | - | 1.517 |
| self_attn.q_proj | 31604.691 | - | - | 1.512 |
| self_attn.o_proj | 2696.787 | - | - | 2.193 |
| mlp.up_proj | 57001.633 | - | - | 2.223 |
| mlp.gate_proj | 63375.883 | - | - | 1.523 |
| mlp.down_proj | 19194.986 | - | - | 5.921 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 32/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 30807.047 | - | - | 2.214 |
| self_attn.v_proj | 33207.539 | - | - | 1.545 |
| self_attn.q_proj | 30847.758 | - | - | 1.510 |
| self_attn.o_proj | 2343.647 | - | - | 2.143 |
| mlp.up_proj | 59283.980 | - | - | 2.254 |
| mlp.gate_proj | 65641.500 | - | - | 1.526 |
| mlp.down_proj | 20043.617 | - | - | 5.885 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 33/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 32050.086 | - | - | 2.220 |
| self_attn.v_proj | 35367.578 | - | - | 1.524 |
| self_attn.q_proj | 32181.414 | - | - | 1.484 |
| self_attn.o_proj | 1997.855 | - | - | 2.133 |
| mlp.up_proj | 62253.375 | - | - | 2.259 |
| mlp.gate_proj | 67897.727 | - | - | 1.522 |
| mlp.down_proj | 20735.859 | - | - | 5.862 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 34/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 34280.840 | - | - | 2.217 |
| self_attn.v_proj | 37861.223 | - | - | 1.493 |
| self_attn.q_proj | 34754.984 | - | - | 1.570 |
| self_attn.o_proj | 2229.951 | - | - | 2.141 |
| mlp.up_proj | 64770.355 | - | - | 2.228 |
| mlp.gate_proj | 69814.594 | - | - | 1.562 |
| mlp.down_proj | 22216.566 | - | - | 5.922 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 35/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 30260.629 | - | - | 2.191 |
| self_attn.v_proj | 33725.844 | - | - | 1.476 |
| self_attn.q_proj | 30395.316 | - | - | 1.472 |
| self_attn.o_proj | 3188.567 | - | - | 2.149 |
| mlp.up_proj | 66777.297 | - | - | 2.297 |
| mlp.gate_proj | 70337.727 | - | - | 1.547 |
| mlp.down_proj | 24092.455 | - | - | 5.939 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 36/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 27951.232 | - | - | 2.108 |
| self_attn.v_proj | 29941.828 | - | - | 1.426 |
| self_attn.q_proj | 28183.521 | - | - | 1.422 |
| self_attn.o_proj | 3038.925 | - | - | 2.006 |
| mlp.up_proj | 68732.734 | - | - | 2.153 |
| mlp.gate_proj | 71100.156 | - | - | 1.659 |
| mlp.down_proj | 26909.160 | - | - | 5.484 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 37/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 27422.418 | - | - | 2.247 |
| self_attn.v_proj | 32204.926 | - | - | 1.542 |
| self_attn.q_proj | 27318.453 | - | - | 1.546 |
| self_attn.o_proj | 2744.674 | - | - | 2.196 |
| mlp.up_proj | 70397.406 | - | - | 2.310 |
| mlp.gate_proj | 71945.289 | - | - | 1.640 |
| mlp.down_proj | 30182.273 | - | - | 6.029 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 38/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 25726.672 | - | - | 2.424 |
| self_attn.v_proj | 29666.820 | - | - | 1.542 |
| self_attn.q_proj | 25691.258 | - | - | 1.530 |
| self_attn.o_proj | 4238.191 | - | - | 2.473 |
| mlp.up_proj | 69719.492 | - | - | 2.576 |
| mlp.gate_proj | 71583.094 | - | - | 1.790 |
| mlp.down_proj | 35800.488 | - | - | 6.818 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 39/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 23775.566 | - | - | 2.246 |
| self_attn.v_proj | 28923.490 | - | - | 1.677 |
| self_attn.q_proj | 24229.426 | - | - | 1.687 |
| self_attn.o_proj | 8112.882 | - | - | 2.424 |
| mlp.up_proj | 63641.328 | - | - | 2.540 |
| mlp.gate_proj | 66049.242 | - | - | 1.721 |
| mlp.down_proj | 47443.551 | - | - | 6.712 |
+------------------+--------------+------------+-----------+-------+
Quantizing layer 40/40..
+------------------+--------------+------------+-----------+-------+
| name | weight_error | fp_inp_SNR | q_inp_SNR | time |
+==================+==============+============+===========+=======+
| self_attn.k_proj | 15865.144 | - | - | 2.534 |
| self_attn.v_proj | 16730.523 | - | - | 1.709 |
| self_attn.q_proj | 15812.807 | - | - | 1.727 |
| self_attn.o_proj | 3682.083 | - | - | 2.429 |
| mlp.up_proj | 50250.707 | - | - | 2.543 |
| mlp.gate_proj | 52429.242 | - | - | 1.738 |
| mlp.down_proj | 58651.867 | - | - | 6.809 |
+------------------+--------------+------------+-----------+-------+
1905.7269313335419
Packing ...
model.layers.0.self_attn.k_proj
model.layers.0.self_attn.o_proj
model.layers.0.self_attn.q_proj
model.layers.0.self_attn.v_proj
model.layers.0.mlp.down_proj
model.layers.0.mlp.gate_proj
model.layers.0.mlp.up_proj
model.layers.1.self_attn.k_proj
model.layers.1.self_attn.o_proj
model.layers.1.self_attn.q_proj
model.layers.1.self_attn.v_proj
model.layers.1.mlp.down_proj
model.layers.1.mlp.gate_proj
model.layers.1.mlp.up_proj
model.layers.2.self_attn.k_proj
model.layers.2.self_attn.o_proj
model.layers.2.self_attn.q_proj
model.layers.2.self_attn.v_proj
model.layers.2.mlp.down_proj
model.layers.2.mlp.gate_proj
model.layers.2.mlp.up_proj
model.layers.3.self_attn.k_proj
model.layers.3.self_attn.o_proj
model.layers.3.self_attn.q_proj
model.layers.3.self_attn.v_proj
model.layers.3.mlp.down_proj
model.layers.3.mlp.gate_proj
model.layers.3.mlp.up_proj
model.layers.4.self_attn.k_proj
model.layers.4.self_attn.o_proj
model.layers.4.self_attn.q_proj
model.layers.4.self_attn.v_proj
model.layers.4.mlp.down_proj
model.layers.4.mlp.gate_proj
model.layers.4.mlp.up_proj
model.layers.5.self_attn.k_proj
model.layers.5.self_attn.o_proj
model.layers.5.self_attn.q_proj
model.layers.5.self_attn.v_proj
model.layers.5.mlp.down_proj
model.layers.5.mlp.gate_proj
model.layers.5.mlp.up_proj
model.layers.6.self_attn.k_proj
model.layers.6.self_attn.o_proj
model.layers.6.self_attn.q_proj
model.layers.6.self_attn.v_proj
model.layers.6.mlp.down_proj
model.layers.6.mlp.gate_proj
model.layers.6.mlp.up_proj
model.layers.7.self_attn.k_proj
model.layers.7.self_attn.o_proj
model.layers.7.self_attn.q_proj
model.layers.7.self_attn.v_proj
model.layers.7.mlp.down_proj
model.layers.7.mlp.gate_proj
model.layers.7.mlp.up_proj
model.layers.8.self_attn.k_proj
model.layers.8.self_attn.o_proj
model.layers.8.self_attn.q_proj
model.layers.8.self_attn.v_proj
model.layers.8.mlp.down_proj
model.layers.8.mlp.gate_proj
model.layers.8.mlp.up_proj
model.layers.9.self_attn.k_proj
model.layers.9.self_attn.o_proj
model.layers.9.self_attn.q_proj
model.layers.9.self_attn.v_proj
model.layers.9.mlp.down_proj
model.layers.9.mlp.gate_proj
model.layers.9.mlp.up_proj
model.layers.10.self_attn.k_proj
model.layers.10.self_attn.o_proj
model.layers.10.self_attn.q_proj
model.layers.10.self_attn.v_proj
model.layers.10.mlp.down_proj
model.layers.10.mlp.gate_proj
model.layers.10.mlp.up_proj
model.layers.11.self_attn.k_proj
model.layers.11.self_attn.o_proj
model.layers.11.self_attn.q_proj
model.layers.11.self_attn.v_proj
model.layers.11.mlp.down_proj
model.layers.11.mlp.gate_proj
model.layers.11.mlp.up_proj
model.layers.12.self_attn.k_proj
model.layers.12.self_attn.o_proj
model.layers.12.self_attn.q_proj
model.layers.12.self_attn.v_proj
model.layers.12.mlp.down_proj
model.layers.12.mlp.gate_proj
model.layers.12.mlp.up_proj
model.layers.13.self_attn.k_proj
model.layers.13.self_attn.o_proj
model.layers.13.self_attn.q_proj
model.layers.13.self_attn.v_proj
model.layers.13.mlp.down_proj
model.layers.13.mlp.gate_proj
model.layers.13.mlp.up_proj
model.layers.14.self_attn.k_proj
model.layers.14.self_attn.o_proj
model.layers.14.self_attn.q_proj
model.layers.14.self_attn.v_proj
model.layers.14.mlp.down_proj
model.layers.14.mlp.gate_proj
model.layers.14.mlp.up_proj
model.layers.15.self_attn.k_proj
model.layers.15.self_attn.o_proj
model.layers.15.self_attn.q_proj
model.layers.15.self_attn.v_proj
model.layers.15.mlp.down_proj
model.layers.15.mlp.gate_proj
model.layers.15.mlp.up_proj
model.layers.16.self_attn.k_proj
model.layers.16.self_attn.o_proj
model.layers.16.self_attn.q_proj
model.layers.16.self_attn.v_proj
model.layers.16.mlp.down_proj
model.layers.16.mlp.gate_proj
model.layers.16.mlp.up_proj
model.layers.17.self_attn.k_proj
model.layers.17.self_attn.o_proj
model.layers.17.self_attn.q_proj
model.layers.17.self_attn.v_proj
model.layers.17.mlp.down_proj
model.layers.17.mlp.gate_proj
model.layers.17.mlp.up_proj
model.layers.18.self_attn.k_proj
model.layers.18.self_attn.o_proj
model.layers.18.self_attn.q_proj
model.layers.18.self_attn.v_proj
model.layers.18.mlp.down_proj
model.layers.18.mlp.gate_proj
model.layers.18.mlp.up_proj
model.layers.19.self_attn.k_proj
model.layers.19.self_attn.o_proj
model.layers.19.self_attn.q_proj
model.layers.19.self_attn.v_proj
model.layers.19.mlp.down_proj
model.layers.19.mlp.gate_proj
model.layers.19.mlp.up_proj
model.layers.20.self_attn.k_proj
model.layers.20.self_attn.o_proj
model.layers.20.self_attn.q_proj
model.layers.20.self_attn.v_proj
model.layers.20.mlp.down_proj
model.layers.20.mlp.gate_proj
model.layers.20.mlp.up_proj
model.layers.21.self_attn.k_proj
model.layers.21.self_attn.o_proj
model.layers.21.self_attn.q_proj
model.layers.21.self_attn.v_proj
model.layers.21.mlp.down_proj
model.layers.21.mlp.gate_proj
model.layers.21.mlp.up_proj
model.layers.22.self_attn.k_proj
model.layers.22.self_attn.o_proj
model.layers.22.self_attn.q_proj
model.layers.22.self_attn.v_proj
model.layers.22.mlp.down_proj
model.layers.22.mlp.gate_proj
model.layers.22.mlp.up_proj
model.layers.23.self_attn.k_proj
model.layers.23.self_attn.o_proj
model.layers.23.self_attn.q_proj
model.layers.23.self_attn.v_proj
model.layers.23.mlp.down_proj
model.layers.23.mlp.gate_proj
model.layers.23.mlp.up_proj
model.layers.24.self_attn.k_proj
model.layers.24.self_attn.o_proj
model.layers.24.self_attn.q_proj
model.layers.24.self_attn.v_proj
model.layers.24.mlp.down_proj
model.layers.24.mlp.gate_proj
model.layers.24.mlp.up_proj
model.layers.25.self_attn.k_proj
model.layers.25.self_attn.o_proj
model.layers.25.self_attn.q_proj
model.layers.25.self_attn.v_proj
model.layers.25.mlp.down_proj
model.layers.25.mlp.gate_proj
model.layers.25.mlp.up_proj
model.layers.26.self_attn.k_proj
model.layers.26.self_attn.o_proj
model.layers.26.self_attn.q_proj
model.layers.26.self_attn.v_proj
model.layers.26.mlp.down_proj
model.layers.26.mlp.gate_proj
model.layers.26.mlp.up_proj
model.layers.27.self_attn.k_proj
model.layers.27.self_attn.o_proj
model.layers.27.self_attn.q_proj
model.layers.27.self_attn.v_proj
model.layers.27.mlp.down_proj
model.layers.27.mlp.gate_proj
model.layers.27.mlp.up_proj
model.layers.28.self_attn.k_proj
model.layers.28.self_attn.o_proj
model.layers.28.self_attn.q_proj
model.layers.28.self_attn.v_proj
model.layers.28.mlp.down_proj
model.layers.28.mlp.gate_proj
model.layers.28.mlp.up_proj
model.layers.29.self_attn.k_proj
model.layers.29.self_attn.o_proj
model.layers.29.self_attn.q_proj
model.layers.29.self_attn.v_proj
model.layers.29.mlp.down_proj
model.layers.29.mlp.gate_proj
model.layers.29.mlp.up_proj
model.layers.30.self_attn.k_proj
model.layers.30.self_attn.o_proj
model.layers.30.self_attn.q_proj
model.layers.30.self_attn.v_proj
model.layers.30.mlp.down_proj
model.layers.30.mlp.gate_proj
model.layers.30.mlp.up_proj
model.layers.31.self_attn.k_proj
model.layers.31.self_attn.o_proj
model.layers.31.self_attn.q_proj
model.layers.31.self_attn.v_proj
model.layers.31.mlp.down_proj
model.layers.31.mlp.gate_proj
model.layers.31.mlp.up_proj
model.layers.32.self_attn.k_proj
model.layers.32.self_attn.o_proj
model.layers.32.self_attn.q_proj
model.layers.32.self_attn.v_proj
model.layers.32.mlp.down_proj
model.layers.32.mlp.gate_proj
model.layers.32.mlp.up_proj
model.layers.33.self_attn.k_proj
model.layers.33.self_attn.o_proj
model.layers.33.self_attn.q_proj
model.layers.33.self_attn.v_proj
model.layers.33.mlp.down_proj
model.layers.33.mlp.gate_proj
model.layers.33.mlp.up_proj
model.layers.34.self_attn.k_proj
model.layers.34.self_attn.o_proj
model.layers.34.self_attn.q_proj
model.layers.34.self_attn.v_proj
model.layers.34.mlp.down_proj
model.layers.34.mlp.gate_proj
model.layers.34.mlp.up_proj
model.layers.35.self_attn.k_proj
model.layers.35.self_attn.o_proj
model.layers.35.self_attn.q_proj
model.layers.35.self_attn.v_proj
model.layers.35.mlp.down_proj
model.layers.35.mlp.gate_proj
model.layers.35.mlp.up_proj
model.layers.36.self_attn.k_proj
model.layers.36.self_attn.o_proj
model.layers.36.self_attn.q_proj
model.layers.36.self_attn.v_proj
model.layers.36.mlp.down_proj
model.layers.36.mlp.gate_proj
model.layers.36.mlp.up_proj
model.layers.37.self_attn.k_proj
model.layers.37.self_attn.o_proj
model.layers.37.self_attn.q_proj
model.layers.37.self_attn.v_proj
model.layers.37.mlp.down_proj
model.layers.37.mlp.gate_proj
model.layers.37.mlp.up_proj
model.layers.38.self_attn.k_proj
model.layers.38.self_attn.o_proj
model.layers.38.self_attn.q_proj
model.layers.38.self_attn.v_proj
model.layers.38.mlp.down_proj
model.layers.38.mlp.gate_proj
model.layers.38.mlp.up_proj
model.layers.39.self_attn.k_proj
model.layers.39.self_attn.o_proj
model.layers.39.self_attn.q_proj
model.layers.39.self_attn.v_proj
model.layers.39.mlp.down_proj
model.layers.39.mlp.gate_proj
model.layers.39.mlp.up_proj
Done.