Notes only for now; this still needs to be reworked.
Q2_K_S
Master :
PR : 7.76 GiB (2.76 BPW) PPL = 8.1574 +/- 0.05498
Q2_K
Master : 8.23 GiB (2.93 BPW) PPL = 7.7977 +/- 0.05177
PR : 8.63 GiB (3.07 BPW) PPL = 7.5978 +/- 0.04951
PR 2 : 9.21 GB (3.05 BPW) 8.57 GiB (3.05 BPW) PPL over 642 chunks for n_ctx=512 = 7.6073 +/- 0.04946
Q2_K_L
PR :
Q3_K_S
Master :
PR :
Q3_K_M
Master :
PR :
Q3_K_L
Master :
PR :
Q3_K_XL
Master :
PR :
IQ1_XS
PR : 5.20 GiB (1.85 BPW) PPL = 12.4393 +/- 0.08114
PR 2 : 5.47 GB (1.81 BPW) 5.10 GiB (1.81 BPW) PPL over 642 chunks for n_ctx=512 = 12.6437 +/- 0.08284
IQ1_S
Master :
PR : 4.67 GiB (1.66 BPW) PPL = 15.9241 +/- 0.10775
IQ1_M
Master :
PR :
IQ1_XL
PR :
IQ2_XXS
Master :
PR :
IQ2_XS
Master :
PR :
IQ2_S
Master :
PR :
IQ2_M
Master : 7.45 GiB (2.65 BPW) PPL = 7.9597 +/- 0.05146
PR : 7.96 GiB (2.83 BPW) PPL = 7.6998 +/- 0.04995
PR 2 : 8.55 GB (2.83 BPW) 7.96 GiB (2.83 BPW) PPL over 642 chunks for n_ctx=512 = 7.7063 +/- 0.05010
IQ2_XL
PR :
IQ3_XXS
Master :
PR :
IQ3_XS
Master :
PR :
IQ3_S
Master :
PR :
IQ3_M
Master :
PR :
IQ3_XL
PR :
IQ3_XXL
PR :
IQ4_XS
Master :
IQ4_XSR
PR :
FP16
Master : PPL over 655 chunks for n_ctx=512 = 5.7977 +/- 0.03236
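
To make the Master vs PR trade-off easier to read, here is a minimal Python sketch that computes the relative size and perplexity change for the two quant types that have both figures filled in above (Q2_K and IQ2_M), and converts the GB sizes of the "PR 2" lines to GiB. The names `RESULTS`, `gb_to_gib` and `compare` are hypothetical helpers, not part of any tool; the numbers are copied from the notes, and the comparison assumes Master and PR were evaluated on the same data and settings.

```python
# Figures copied from the notes above: (size in GiB, bits per weight, perplexity).
RESULTS = {
    "Q2_K": {"Master": (8.23, 2.93, 7.7977), "PR": (8.63, 3.07, 7.5978)},
    "IQ2_M": {"Master": (7.45, 2.65, 7.9597), "PR": (7.96, 2.83, 7.6998)},
}


def gb_to_gib(gb: float) -> float:
    """Convert decimal gigabytes (10**9 bytes) to gibibytes (2**30 bytes)."""
    return gb * 10**9 / 2**30


def compare(master, pr):
    """Return (size change in %, PPL change in %) of PR relative to Master."""
    size_pct = (pr[0] - master[0]) / master[0] * 100
    ppl_pct = (pr[2] - master[2]) / master[2] * 100
    return size_pct, ppl_pct


if __name__ == "__main__":
    for quant, runs in RESULTS.items():
        size_pct, ppl_pct = compare(runs["Master"], runs["PR"])
        print(f"{quant}: size {size_pct:+.2f}%, PPL {ppl_pct:+.2f}%")
    # Sanity check of the GB figures reported on the "PR 2" lines.
    for gb in (9.21, 5.47, 8.55):
        print(f"{gb} GB = {gb_to_gib(gb):.2f} GiB")
```

A negative PPL change with a positive size change means the PR buys lower perplexity at the cost of a larger file; the empty entries can be added to `RESULTS` as they are measured.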