MarsupialAI committed
Commit
b2f6f4c
1 Parent(s): dcb1c07

Update README.md

Files changed (1)
  1. README.md +17 -5
README.md CHANGED
@@ -50,12 +50,24 @@ I ran PPL against all 8 quants, as well as the full fp16 and fp32 GGUFs. All iM
  groups_merged.txt. All PPL calcs were run using wiki.short.raw. Results:

  ````
- <in progress>
+ GGUF                     PPL
+ FP16                     11.5923
+ FP32                     11.5923
+ Q4km FP16 + FP16 imat    11.9326
+ Q4km FP32 + FP32 imat    11.9314
+ Q4km FP16 + Q8 imat      11.9369
+ Q4km FP32 + Q8 imat      11.9500
+ Q4km FP16 + Q4 imat      11.9355
+ Q4km FP32 + Q4 imat      11.9356
+ Q4km FP16 no imat        12.3612
+ Q4km FP32 no imat        12.3643
  ````

  Conclusion:

- ````
- <in progress>
- ````
-
+ The importance of the quant size used to generate the imatrix is borderline non-existent. Sort of. While the Q4km quant made with
+ the fp32 GGUF and the fp32-generated imatrix was best, it was by such a minuscule margin that it is implausible that any
+ difference between that (11.9314) and the Q4km made from the fp16 GGUF with the Q4_0-generated imatrix (11.9355) could be detected
+ under normal usage. The only counterintuitive result here is that the Q4_0-imat quants outperformed the Q8_0-imat quants. I cannot
+ think of a reason why this should be the case. But as it seemingly *is* the case, I will be using Q4_0 as my intermediate step for
+ generating imatrices in the future when the full fp16 model is too big for my measly 72GB of VRAM.
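
For reference, the workflow the conclusion describes (intermediate Q4_0 quant, imatrix from that quant, imatrix-guided Q4_K_M quant, PPL check) maps roughly onto the llama.cpp tools below. This is a minimal sketch, not the exact commands used for these runs: the binary names assume an older llama.cpp build (quantize, imatrix, perplexity), and all GGUF/imatrix file names are placeholders.

````
# 1. Make an intermediate Q4_0 quant small enough to fit in VRAM
#    when the full fp16/fp32 GGUF will not.
./quantize model-f32.gguf model-q4_0.gguf Q4_0

# 2. Generate the importance matrix from that Q4_0 quant, using the
#    same calibration text as above (groups_merged.txt).
./imatrix -m model-q4_0.gguf -f groups_merged.txt -o imatrix-q4_0.dat

# 3. Quantize the full-precision GGUF to Q4_K_M, guided by that imatrix.
./quantize --imatrix imatrix-q4_0.dat model-f32.gguf model-q4km.gguf Q4_K_M

# 4. Measure perplexity on the evaluation text.
./perplexity -m model-q4km.gguf -f wiki.short.raw
````

Swapping the imatrix source (fp16, fp32, Q8_0, Q4_0) only changes the model fed to step 2; steps 3 and 4 are identical across runs, which is what lets the table above isolate the effect of the imatrix source.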