MarsupialAI committed
Commit
b2f6f4c
1 Parent(s): dcb1c07

Update README.md

Files changed (1)
  1. README.md +17 -5
README.md CHANGED
@@ -50,12 +50,24 @@ I ran PPL against all 8 quants, as well as the full fp16 and fp32 GGUFs. All iM
  groups_merged.txt. All PPL calcs were run using wiki.short.raw. Results:

  ````
- <in progress>
+ GGUF                     PPL
+ FP16                     11.5923
+ FP32                     11.5923
+ Q4km FP16 + FP16 imat    11.9326
+ Q4km FP32 + FP32 imat    11.9314
+ Q4km FP16 + Q8 imat      11.9369
+ Q4km FP32 + Q8 imat      11.9500
+ Q4km FP16 + Q4 imat      11.9355
+ Q4km FP32 + Q4 imat      11.9356
+ Q4km FP16 no imat        12.3612
+ Q4km FP32 no imat        12.3643
  ````

  Conclusion:

- ````
- <in progress>
- ````
-
+ The importance of the quant size used to generate the imatrix is borderline non-existent. Sort of. While the Q4km quant made with
+ the fp32 GGUF and the fp32-generated imatrix was best, it was by such a minuscule margin that it is implausible that any
+ difference between that (11.9314) and the Q4km made from the fp16 GGUF with the Q4_0-generated imatrix (11.9355) could be detected
+ under normal usage. The only counterintuitive result here is that the Q4_0-imat quants outperformed the Q8_0-imat quants. I cannot
+ think of a reason why this should be the case. But as it seemingly *is* the case, I will be using Q4_0 as my intermediate step for
+ generating imatrices in the future when the full fp16 model is too big for my measly 72GB of VRAM.
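
For reference, the workflow the conclusion describes (intermediate Q4_0 quant, imatrix from that quant, imatrix-guided Q4_K_M quant, PPL check) maps roughly onto the llama.cpp tools below. This is a minimal sketch, not the exact commands used for these runs: the binary names assume an older llama.cpp build (quantize, imatrix, perplexity), and all GGUF/imatrix file names are placeholders.

````
# 1. Make an intermediate Q4_0 quant small enough to fit in VRAM
#    when the full fp16/fp32 GGUF will not.
./quantize model-f32.gguf model-q4_0.gguf Q4_0

# 2. Generate the importance matrix from that Q4_0 quant, using the
#    same calibration text as above (groups_merged.txt).
./imatrix -m model-q4_0.gguf -f groups_merged.txt -o imatrix-q4_0.dat

# 3. Quantize the full-precision GGUF to Q4_K_M, guided by that imatrix.
./quantize --imatrix imatrix-q4_0.dat model-f32.gguf model-q4km.gguf Q4_K_M

# 4. Measure perplexity on the evaluation text.
./perplexity -m model-q4km.gguf -f wiki.short.raw
````

Swapping the imatrix source (fp16, fp32, Q8_0, Q4_0) only changes the model fed to step 2; steps 3 and 4 are identical across runs, which is what lets the table above isolate the effect of the imatrix source.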