MarsupialAI committed
Commit b2f6f4c · 1 Parent(s): dcb1c07
Update README.md

README.md CHANGED
@@ -50,12 +50,24 @@ I ran PPL against all 8 quants, as well as the full fp16 and fp32 GGUFs. All iM
 groups_merged.txt. All PPL calcs were run using wiki.short.raw. Results:
 
 ````
-
+GGUF                     PPL
+FP16                     11.5923
+FP32                     11.5923
+Q4km FP16 + FP16 imat    11.9326
+Q4km FP32 + FP32 imat    11.9314
+Q4km FP16 + Q8 imat      11.9369
+Q4km FP32 + Q8 imat      11.9500
+Q4km FP16 + Q4 imat      11.9355
+Q4km FP32 + Q4 imat      11.9356
+Q4km FP16 no imat        12.3612
+Q4km FP32 no imat        12.3643
 ````
 
 Conclusion:
 
-
-
-
-
+The importance of the quant size used to generate the imatrix is borderline non-existent. Sort of. While the Q4km quant made with
+the fp32 GGUF and the fp32-generated imatrix was best, it was by such a minuscule margin that it is implausible that any
+difference between that (11.9314) and the Q4km made from the fp16 GGUF with the Q4_0-generated imatrix (11.9355) could be detected
+under normal usage. The only counterintuitive result here is that the Q4_0-imat quants outperformed the Q8_0-imat quants. I cannot
+think of a reason why this should be the case. But as it seemingly *is* the case, I will be using Q4_0 as my intermediate step for
+generating imatrices in the future when the full fp16 model is too big for my measly 72GB of VRAM.