MarsupialAI committed
Commit 0e51047 · Parent(s): 1f787c8
Update README.md
README.md CHANGED
@@ -10,11 +10,13 @@ So to test this crazy theory, I downloaded Undi95/Meta-Llama-3-8B-Instruct-hf an
 - "Auto" with no outtype specified
 
 I then quantized each of these conversions to Q4_K_M and ran perplexity tests on everything using my abbreviated wiki.short.raw
-text file
-
-The results:
-
+text file. The results:
 
+````
+FP16 specified: size 14.9GB   PPL @ fp16 9.5158 +/- 0.15418   PPL @ Q4km 9.6414 +/- 0.15494
+FP32 specified: size 29.9GB   PPL @ fp32 9.5158 +/- 0.15418   PPL @ Q4km 9.6278 +/- 0.15466
+None specified: size 29.9GB   PPL @ ???? 9.5158 +/- 0.15418   PPL @ Q4km 9.6278 +/- 0.15466
+````
 
 
 As you can see, converting to fp32 has no meaningful effect on PPL compared to converting to fp16. There will no doubt be some
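For anyone wanting to reproduce the test, the conversion, quantization, and perplexity steps described in the README map onto the standard llama.cpp tools roughly as follows. This is a minimal sketch, not the exact commands behind the numbers above: the model directory, output file names, and the wiki.short.raw path are placeholders, and the tool names (convert-hf-to-gguf.py, quantize, perplexity) are the llama.cpp names from around this period.

````
# Convert the HF checkpoint to GGUF three ways: fp16, fp32, and "auto" (no outtype given)
python convert-hf-to-gguf.py Meta-Llama-3-8B-Instruct-hf --outtype f16 --outfile llama3-8b-f16.gguf
python convert-hf-to-gguf.py Meta-Llama-3-8B-Instruct-hf --outtype f32 --outfile llama3-8b-f32.gguf
python convert-hf-to-gguf.py Meta-Llama-3-8B-Instruct-hf               --outfile llama3-8b-auto.gguf

# Quantize each conversion to Q4_K_M
./quantize llama3-8b-f16.gguf  llama3-8b-f16-Q4_K_M.gguf  Q4_K_M
./quantize llama3-8b-f32.gguf  llama3-8b-f32-Q4_K_M.gguf  Q4_K_M
./quantize llama3-8b-auto.gguf llama3-8b-auto-Q4_K_M.gguf Q4_K_M

# Perplexity of each file (full precision and Q4_K_M) against the abbreviated wiki text
./perplexity -m llama3-8b-f16.gguf        -f wiki.short.raw
./perplexity -m llama3-8b-f16-Q4_K_M.gguf -f wiki.short.raw
# ...and likewise for the fp32 and "auto" conversions
````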