DavidAU committed on
Commit e97c0f9
1 Parent(s): 25c2802

Create README.md

README.md ADDED

---
license: apache-2.0
language:
- en
tags:
- creative
- story
- writing
- fiction
- float32
- roleplaying
- rp
- enhanced
- space whale
- 32 bit upscale
---

<font color=red><h3> Ultra Quality High Remaster of the incredible: Psyonic-Cetacean-20b - Imatrix Plus 2. </h3></font>

This is a Floating Point 32 upscale, where all components and merges were remastered to floating point 32.
This includes all of the merges (recreated with master files) and, where possible, substituting in full FP32 models.
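
As a rough illustration of what "remastering to FP32" involves (the card does not spell out the exact pipeline, and the model name below is only a placeholder), a merge component can be loaded and re-saved in full float32 so that every later merge and GGUF-conversion step starts from full precision:

```python
# Illustrative sketch only: load a source component in full float32 and
# re-save it, so downstream merges / GGUF conversion start from FP32 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "some-org/source-model"   # placeholder, not an actual component of this remaster

model = AutoModelForCausalLM.from_pretrained(src, torch_dtype=torch.float32)
tokenizer = AutoTokenizer.from_pretrained(src)

model.save_pretrained("source-model-fp32")       # weights written out as float32
tokenizer.save_pretrained("source-model-fp32")
```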

This repo contains the new Imatrix Plus 2 quants, using a new in-house dataset merged with a master dataset
to push performance of the Ultra Quality remaster even higher.

<img src="space-whale-thinking.jpg">

The goal: Carry forward maximum precision right up to the point where the model is "GGUF-ed".

This includes the F32 master file for GGUF too... at a whopping 78 GB.
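
(That size is roughly what full precision implies: about 20 billion parameters × 4 bytes per FP32 weight ≈ 80 GB, before any file metadata.)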

WHY?

Because the difference between F32 and BF16 is... over 8 DECIMAL places.

And as each merge / model is modified, there are "losses" along the way.

These losses are carried forward and in turn lead to more losses.

And decimal points are critical to model performance.

SMALL?

Yes... but multiplied across every merge and every compression step: 20 billion times.
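
To see how big those rounding losses are, here is a minimal sketch (assuming PyTorch; the tensors and the simple averaging "merge" are invented purely for illustration):

```python
import torch

# float32 keeps roughly 7 significant decimal digits, bfloat16 only about 3
x = torch.tensor(1.2345678, dtype=torch.float32)
print(f"{x.item():.7f}")                      # ~1.2345678
print(f"{x.to(torch.bfloat16).item():.7f}")   # ~1.2343750

torch.manual_seed(0)
a = torch.randn(4096, dtype=torch.float32)    # stand-in weights from model A
b = torch.randn(4096, dtype=torch.float32)    # stand-in weights from model B

merged_f32 = (a + b) / 2                      # "merge" done in full float32
merged_bf16 = ((a.to(torch.bfloat16) + b.to(torch.bfloat16)) / 2).to(torch.float32)

# per-weight error introduced by doing the very same step at lower precision
print((merged_f32 - merged_bf16).abs().max().item())
```

Each individual error is tiny, but every extra merge or format conversion performed at reduced precision adds its own layer of rounding on top of the last one.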

<B>The result:</b>

At Q2K, an impressive drop of 533 points in perplexity. (lower is better)
(VS Q2K original base model: PPL = 9.8077 +/- 0.06821)

At Q4KM, a whopping drop of 976 points in perplexity.
(VS Q4KM original base model: PPL = 8.7858 +/- 0.06074)

At Q6, an awesome drop of 234 points in perplexity.
(VS Q6 original base model: PPL = 8.6070 +/- 0.05907)

To put this in perspective, "Q6" now operates ABOVE the original full precision version of "Psyonic-Cetacean-20b",
and Q4KM operates at close to Q6 level quality.

This is because at "Q6" the quantized / compressed model is considered to be accurate within "+0.0008 ppl" of the full,
uncompressed / unquantized model, and it exceeds this threshold by over 200 points.
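
For reference, "perplexity" here is the exponential of the average per-token negative log-likelihood over the test text, which llama.cpp reports in the "Final estimate: PPL = ..." form quoted on this page. Judging from the Q8 comparison further down (8.6012 vs 8.5850 described as roughly 150 points better), a "point" in these notes appears to mean 0.0001 PPL. A minimal sketch of the calculation, with made-up log-probabilities:

```python
import math

def perplexity(token_logprobs):
    # perplexity = exp(mean negative log-likelihood) over the evaluated tokens
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# hypothetical per-token log-probabilities; a real run averages over thousands of tokens
print(perplexity([-2.16, -2.13, -2.15, -2.14]))   # lower is better
```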

<I> Imatrix quants take this even further, in most cases DOUBLING the "drop" in perplexity realized in the regular quants. </i>

Q4KM imatrix:

Final estimate: PPL = 8.6095 +/- 0.05898

(Non-imatrix: Final estimate: PPL = 8.6902 +/- 0.05985)

(VS Q4KM base model: PPL = 8.7858 +/- 0.06074)

(VS Q6 BASE model: Final estimate: PPL = 8.6070 +/- 0.05907)
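
For readers unfamiliar with imatrix ("importance matrix") quants: llama.cpp gathers activation statistics over a calibration dataset and uses them to weight the quantization error, so the weights that matter most to the model's output are rounded most carefully. The sketch below is only a toy illustration of that idea (importance-weighted error when choosing a block scale), not llama.cpp's actual algorithm, and all the data in it is synthetic:

```python
import numpy as np

def pick_scale(block, importance, n_candidates=64):
    # Toy 4-bit block quantizer: choose the scale that minimizes the
    # IMPORTANCE-WEIGHTED squared error rather than the plain squared error.
    best_scale, best_err = None, float("inf")
    max_abs = float(np.abs(block).max())
    for step in range(1, n_candidates + 1):
        scale = max_abs * step / n_candidates
        q = np.clip(np.round(block / scale * 7), -8, 7)   # signed 4-bit levels
        err = float((importance * (block - q * scale / 7) ** 2).sum())
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err

rng = np.random.default_rng(0)
weights = rng.normal(size=32).astype(np.float32)                  # one block of weights
importance = rng.uniform(0.1, 10.0, size=32).astype(np.float32)   # activation-derived importance
print(pick_scale(weights, importance))
```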

But... what about Q8?

The mountain moved:

150 points better: PPL = 8.5850 +/- 0.05881 VS BASE/ORIGINAL: PPL = 8.6012 +/- 0.05900

<B>THE RESULTS ARE IN: </b>

As per Jeb Carter, original creator of the model:

- instruction following has improved dramatically.
- new abilities have emerged.
- he had to REDUCE the instruction sets used, because the model no longer needed such specific instructions.
- prose, nuance and depth have all improved.
- known issues with the original model have disappeared.

This is not "something for nothing"; it is a method of ensuring maximum precision at every step, just before "GGUF-ing" the model.

The methods employed simply ensure that precision loss is minimized or eliminated.

It is mathematically and theoretically sound.

<B>The bottom line here is this:</b>

Higher quality instruction following and output.

Likewise, you can use a smaller quant / compression (with higher tokens per second) and still get great quality.

Same great model... turbo charged.

This is the first group of remasters.

<B>The FOUR Horsemen:</B>

This repo will be followed by a "reg quant plus" repo, which adds additional components into the GGUFs (at all levels) at floating point 32
precision to further increase the sheer creativity and raw AI horsepower.

This process shaves an extra 50-100 points off perplexity... again.

Following this group will be a full float 32 precision Imatrix repo (including regular quants "imatrixed").

Test results VS the original and the "ultra" regular quants will be posted when they come in.

Finally, an Imatrix Plus repo (with the same floating point 32 enhancement as "reg quant plus") will push the limit even more.

Details of all the methods (and pitfalls to avoid) employed to make these high precision remasters will be
posted shortly, along with comparisons of the original model and the new ultra remaster.

Thanks again to Jeb Carter, the original creator of "Psyonic-Cetacean 20B":

[ https://huggingface.co/jebcarter/psyonic-cetacean-20B ]