DavidAU committed (verified) · Commit ac491dd · Parent(s): cbe3629

Update README.md

Files changed (1): README.md (+79 -2)

README.md CHANGED
@@ -239,6 +239,67 @@ Imatrix quants generally improve all quants, and also allow you to use smaller q
 
 IE: Instead of using a q4KM, you might be able to run an IQ3_M and get close to Q4KM's quality, but at a higher token per second speed and have more VRAM for context.
 
+ <B>Recommended Quants:</B>
+
+ This covers both Imatrix and regular quants.
+
+ Imatrix can be applied to any quant - "Q" or "IQ" - however, IQ1s to IQ3_S REQUIRE an imatrix dataset / imatrixing process before quanting.
+
+ This chart shows the quants in order of "BPW" (bits per weight), from "IQ1_S" with the least to "F16" with the most; quants on the same line are of roughly comparable strength:
+
+ <small>
+ <pre>
+ IQ1_S | IQ1_M
+ IQ2_XXS | IQ2_XS | Q2_K_S | IQ2_S | Q2_K | IQ2_M
+ IQ3_XXS | Q3_K_S | IQ3_XS | IQ3_S | IQ3_M | Q3_K_M | Q3_K_L
+ Q4_K_S | IQ4_XS | IQ4_NL | Q4_K_M
+ Q5_K_S | Q5_K_M
+ Q6_K
+ Q8_0
+ F16
+ </pre>
+ </small>
+
+ More BPW means better quality, but also higher VRAM requirements, larger file sizes, and lower tokens per second.
+ The more parameters a model has, the smaller the quant you can run with less quality loss.
+ Note that "quality loss" refers to both instruction following and output quality.
+
+ Quality differences between adjacent quants are larger at the low end of the chart than at the high end.
+
+ The Imatrix process has NO effect on Q8 or F16 quants.
+
+ F16 is full precision, just in GGUF format.
+
+ <B>NEO Imatrix Quants / NEO Imatrix X Quants</B>
+
+ NEO Imatrix quants use specialized, specifically "themed" datasets to slightly alter the weights in a model. All Imatrix datasets do this to some degree or another; however, NEO Imatrix datasets
+ are content / theme specific and have been calibrated to have maximum effect on a model (relative to standard Imatrix datasets). Calibration was made possible by testing 50+ standard Imatrix datasets,
+ then carefully modifying them and testing the resulting changes to determine the exact format and content with the maximum effect on a model via the Imatrix process.
+
+ Please keep in mind that the Imatrix process (at its strongest) only "tints" a model and/or slightly changes its bias(es).
+
+ Here are some Imatrix NEO models:
+
+ [ https://huggingface.co/DavidAU/Command-R-01-Ultra-NEO-DARK-HORROR-V1-V2-35B-IMATRIX-GGUF ]
+
+ [ https://huggingface.co/DavidAU/Command-R-01-200xq-Ultra-NEO-V1-35B-IMATRIX-GGUF ]
+
+ [ https://huggingface.co/DavidAU/Command-R-01-200xq-Ultra-NEO-V1-35B-IMATRIX-GGUF ] (this is an X-Quant)
+
+ [ https://huggingface.co/DavidAU/Llama-3.2-1B-Instruct-NEO-SI-FI-GGUF ]
+
+ [ https://huggingface.co/DavidAU/Llama-3.2-1B-Instruct-NEO-WEE-HORROR-GGUF ]
+
+ [ https://huggingface.co/DavidAU/L3-8B-Stheno-v3.2-Ultra-NEO-V1-IMATRIX-GGUF ]
+
+ Suggestions for Imatrix NEO quants:
+
+ - The LOWER the quant, the STRONGER the Imatrix effect, and therefore the stronger the horror "tint", so to speak.
+ - Due to the unique nature of this project, quants IQ1s to IQ4s are recommended for maximum horror effect, with IQ4_XS the most balanced in terms of power and bits.
+ - Secondaries are Q2s-Q4s; the Imatrix effect is still strong in these quants.
+ - Effects diminish quickly from Q5s and up.
+ - At Q8 there is no change (the Imatrix process does not affect this quant), so it was not uploaded.
+
 ---
 
 Quick Reference Table
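To make the BPW trade-off in the chart above concrete, here is a minimal sketch of the size arithmetic in Python. The BPW figures in `APPROX_BPW` are rough illustrative approximations (llama.cpp's actual per-quant values vary with model architecture), and the estimate covers weights only, not context/KV-cache memory:

```python
# Rough GGUF size estimator: file size is roughly parameters * BPW / 8.
# BPW values below are approximations for illustration; real values vary.
APPROX_BPW = {
    "IQ1_S": 1.6, "IQ2_XS": 2.3, "Q2_K": 2.6, "IQ3_M": 3.7,
    "Q3_K_M": 3.9, "IQ4_XS": 4.3, "Q4_K_M": 4.9, "Q5_K_M": 5.7,
    "Q6_K": 6.6, "Q8_0": 8.5, "F16": 16.0,
}

def approx_size_gb(params_billions: float, quant: str) -> float:
    """Approximate GGUF file size in GB (weights only, no KV cache)."""
    bits = params_billions * 1e9 * APPROX_BPW[quant]
    return bits / 8 / 1e9

# Example: the IQ3_M vs Q4_K_M trade-off mentioned at the top of this hunk.
for quant in ("IQ3_M", "Q4_K_M", "Q8_0"):
    print(f"8B model at {quant}: ~{approx_size_gb(8, quant):.1f} GB")
```

This is why a lower-BPW quant frees VRAM for context: under these assumptions an 8B model drops from roughly 4.9 GB at Q4_K_M to roughly 3.7 GB at IQ3_M.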
 
@@ -671,11 +732,11 @@ a] Affects per token generation:
 - top_a
 - epsilon_cutoff - see note 4
 - eta_cutoff - see note 4
- - no_repeat_ngram_size - see note 1.
+ - no_repeat_ngram_size - see note #1.
 
 b] Affects generation including phrase, sentence, paragraph and entire generation:
 
- - no_repeat_ngram_size - see note 1.
+ - no_repeat_ngram_size - see note #1.
 - encoder_repetition_penalty "Hallucinations filter" - see note #2.
 - guidance_scale (with "Negative prompt") => this is like a pre-prompt/system role prompt - see note #3.
 - Disabling the BOS TOKEN can make the replies more creative.
 
@@ -743,6 +804,14 @@ Dial the "dry_multiplier" up or down to "rein in" or "release the madness" so to
 
 For Class 4 models this is used to control some of the model's bad habit(s).
 
+ For more information on "DRY":
+
+ https://github.com/oobabooga/text-generation-webui/pull/5677
+
+ https://www.reddit.com/r/KoboldAI/comments/1e49vpt/dry_sampler_questionsthat_im_sure_most_of_us_are/
+
+ https://www.reddit.com/r/KoboldAI/comments/1eo4r6q/dry_settings_questions/
+
 
 <B>QUADRATIC SAMPLING:</B>
 
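To give a feel for the DRY sampler linked in the hunk above: its penalty for extending an already-seen sequence grows exponentially with the length of the match. A minimal sketch of that curve; the parameter names follow the text-generation-webui PR, the helper name `dry_penalty` is mine, and the default values are purely illustrative:

```python
# DRY penalizes a token that would extend a verbatim repeat of the recent
# context. Per the PR above, the penalty is multiplier * base ** (n - allowed),
# where n is the length of the matched sequence.

def dry_penalty(match_len: int, multiplier: float = 0.8,
                base: float = 1.75, allowed_length: int = 2) -> float:
    """Logit penalty for extending a repeated sequence of length match_len."""
    if match_len < allowed_length:
        return 0.0  # repeats shorter than allowed_length are not penalized
    return multiplier * base ** (match_len - allowed_length)

for n in range(1, 8):
    print(f"match length {n}: penalty {dry_penalty(n):.2f}")
```

Raising dry_multiplier scales the whole curve (the "rein in" dial described above), while raising dry_base makes the penalty ramp up faster for longer repeats.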
 
 
@@ -766,6 +835,10 @@ In Class 3 models, this has the effect of modifying the prose closer to "normal"
 
 In Class 4 models, this has the effect of modifying the prose closer to "normal" with as much or as little (or a lot!) of a touch of "madness" from the root model AND wrangling in some of the core model's bad habits.
 
+ For more information on Quadratic Sampling:
+
+ https://gist.github.com/kalomaze/4473f3f975ff5e5fade06e632498f73e
+
 <B>ANTI-SLOP - KoboldCpp only</B>
 
 Hopefully this powerful sampler will soon appear in all LLM/AI apps.
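As an aside on the QUADRATIC SAMPLING links above: one common formulation (following kalomaze's gist) pulls each logit down by its squared distance from the top logit, scaled by a smoothing factor, so larger factors concentrate probability on the top tokens while smaller factors level the field. A minimal NumPy sketch of that transform, not the canonical implementation:

```python
import numpy as np

def quadratic_transform(logits: np.ndarray, smoothing_factor: float) -> np.ndarray:
    """Pull every logit down by its squared distance from the max logit."""
    max_logit = logits.max()
    return max_logit - smoothing_factor * (logits - max_logit) ** 2

logits = np.array([4.0, 3.0, 2.0, 0.5])
for factor in (0.25, 1.0, 3.0):
    t = quadratic_transform(logits, factor)
    probs = np.exp(t - t.max()) / np.exp(t - t.max()).sum()
    print(f"smoothing_factor={factor}: {np.round(probs, 3)}")
```

Note that the top token's logit is unchanged; only the gaps below it are reshaped, which is why this sampler "wrangles" rather than replaces the model's preferences.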
 
@@ -776,6 +849,10 @@ This sampler allows banning words and phrases DURING generation, forcing the mod
 
 This is a game changer in custom real time control of the model.
 
+ For more information on the ANTI-SLOP project (its author also runs EQBench):
+
+ https://github.com/sam-paech/antislop-sampler
+
 
 FINAL NOTES:
 
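For readers wondering how a sampler can ban a phrase that only becomes visible part-way through generation: the antislop-sampler linked above backtracks when a banned phrase completes, forbids the token that started it, and re-samples from that point. A conceptual sketch only; `step` and `detokenize` are hypothetical stand-ins for a model's single-token decode and tokenizer, and the real project works on logits with details (tokenizer merges, probability thresholds) this sketch ignores:

```python
# Backtrack-and-ban loop, simplified. step(tokens, banned_ids) returns the
# next token id while never picking an id in banned_ids, or None at end.

def generate_with_bans(step, detokenize, banned_phrases, max_tokens=200):
    tokens = []
    banned_at = {}  # position -> token ids forbidden at that position
    while len(tokens) < max_tokens:
        tok = step(tokens, banned_at.get(len(tokens), set()))
        if tok is None:  # model signalled end of stream
            break
        tokens.append(tok)
        text = detokenize(tokens)
        for phrase in banned_phrases:
            if not text.endswith(phrase):
                continue
            # Walk back to the shortest token suffix that still contains
            # the phrase; its first token is where the phrase began.
            k = len(tokens)
            while k > 0 and phrase not in detokenize(tokens[k - 1:]):
                k -= 1
            k = max(k - 1, 0)
            banned_at.setdefault(k, set()).add(tokens[k])
            del tokens[k:]  # rewind; the next loop re-samples position k
            break
    return detokenize(tokens)
```

Because the ban is applied at the position where the offending phrase began, the model is forced onto a genuinely different continuation instead of simply paraphrasing the last word.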