DavidAU committed (verified) · Commit ac491dd · Parent(s): cbe3629

Update README.md

Files changed (1): README.md (+79 -2)

README.md CHANGED
@@ -239,6 +239,67 @@ Imatrix quants generally improve all quants, and also allow you to use smaller q
 
 IE: Instead of using a q4KM, you might be able to run an IQ3_M and get close to Q4KM's quality, but at a higher token per second speed and have more VRAM for context.
 
+ <B>Recommended Quants:</B>
+
+ This covers both Imatrix and regular quants.
+
+ Imatrix can be applied to any quant - "Q" or "IQ" - however, IQ1s to IQ3_S REQUIRE an imatrix dataset / imatrixing process before quanting.
+
+ This chart shows the quants in order of "BPW" (bits per weight), from "IQ1_S" with the least to "F16" with the most; quants on the same line are of roughly comparable strength:
+
+ <small>
+ <pre>
+ IQ1_S | IQ1_M
+ IQ2_XXS | IQ2_XS | Q2_K_S | IQ2_S | Q2_K | IQ2_M
+ IQ3_XXS | Q3_K_S | IQ3_XS | IQ3_S | IQ3_M | Q3_K_M | Q3_K_L
+ Q4_K_S | IQ4_XS | IQ4_NL | Q4_K_M
+ Q5_K_S | Q5_K_M
+ Q6_K
+ Q8_0
+ F16
+ </pre>
+ </small>
+
+ More BPW means better quality, but also higher VRAM requirements, larger file sizes, and lower tokens per second.
+ The more parameters a model has, the smaller the quant you can run with less quality loss.
+ Note that "quality loss" refers to both instruction following and output quality.
+
+ Quality differences between adjacent quants are larger at the low end of the chart than at the high end.
+
+ The Imatrix process has NO effect on Q8 or F16 quants.
+
+ F16 is full precision, just in GGUF format.
+
+ <B>NEO Imatrix Quants / NEO Imatrix X Quants</B>
+
+ NEO Imatrix quants use specialized, specifically "themed" datasets to slightly alter the weights in a model. All Imatrix datasets do this to some degree or another; however, NEO Imatrix datasets
+ are content / theme specific and have been calibrated to have maximum effect on a model (relative to standard Imatrix datasets). Calibration was made possible by testing 50+ standard Imatrix datasets,
+ then carefully modifying them and testing the resulting changes to determine the exact format and content with the maximum effect on a model via the Imatrix process.
+
+ Please keep in mind that the Imatrix process (at its strongest) only "tints" a model and/or slightly changes its bias(es).
+
+ Here are some Imatrix NEO models:
+
+ [ https://huggingface.co/DavidAU/Command-R-01-Ultra-NEO-DARK-HORROR-V1-V2-35B-IMATRIX-GGUF ]
+
+ [ https://huggingface.co/DavidAU/Command-R-01-200xq-Ultra-NEO-V1-35B-IMATRIX-GGUF ]
+
+ [ https://huggingface.co/DavidAU/Command-R-01-200xq-Ultra-NEO-V1-35B-IMATRIX-GGUF ] (this is an X-Quant)
+
+ [ https://huggingface.co/DavidAU/Llama-3.2-1B-Instruct-NEO-SI-FI-GGUF ]
+
+ [ https://huggingface.co/DavidAU/Llama-3.2-1B-Instruct-NEO-WEE-HORROR-GGUF ]
+
+ [ https://huggingface.co/DavidAU/L3-8B-Stheno-v3.2-Ultra-NEO-V1-IMATRIX-GGUF ]
+
+ Suggestions for Imatrix NEO quants:
+
+ - The LOWER the quant, the STRONGER the Imatrix effect, and therefore the stronger the horror "tint", so to speak.
+ - Due to the unique nature of this project, quants IQ1s to IQ4s are recommended for maximum horror effect, with IQ4_XS the most balanced in terms of power and bits.
+ - Secondaries are Q2s-Q4s; the Imatrix effect is still strong in these quants.
+ - Effects diminish quickly from Q5s and up.
+ - At Q8 there is no change (the Imatrix process does not affect this quant), so it was not uploaded.
+
 ---
 
 Quick Reference Table
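To make the BPW trade-off in the chart above concrete, here is a minimal sketch of the size arithmetic in Python. The BPW figures in `APPROX_BPW` are rough illustrative approximations (llama.cpp's actual per-quant values vary with model architecture), and the estimate covers weights only, not context/KV-cache memory:

```python
# Rough GGUF size estimator: file size is roughly parameters * BPW / 8.
# BPW values below are approximations for illustration; real values vary.
APPROX_BPW = {
    "IQ1_S": 1.6, "IQ2_XS": 2.3, "Q2_K": 2.6, "IQ3_M": 3.7,
    "Q3_K_M": 3.9, "IQ4_XS": 4.3, "Q4_K_M": 4.9, "Q5_K_M": 5.7,
    "Q6_K": 6.6, "Q8_0": 8.5, "F16": 16.0,
}

def approx_size_gb(params_billions: float, quant: str) -> float:
    """Approximate GGUF file size in GB (weights only, no KV cache)."""
    bits = params_billions * 1e9 * APPROX_BPW[quant]
    return bits / 8 / 1e9

# Example: the IQ3_M vs Q4_K_M trade-off mentioned at the top of this hunk.
for quant in ("IQ3_M", "Q4_K_M", "Q8_0"):
    print(f"8B model at {quant}: ~{approx_size_gb(8, quant):.1f} GB")
```

This is why a lower-BPW quant frees VRAM for context: under these assumptions an 8B model drops from roughly 4.9 GB at Q4_K_M to roughly 3.7 GB at IQ3_M.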
 
@@ -671,11 +732,11 @@ a] Affects per token generation:
 - top_a
 - epsilon_cutoff - see note 4
 - eta_cutoff - see note 4
- - no_repeat_ngram_size - see note 1.
+ - no_repeat_ngram_size - see note #1.
 
 b] Affects generation including phrase, sentence, paragraph and entire generation:
 
- - no_repeat_ngram_size - see note 1.
+ - no_repeat_ngram_size - see note #1.
 - encoder_repetition_penalty "Hallucinations filter" - see note #2.
 - guidance_scale (with "Negative prompt") => this is like a pre-prompt/system role prompt - see note #3.
 - Disabling the BOS TOKEN can make the replies more creative.
 
@@ -743,6 +804,14 @@ Dial the "dry_multiplier" up or down to "rein in" or "release the madness" so to
 
 For Class 4 models this is used to control some of the model's bad habit(s).
 
+ For more information on "DRY":
+
+ https://github.com/oobabooga/text-generation-webui/pull/5677
+
+ https://www.reddit.com/r/KoboldAI/comments/1e49vpt/dry_sampler_questionsthat_im_sure_most_of_us_are/
+
+ https://www.reddit.com/r/KoboldAI/comments/1eo4r6q/dry_settings_questions/
+
 
 <B>QUADRATIC SAMPLING:</B>
 
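To give a feel for the DRY sampler linked in the hunk above: its penalty for extending an already-seen sequence grows exponentially with the length of the match. A minimal sketch of that curve; the parameter names follow the text-generation-webui PR, the helper name `dry_penalty` is mine, and the default values are purely illustrative:

```python
# DRY penalizes a token that would extend a verbatim repeat of the recent
# context. Per the PR above, the penalty is multiplier * base ** (n - allowed),
# where n is the length of the matched sequence.

def dry_penalty(match_len: int, multiplier: float = 0.8,
                base: float = 1.75, allowed_length: int = 2) -> float:
    """Logit penalty for extending a repeated sequence of length match_len."""
    if match_len < allowed_length:
        return 0.0  # repeats shorter than allowed_length are not penalized
    return multiplier * base ** (match_len - allowed_length)

for n in range(1, 8):
    print(f"match length {n}: penalty {dry_penalty(n):.2f}")
```

Raising dry_multiplier scales the whole curve (the "rein in" dial described above), while raising dry_base makes the penalty ramp up faster for longer repeats.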
 
 
@@ -766,6 +835,10 @@ In Class 3 models, this has the effect of modifying the prose closer to "normal"
 
 In Class 4 models, this has the effect of modifying the prose closer to "normal" with as much or as little (or a lot!) of a touch of "madness" from the root model AND wrangling in some of the core model's bad habits.
 
+ For more information on Quadratic Sampling:
+
+ https://gist.github.com/kalomaze/4473f3f975ff5e5fade06e632498f73e
+
 <B>ANTI-SLOP - KoboldCpp only</B>
 
 Hopefully this powerful sampler will soon appear in all LLM/AI apps.
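As an aside on the QUADRATIC SAMPLING links above: one common formulation (following kalomaze's gist) pulls each logit down by its squared distance from the top logit, scaled by a smoothing factor, so larger factors concentrate probability on the top tokens while smaller factors level the field. A minimal NumPy sketch of that transform, not the canonical implementation:

```python
import numpy as np

def quadratic_transform(logits: np.ndarray, smoothing_factor: float) -> np.ndarray:
    """Pull every logit down by its squared distance from the max logit."""
    max_logit = logits.max()
    return max_logit - smoothing_factor * (logits - max_logit) ** 2

logits = np.array([4.0, 3.0, 2.0, 0.5])
for factor in (0.25, 1.0, 3.0):
    t = quadratic_transform(logits, factor)
    probs = np.exp(t - t.max()) / np.exp(t - t.max()).sum()
    print(f"smoothing_factor={factor}: {np.round(probs, 3)}")
```

Note that the top token's logit is unchanged; only the gaps below it are reshaped, which is why this sampler "wrangles" rather than replaces the model's preferences.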
 
@@ -776,6 +849,10 @@ This sampler allows banning words and phrases DURING generation, forcing the mod
 
 This is a game changer in custom real time control of the model.
 
+ For more information on the ANTI-SLOP project (its author also runs EQBench):
+
+ https://github.com/sam-paech/antislop-sampler
+
 
 FINAL NOTES:
 
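For readers wondering how a sampler can ban a phrase that only becomes visible part-way through generation: the antislop-sampler linked above backtracks when a banned phrase completes, forbids the token that started it, and re-samples from that point. A conceptual sketch only; `step` and `detokenize` are hypothetical stand-ins for a model's single-token decode and tokenizer, and the real project works on logits with details (tokenizer merges, probability thresholds) this sketch ignores:

```python
# Backtrack-and-ban loop, simplified. step(tokens, banned_ids) returns the
# next token id while never picking an id in banned_ids, or None at end.

def generate_with_bans(step, detokenize, banned_phrases, max_tokens=200):
    tokens = []
    banned_at = {}  # position -> token ids forbidden at that position
    while len(tokens) < max_tokens:
        tok = step(tokens, banned_at.get(len(tokens), set()))
        if tok is None:  # model signalled end of stream
            break
        tokens.append(tok)
        text = detokenize(tokens)
        for phrase in banned_phrases:
            if not text.endswith(phrase):
                continue
            # Walk back to the shortest token suffix that still contains
            # the phrase; its first token is where the phrase began.
            k = len(tokens)
            while k > 0 and phrase not in detokenize(tokens[k - 1:]):
                k -= 1
            k = max(k - 1, 0)
            banned_at.setdefault(k, set()).add(tokens[k])
            del tokens[k:]  # rewind; the next loop re-samples position k
            break
    return detokenize(tokens)
```

Because the ban is applied at the position where the offending phrase began, the model is forced onto a genuinely different continuation instead of simply paraphrasing the last word.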