parameters guide
samplers guide
model generation
role play settings
quant selection
arm quants
iq quants vs q quants
optimal model setting
gibberish fixes
coherence
instruction following
quality generation
chat settings
quality settings
llamacpp server
llamacpp
lmstudio
sillytavern
koboldcpp
backyard
ollama
model generation steering
steering
model generation fixes
text generation webui
ggufs
exl2
full precision
quants
imatrix
neo imatrix
Update README.md
README.md CHANGED

@@ -239,6 +239,67 @@ Imatrix quants generally improve all quants, and also allow you to use smaller q

IE: Instead of using a Q4_K_M, you might be able to run an IQ3_M and get close to Q4_K_M's quality, but at a higher tokens-per-second speed and with more VRAM left over for context.

<B>Recommended Quants:</B>

This covers both Imatrix and regular quants.

Imatrix can be applied to any quant - "Q" or "IQ" - however, IQ1 through IQ3_S quants REQUIRE an imatrix dataset / imatrixing process before quanting.

This chart shows the order of the quants in terms of "BPW" (bits per weight), mapped with their relative "strength" to one another, with "IQ1_S" having the least and "Q8_0" the most:

<small>
<PRE>
IQ1_S | IQ1_M
IQ2_XXS | IQ2_XS | Q2_K_S | IQ2_S | Q2_K | IQ2_M
IQ3_XXS | Q3_K_S | IQ3_XS | IQ3_S | IQ3_M | Q3_K_M | Q3_K_L
Q4_K_S | IQ4_XS | IQ4_NL | Q4_K_M
Q5_K_S | Q5_K_M
Q6_K
Q8_0
F16
</PRE>
</small>

More BPW means better quality, but also higher VRAM requirements, a larger file size, and fewer tokens per second.
The larger the model in terms of parameters, the lower the quant you can run with less quality loss.
Note that "quality loss" refers to both instruction following and output quality.
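
As a rough rule of thumb, file size (and the VRAM needed for the weights alone, before context) can be estimated from parameter count and BPW. The sketch below is illustrative only; the BPW values are approximate and vary a little between models and llama.cpp releases:

<PRE>
# Rough GGUF size estimate: billions of parameters * bits-per-weight / 8 = GB,
# ignoring small file overhead. BPW values below are approximate.
APPROX_BPW = {
    "IQ1_S": 1.6, "IQ2_XXS": 2.1, "Q2_K": 2.6, "IQ3_M": 3.7,
    "Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5, "F16": 16.0,
}

def estimated_size_gb(params_billions: float, quant: str) -> float:
    """Approximate weight size in GB for a given quant."""
    return params_billions * APPROX_BPW[quant] / 8

# Example: an 8B model at Q4_K_M vs IQ3_M (KV cache / context memory is extra).
for q in ("Q4_K_M", "IQ3_M"):
    print(q, round(estimated_size_gb(8, q), 1), "GB")
</PRE>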

Quality differences between quants at the lower levels are larger than the differences between the higher quants.

The Imatrix process has NO effect on Q8 or F16 quants.

F16 is full precision, just in GGUF format.

<B>NEO Imatrix Quants / NEO Imatrix X Quants</B>

NEO Imatrix quants use specialized, specifically "themed" datasets to slightly alter the weights in a model. All Imatrix datasets do this to some degree or another; however, NEO Imatrix datasets
are content / theme specific and have been calibrated to have maximum effect on a model (relative to standard Imatrix datasets). Calibration was made possible by testing 50+ standard Imatrix datasets,
then carefully modifying them and testing the resulting changes to determine the exact format and content that have the maximum effect on a model via the Imatrix process.

Please keep in mind that the Imatrix process (even at its strongest) only "tints" a model and/or slightly changes its bias(es).
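
For reference, here is a minimal sketch of how an imatrix is typically generated and then applied during quanting with the llama.cpp tools. It assumes the llama-imatrix and llama-quantize binaries from a recent llama.cpp build are on your PATH; the file names are placeholders:

<PRE>
# Illustrative only: drive llama.cpp's imatrix + quantize tools from Python.
# Check your llama.cpp build's help output for exact tool names and flags.
import subprocess

# 1) Build the importance matrix from a (themed) calibration text file.
subprocess.run([
    "llama-imatrix",
    "-m", "model-F16.gguf",        # full-precision source model (placeholder)
    "-f", "neo-calibration.txt",   # the Imatrix / NEO dataset text (placeholder)
    "-o", "model.imatrix",         # resulting importance matrix
], check=True)

# 2) Quantize using that imatrix (required for IQ1 through IQ3_S targets).
subprocess.run([
    "llama-quantize",
    "--imatrix", "model.imatrix",
    "model-F16.gguf", "model-IQ3_M.gguf", "IQ3_M",
], check=True)
</PRE>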

Here are some Imatrix NEO models:

[ https://huggingface.co/DavidAU/Command-R-01-Ultra-NEO-DARK-HORROR-V1-V2-35B-IMATRIX-GGUF ]

[ https://huggingface.co/DavidAU/Command-R-01-200xq-Ultra-NEO-V1-35B-IMATRIX-GGUF ]

[ https://huggingface.co/DavidAU/Command-R-01-200xq-Ultra-NEO-V1-35B-IMATRIX-GGUF ] (this is an X-Quant)

[ https://huggingface.co/DavidAU/Llama-3.2-1B-Instruct-NEO-SI-FI-GGUF ]

[ https://huggingface.co/DavidAU/Llama-3.2-1B-Instruct-NEO-WEE-HORROR-GGUF ]

[ https://huggingface.co/DavidAU/L3-8B-Stheno-v3.2-Ultra-NEO-V1-IMATRIX-GGUF ]

Suggestions for Imatrix NEO quants:

- The LOWER the quant, the STRONGER the Imatrix effect, and therefore the stronger the horror "tint", so to speak.
- Due to the unique nature of this project, quants IQ1 to IQ4 are recommended for maximum horror effect, with IQ4_XS the most balanced in terms of power and bits (a download sketch follows this list).
- Secondaries are Q2s-Q4s; the Imatrix effect is still strong in these quants.
- Effects diminish quickly from Q5s and up.
- At Q8 there is no change (the Imatrix process does not affect this quant), and therefore it was not uploaded.
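
If you fetch these from Python, a sketch using huggingface_hub looks like this; the filename shown is a placeholder - pick a real .gguf from the repo's file list:

<PRE>
# Download one specific quant file from a NEO Imatrix repo.
# The filename below is illustrative; check the repository's "Files" tab
# for the exact .gguf names that are actually available.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="DavidAU/Llama-3.2-1B-Instruct-NEO-SI-FI-GGUF",
    filename="Llama-3.2-1B-Instruct-NEO-SI-FI-IQ4_XS.gguf",  # placeholder name
)
print("Saved to:", path)
</PRE>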

---

Quick Reference Table

@@ -671,11 +732,11 @@ a] Affects per token generation:

- top_a
- epsilon_cutoff - see note #4
- eta_cutoff - see note #4
- no_repeat_ngram_size - see note #1.

b] Affects generation including phrase, sentence, paragraph and entire generation:

- no_repeat_ngram_size - see note #1.
- encoder_repetition_penalty "Hallucinations filter" - see note #2.
- guidance_scale (with "Negative prompt") => this works like a pre-prompt / system-role prompt - see note #3.
- Disabling the BOS TOKEN can make the replies more creative.
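
As a rough orientation, here is a sketch of a request that sets several of these parameters through text-generation-webui's OpenAI-compatible API. The endpoint, port, and whether extra generation parameters are passed through depend on your version - treat this as an assumption to verify, not a spec:

<PRE>
# Illustrative request to a locally running text-generation-webui instance.
# Assumes its OpenAI-compatible API is enabled on the default port and that
# extra generation parameters are accepted in the body; values are examples.
import requests

payload = {
    "prompt": "Write the opening line of a ghost story.",
    "max_tokens": 200,
    "temperature": 0.8,
    # Per-token samplers (section a] above):
    "top_a": 0.1,
    "epsilon_cutoff": 3,
    "eta_cutoff": 3,
    "no_repeat_ngram_size": 4,
    # Whole-generation controls (section b] above):
    "encoder_repetition_penalty": 1.05,
    "guidance_scale": 1.3,
    "negative_prompt": "Write a cheerful story.",
}

r = requests.post("http://127.0.0.1:5000/v1/completions", json=payload, timeout=120)
print(r.json()["choices"][0]["text"])
</PRE>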

@@ -743,6 +804,14 @@ Dial the "dry_muliplier" up or down to "reign in" or "release the madness" so to

For Class 4 models this is used to control some of the model's bad habit(s).

For more information on "DRY":

https://github.com/oobabooga/text-generation-webui/pull/5677

https://www.reddit.com/r/KoboldAI/comments/1e49vpt/dry_sampler_questionsthat_im_sure_most_of_us_are/

https://www.reddit.com/r/KoboldAI/comments/1eo4r6q/dry_settings_questions/
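
To make the knobs concrete, here is a toy sketch (not the actual implementation) of the penalty curve described in the PR linked above: a would-be repeat of length n is penalized by dry_multiplier * dry_base ^ (n - dry_allowed_length). The defaults shown are commonly suggested starting points:

<PRE>
# Toy illustration of the DRY ("Don't Repeat Yourself") penalty idea: if a
# candidate token would extend a repeat of length n of something already in
# the context, its logit is reduced by multiplier * base ** (n - allowed_length).
def dry_penalty(match_length: int,
                dry_multiplier: float = 0.8,
                dry_base: float = 1.75,
                dry_allowed_length: int = 2) -> float:
    if match_length < dry_allowed_length:
        return 0.0
    return dry_multiplier * dry_base ** (match_length - dry_allowed_length)

# Example: penalties grow rapidly as the would-be repeat gets longer.
for n in range(1, 8):
    print(n, round(dry_penalty(n), 3))
</PRE>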

<B>QUADRATIC SAMPLING:</B>

@@ -766,6 +835,10 @@ In Class 3 models, this has the effect of modifying the prose closer to "normal"

In Class 4 models, this has the effect of modifying the prose closer to "normal" with as much or as little (or a lot!) of the "madness" from the root model, AND wrangling in some of the core model's bad habits.

For more information on Quadratic Sampling:

https://gist.github.com/kalomaze/4473f3f975ff5e5fade06e632498f73e
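
A simplified sketch of the idea, as I read the gist above (the optional "smoothing curve" is omitted): every logit is replaced by the top logit minus smoothing_factor times its squared distance from the top, so factors below 1 flatten the distribution and factors above 1 sharpen it:

<PRE>
# Simplified sketch of quadratic sampling ("smoothing factor"). The top token's
# logit is unchanged; all others are pulled toward (or pushed away from) it in
# proportion to the square of their distance from the top logit.
def quadratic_smooth(logits: list[float], smoothing_factor: float) -> list[float]:
    top = max(logits)
    return [top - smoothing_factor * (x - top) ** 2 for x in logits]

# Example: 0.3 compresses the gaps between candidates; 2.0 exaggerates them.
print(quadratic_smooth([2.0, 1.0, -1.0], smoothing_factor=0.3))
print(quadratic_smooth([2.0, 1.0, -1.0], smoothing_factor=2.0))
</PRE>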

<B>ANTI-SLOP - Koboldcpp only</B>

Hopefully this powerful sampler will soon appear in all LLM/AI apps.

@@ -776,6 +849,10 @@ This sampler allows banning words and phrases DURING generation, forcing the mod

This is a game changer for custom, real-time control of the model.

For more information on the ANTI-SLOP project (its author also runs EQBench):

https://github.com/sam-paech/antislop-sampler
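
Conceptually it works by backtracking. Here is a toy sketch of the idea (not the antislop-sampler API; the vocabulary and banned phrases are made up): when a banned phrase completes, roll back to where it started and re-sample with that continuation disallowed:

<PRE>
# Toy illustration of the anti-slop idea with a stand-in "sampler".
import random

BANNED = ["shivers down", "tapestry of"]

def sample_token(banned_next: set[str]) -> str:
    """Stand-in for a real LLM sampler; picks any word not banned at this spot."""
    vocab = ["the", "shivers", "down", "tapestry", "of", "night", "wind", "story"]
    return random.choice([w for w in vocab if w not in banned_next])

def generate(n_tokens: int = 20) -> str:
    tokens: list[str] = []
    banned_at: dict[int, set[str]] = {}          # position -> words banned there
    while len(tokens) < n_tokens:
        pos = len(tokens)
        tokens.append(sample_token(banned_at.get(pos, set())))
        text = " ".join(tokens)
        for phrase in BANNED:
            if text.endswith(phrase):
                # Backtrack to the start of the phrase and ban its first word there.
                start = len(tokens) - len(phrase.split())
                banned_at.setdefault(start, set()).add(tokens[start])
                tokens = tokens[:start]
                break
    return " ".join(tokens)

print(generate())
</PRE>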

FINAL NOTES: