AI Autocorrect / Auto Creative Enhancement / Auto Low Quant Optimization - gguf - exl2 - hqq - SOFTWARE By DavidAU
This engine/system works with ALL GGUF, EXL2, HQQ and other model quants/compressions, as well as full source/full precision models, and can be used with any AI/LLM app that has an API such as Llama-Server, LMStudio, Koboldcpp, Text Generation Webui, etc.
I put 3 decades of programming experience, 100s of model builds and 1000s of model tests into creating an AI / programming hybrid.
This fully automated engine is designed to actively alter / fine tune the generation stream of any AI/LLM in real time. It includes active "auto-correction" (with an optional temp/top_k "scrambler") and passive / active "auto-reconsider" (an optional system to augment real time generation, also with an optional temp/top_k "scrambler").
The engine actively corrects, in real time during streaming generation (sampling at 50 times per second) the following issues:
- letter, word(s), sentence(s), and paragraph(s) repeats.
- embedded letter, word, sentence, and paragraph repeats.
- model goes on a rant
- incoherence
- a model working perfectly then spouting "gibberish".
- token errors such as Chinese symbols appearing in English generation.
- low quant (IQ1s, IQ2s, q2k) errors such as repetition, low variety and breakdowns in generation.
- passive improvement in real time generation using paragraph and/or sentence "reconsider" systems.
- ACTIVE improvement in real time generation using paragraph and/or sentence "reconsider" systems with AUX system(s) active.
The system detects the issue(s), corrects them and continues generation WITHOUT USER INTERVENTION.
The original intent of this "Auto Correct" engine was to allow users of my class 2, 3, and 4 models to use these models under all conditions, without restrictions, and to make all of the "power" of these models available without any of the negatives, such as special settings, parameters and/or samplers to "rein in" the bad behaviour of these models.
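To give a feel for the kind of check involved, here is a minimal sketch of spotting a repeat at the tail of the text generated so far. This is NOT the engine's code: the function name `findTailRepeat` and the word-count thresholds are hypothetical, and it only covers the simplest case (a run of words repeated back to back).

```javascript
// Hypothetical sketch: detect an immediately repeated run of words at the
// end of the text generated so far. Not the engine's actual detection logic.
function findTailRepeat(text, minWords = 3, maxWords = 40) {
  const words = text.trim().split(/\s+/).filter(Boolean);
  // Try the longest candidate run first so a repeated sentence is reported whole.
  const longest = Math.min(maxWords, Math.floor(words.length / 2));
  for (let n = longest; n >= minWords; n--) {
    const last = words.slice(words.length - n).join(" ");
    const prev = words.slice(words.length - 2 * n, words.length - n).join(" ");
    if (last.toLowerCase() === prev.toLowerCase()) {
      return { repeated: last, wordCount: n };
    }
  }
  return null; // no back-to-back repeat found
}
```

Called on "The door creaked open. The door creaked open." this reports a 4 word repeat; on a normal sentence it returns null. A check like this would be run over and over on the growing stream.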
Class 2, 3 and 4 models have incredible creative "chops" but the downside is they can be very difficult to use - AutoCorrect engine is the automated, hands off answer to this issue.
In other words, the following models (and all their brothers and sisters) will now operate perfectly, under all conditions:
https://huggingface.co/DavidAU/L3-DARKEST-PLANET-16.5B-GGUF
https://huggingface.co/DavidAU/MN-DARKEST-UNIVERSE-29B-GGUF
https://huggingface.co/DavidAU/L3-Stheno-Maid-Blackroot-Grand-HORROR-16B-GGUF
And many more...
And... it also allows the release of stronger models at class 5 and above - while still allowing users (including myself) to use these models under all conditions WITHOUT the need to manually stop, fix, or steer generation, or to set parameters and/or samplers to "rein in" the bad behaviour of these models.
Yep - have your cake and eat it too.
And all models - my repo and others - will all operate better.
Because many of the issues with "creative" / "unstable" models are the same as those seen when using low and ultra low quants (Q2k, IQ1s, IQ2s), this engine also fixes many of these issues on the fly, resulting in the ability to use models at these ultra low quant levels with enhanced results.
And in some cases, allowing use of these low bit quants in normal operation.
Example: 70B model, IQ1S
-> System will autocorrect generation, especially "repetitive" issues that can occur at this low quant level.
-> Reconsider system (if on) will further enhance generation, with even stronger enhancement if the temp/top_k scrambler for "Reconsider" is on too.
Low / Ultra low quants can also benefit from the "Reconsider" system, especially with "aux" system "scramble temp/top_k" ON which helps correct issues like low creativity / low variety at these quant levels and can also help repetitive issues too.
Keep in mind these systems will also enhance generation of any quant, and of course full precision models too.
This engine/process has the net effect of live fine tuning the model's generation and creates a "back and forth" between the engine and the AI/LLM - a live, two way street - creating a partnership in generation.
Unlike parameters and samplers (which limit/control model behaviour all the time), this is a just in time / as needed autocorrect engine: it only responds and takes action when required.
This allows the model to operate at full power, WITHOUT sampler/parameter restrictions which can DULL/REDUCE model performance in many cases.
Likewise this allows you to use parameters/samplers at FAR HIGHER settings - ie really push the model hard - and still have COHERENT generation.
AUTO CREATIVE ENHANCEMENT - RECONSIDER SYSTEM:
In addition this engine has a passive "RECONSIDER" system that operates per paragraph, and/or per sentence too, which further augments generation.
This has passive, and active settings.
(this/these can be turned on/off)
And this "reconsider" system has an aux system too:
Each time the model "reconsiders" (per paragraph and/or sentence), the system can automatically change the temp / top_k parameters. This drastically affects generation, and strongly breaks the predictability of the model.
This can also help low/ultra low quants generation and/or low parameter count models too which can suffer from predictive output / repeat generational issues.
These will also enhance any quant's generation too, including full precision.
It has five settings (1-5) besides on/off:
// 1 = MILD [default]
// range .05 to .95 Temp ; k => 40 to 70.
// 2 = MED
// range .5 to 1.8 Temp ; k => 30 to 90.
// 3 = HIGH
// range .8 to 3.25 Temp ; k => 20 to 150.
// 4 = SPICY
// range 1 to 5 Temp ; k => 5 to 400.
// 5 = YOUR defaults MAX
// Uses your temp / top k settings but randomizes them to whatever the max temp / top_k you have set.
// IE if you set temp of 1.5, and top_k of 100 -> System will pick values from .01 to 1.5 for temp, and 5 to 105 for top K.
(this aux system is OFF by default)
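A minimal sketch of how a "scrambler" like this could pick values, using the ranges from the table above. The names `SCRAMBLE_RANGES` and `scramble` are hypothetical, and the level 5 rule (temp from .01 up to your temp, top_k from 5 up to your top_k + 5) is only inferred from the single example given, so treat it as an assumption rather than the engine's exact behaviour.

```javascript
// Ranges copied from the settings table above (levels 1-4).
const SCRAMBLE_RANGES = {
  1: { temp: [0.05, 0.95], topK: [40, 70] },  // MILD
  2: { temp: [0.5, 1.8], topK: [30, 90] },    // MED
  3: { temp: [0.8, 3.25], topK: [20, 150] },  // HIGH
  4: { temp: [1, 5], topK: [5, 400] },        // SPICY
};

// Hypothetical helper: pick a random temp / top_k for the given level.
// `rng` defaults to Math.random and can be swapped out for testing.
function scramble(level, userTemp, userTopK, rng = Math.random) {
  // ASSUMPTION: level 5 bounds are inferred from the "temp 1.5 / top_k 100" example.
  const range = level === 5
    ? { temp: [0.01, userTemp], topK: [5, userTopK + 5] }
    : SCRAMBLE_RANGES[level];
  if (!range) throw new RangeError("level must be 1-5");
  const [tLo, tHi] = range.temp;
  const [kLo, kHi] = range.topK;
  return {
    temperature: Number((tLo + rng() * (tHi - tLo)).toFixed(2)),
    top_k: Math.floor(kLo + rng() * (kHi - kLo + 1)), // whole number, kLo..kHi
  };
}
```

For example, `scramble(4, 1.5, 100)` returns a temperature somewhere between 1 and 5 and a top_k between 5 and 400, different on every call.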
AUTO CORRECTION SYSTEM:
And when the model is directed to "make changes" (Auto Correction) (after the engine edits out the issue(s)) it scrambles parameters (temp, top_k) to force the model to make a more "creative" decision (or at least a better one). More on this below.
(this can also be turned on/off) AND has these settings too:
// 1 = MILD [default]
// range .05 to .95 Temp ; k => 40 to 70.
// 2 = MED
// range .5 to 1.8 Temp ; k => 30 to 90.
// 3 = HIGH
// range .8 to 3.25 Temp ; k => 20 to 150.
// 4 = SPICY
// range 1 to 5 Temp ; k => 5 to 400.
// 5 = YOUR defaults MAX
// Uses your temp / top k settings but randomizes them to whatever the max temp / top_k you have set.
// IE if you set temp of 1.5, and top_k of 100 -> System will pick values from .01 to 1.5 for temp, and 5 to 105 for top K.
(this system is set to "2" by default)
INSTALL:
This engine is a software patch for the SillyTavern AI/LLM frontend system. You need to install SillyTavern, then the "script" patch... then connect SillyTavern to an AI/LLM app such as LMStudio, Text Generation Web UI, Koboldcpp or Llama-Server AND/OR an API (local or remote).
Download SillyTavern here:
https://github.com/SillyTavern/SillyTavern
AI/LLM Apps:
https://github.com/ggerganov/llama.cpp ( Llama Server: https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md )
https://github.com/LostRuins/koboldcpp
https://github.com/oobabooga/text-generation-webui
As SillyTavern also supports "remote" connections to AIs/LLMs you can also connect (to AIs/LLMs) this way too.
The AutoCorrect engine operates within the core controllers within SillyTavern.
Current SillyTavern users:
You may need to update your version of SillyTavern before install, as the "new core" and the version of SillyTavern you have installed must match.
Once SillyTavern is installed, download the "script...js" file for your VERSION of SillyTavern (the version number is IN the filename) and save this file to the "public" folder:
Example:
F:\sillytavern\SillyTavern-Launcher\SillyTavern\public
Then RENAME "script.js" to "script-org.js" and then RENAME the file you downloaded to "script.js" (no quotes).
You can then start SillyTavern normally, and connect to AI/LLMs as you normally would.
Additional Versions with Settings Already Set:
This will have "spicy-spicy", "off-off" etc etc in the file name.
This means the settings for "Autocorrect" and "Reconsider" are already set for you.
SETTINGS:
Settings are adjusted by editing the "new" script.js file in NOTEPAD (DO NOT edit in word, wordpad etc).
Open the file in notepad, and search (using the find function) for: "AUTOCORRECT: USER ADJUSTABLE SETTINGS:"
NOTE: My code blocks are at the bottom/end of the file.
You can turn on/off the systems, paragraph and/or sentence reconsider and adjust the word filter(s) (blocks/does not block letters/symbols) as well as a few other settings.
In most cases you will not need to edit these.
Once you have made the changes, save the file.
If you have SillyTavern open already, REFRESH it in the browser for settings to take effect.
The main default for "Reconsider" is set to "paragraph" reconsider ON, which I find improves generational quality.
However, this means the output will start/stop EACH paragraph which can slow down generation and may be jarring to some.
Per "sentence" (and/or per paragraph) Reconsider is even stronger (default is OFF), but might be too jarring as it "stops, reconsiders, and starts" per sentence, and IF "paragraph" is on too... each paragraph.
If "RECONSIDER" is off (both paragraph and sentence), the only time the system will "stop, edit, and restart" generation is when a problem is detected.
What is "Reconsider" ? How does Auto Correct work ?
This is the heart of the system.
During regular generation -> You enter a prompt -> Send it to the AI -> The AI streams back the response and finishes.
In this case, only your prompt (and/or if you are in a chat -> chat contents first, then prompt) are "considered" in the AI's instructions and response.
RECONSIDER (equal to you stopping generation, and clicking continue) -> Stops generation -> Sends the prompt AND the full generation (up to the stop point) back to the model to "continue generation".
If you have "Paragraph Reconsider" ON -> This process happens PER paragraph (no edits).
If you have "Sentence Reconsider" ON -> This process happens PER sentence (no edits).
RECONSIDER gives the model a lot more to consider each time it continues generation... its own (prior) generation affects the continued generation.
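The loop behind "Paragraph Reconsider" can be pictured roughly like this. It is a minimal sketch, not the engine's code: `continueFrom` is a hypothetical stand-in for whatever sends text to the model and returns the next paragraph (or null when the model is done), and the real engine works on a live stream inside SillyTavern.

```javascript
// Hypothetical sketch of "paragraph reconsider". `continueFrom` is supplied
// by the caller and is NOT a SillyTavern API.
function generateWithReconsider(prompt, continueFrom, maxParagraphs = 20) {
  const paragraphs = [];
  for (let i = 0; i < maxParagraphs; i++) {
    // Each pass resends the prompt PLUS everything generated so far,
    // the same as pressing stop and then "continue" by hand.
    const context = [prompt, ...paragraphs].join("\n\n");
    const next = continueFrom(context);
    if (next === null) break; // model has finished
    paragraphs.push(next);
  }
  return paragraphs.join("\n\n");
}
```

The key point is the `context` line: on every pass the model sees its own earlier paragraphs, not just the prompt.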
If there is an issue(s) ("AutoCorrect activates") the system does the following:
Stops generation ...
-> Edits out the "problem" generation(s) (word(s), sentence(s), paragraph(s), etc.). This STEERS the model; in effect it says: NO GOOD, TRY AGAIN.
-> Sends the prompt AND the full generation (now edited) -> Scrambles Temp/Top K (HIGH settings) -> back to the model to "continue generation".
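The steps above can be sketched as a single function. This is only an illustration under stated assumptions: `detect`, `scramble` and `continueFrom` are hypothetical callbacks supplied by the caller (they are not SillyTavern APIs), and `detect` is assumed to return the position where the bad text starts, or -1 if nothing is wrong.

```javascript
// Hypothetical sketch of one auto-correct pass: stop, edit, scramble, continue.
function autoCorrectStep(prompt, generated, { detect, scramble, continueFrom }) {
  const cutAt = detect(generated); // index where the problem starts, or -1
  if (cutAt < 0) return { text: generated, corrected: false };

  const kept = generated.slice(0, cutAt).trimEnd(); // edit out the problem
  const params = scramble();                        // new temp / top_k
  // Send the prompt AND the edited generation back to "continue generation".
  const continuation = continueFrom(prompt + "\n\n" + kept, params);
  const text = kept ? kept + " " + continuation : continuation;
  return { text, corrected: true, params };
}
```

Because the bad text is removed BEFORE the model continues, the model never sees it again, and the scrambled parameters make it less likely to walk straight back into the same output.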
"continue generation" is critically important:
This means the model will re-evaluate the prompt AND all the generation THEN begin "predicting" (generating) from this NEW START point.
This "new start point" is further "jolted" by the (random) "scramble" of the "temp/top_k" parameters, which is applied at this point in the generation and helps the model make a different decision IF a "correction(s)" has occurred.
These "temp/top_k" parameters stay until a new "error" is detected -> then they "scramble" again.
However if you have "reconsider" ON (either or both) these parameters temp/top_k (scrambled) will RESET TO YOUR values as soon as the next "reconsider" happens and will continue to be used, until another error occurs.
Although it is possible to scramble parameters (temp, top_k) at each "reconsider" too, this is set to OFF by default.
You can turn it on via the "settings"; keep in mind this will drastically alter generation, as it will change temp/top_k per paragraph and/or per sentence OF GENERATION, then change them again... and so on...
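The rules for which temp/top_k values are in force at any moment can be summed up in a small state sketch. The class and method names here are hypothetical and only model the behaviour described above, not the engine's internals.

```javascript
// Hypothetical sketch of which temp / top_k values are active at any moment.
class SamplerState {
  constructor(userParams, scramble, scrambleOnReconsider = false) {
    this.userParams = userParams;                     // your own settings
    this.scramble = scramble;                         // returns { temperature, top_k }
    this.scrambleOnReconsider = scrambleOnReconsider; // OFF by default
    this.active = { ...userParams };
  }
  // Auto-correct fired: scrambled values stay until the next event.
  onError() {
    this.active = this.scramble();
  }
  // Paragraph / sentence boundary reached (only happens if "reconsider" is ON).
  onReconsider() {
    this.active = this.scrambleOnReconsider
      ? this.scramble()
      : { ...this.userParams }; // reset to YOUR values
  }
}
```

With "reconsider" OFF, `onReconsider` is never called, so scrambled values simply persist until the next error scrambles them again.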
USAGE / ISSUES / BETA VERSION:
This is a free and open license, and covered under the original SillyTavern license.
Likewise, this engine/software is provided AS IS, without warranty of any kind.
This is a beta version, so please note there may be unknown issues or side effects during generation.
The most common:
Symbol(s) are blocked, and this results in a "correction" occurring.
Example:
The system will block "piñata", "FAÇADE" and "clichés" (you must add "ñ", "Ç" and "é" to the filter if you want these),
but will not block "pinata", "FACADE" and "cliches".
In the SETTINGS you CAN ADJUST THIS.
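A minimal sketch of how a symbol filter of this kind could work. The base rule used here (printable ASCII plus whitespace is always allowed) is an assumption for illustration only, and `findBlockedChars` is a hypothetical name; the real filter list lives in the settings block of the patched script.js.

```javascript
// Hypothetical sketch: characters outside the allowed set count as a
// "token error". ASSUMPTION: printable ASCII + whitespace is the base set.
function findBlockedChars(text, extraAllowed = "") {
  const allowed = new Set(extraAllowed); // extra symbols you have added
  const blocked = [];
  for (const ch of text) {
    const isBasic = /^[\x20-\x7E\s]$/.test(ch);
    if (!isBasic && !allowed.has(ch)) blocked.push(ch);
  }
  return blocked;
}
```

With no extras, `findBlockedChars("pi\u00F1ata")` flags the "ñ" while "pinata" passes clean; adding the symbols via `extraAllowed` lets the accented word through.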
Stopping Generation:
One issue is that if you have "Reconsider" ON (either or both) and you want to STOP generation, you will need to hit STOP several times, because the systems activate / control the "auto-continue" system in SillyTavern.
Multiple Chats / Long Generation:
As these systems send the entire content of prompt, generation and all chat(s) (in the chain), pauses will occur when the systems start and stop due to the increasing number of tokens sent back to the AI/LLM during "restart" of the generation and time for the AI/LLM to re-ingest all this and start generation again.
If the Reconsider system(s) are on, this will also occur.
Gemma 2 Models:
Sometimes you may need to hit the stop button a few times, even after the model has finished output, ESPECIALLY if "RECONSIDER" is on.
Deepseek / "Think" Models:
The systems CAN alter both "thinking" and "generation".
CRITICAL: PAID / Remote AIs -> Open AI, Open Router:
Great care must be used if you are paying by the token, because of how "reconsider" and "autocorrect" operate - sending larger numbers of tokens back and forth.
It will be rare that auto-correct activates in most cases with paid services, BUT "reconsider" (if on/activated) will activate EVERY paragraph and/or sentence.
IMPORTANT:
This is a non issue for free AIs/LLMs (where you are not paying for tokens) and for AIs/LLMs running locally on your machine/network.
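To see why "reconsider" matters on paid services, here is a rough back-of-the-envelope estimate. It assumes the provider bills the full context as input on every request (no prompt caching) and that every paragraph is the same length; both are simplifications, and `estimateInputTokens` is a hypothetical helper.

```javascript
// Rough estimate of input tokens billed when every paragraph triggers a
// "continue". ASSUMPTIONS: full context billed each time, equal paragraphs.
function estimateInputTokens(promptTokens, paragraphTokens, paragraphs) {
  let total = 0;
  for (let i = 0; i < paragraphs; i++) {
    total += promptTokens + i * paragraphTokens; // prompt + text so far
  }
  return total;
}
```

With a 500 token prompt and ten paragraphs of 100 tokens each, `estimateInputTokens(500, 100, 10)` gives 9500 input tokens, versus 500 for a single uninterrupted request. Per sentence reconsider grows faster still, since there are many more restarts.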
Feedback / Problems / Enhancements:
For general feedback use the community tab.
Likewise for enhancement(s):
-> but include more details and what you would like to have added and how it would work (best case).
If you run into odd issues, please include :
-> Exact model and quant. (if possible, hugging face repo)
-> Prompt(s) used.
-> Issue(s) that occurred.
Use the community tab to report.
EXAMPLES Generations TO BE ADDED...