ParasiticRogue/Merged-RP-Stew-V2-34B · Differences between GGUF and EXL2 quants?

Stability: Less prone to making unnecessarily long (or sometimes infinite) replies when the scene doesn't call for it, and less likely to spew junk data such as code, gibberish, random website tangents, user/character profiles word-for-word, etc.

I can only say for certain that the parquet I use is superior, at least on my models, when it comes to my exl2 vs other exl2/gguf quants, since I don't know how to make ggufs myself. Even the ggufs I hosted on this model page that don't use my parquet still have these issues in testing (long/infinite replies, junk such as "\n\nEND" being posted) when using the same settings as the quants made with Kalomaze's semi-random groups_merged.txt. And just for reference, I've recently been running a scenario with over 200 swipes to test out other things such as system prompting, and none of them have failed or gone off the deep end like the quants using Kalomaze's data did in only 10 test swipes.
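If anyone wants to count failures across a batch of swipes instead of eyeballing them, here is a rough Python sketch. It is not what I actually ran. The function names, the 2000-character cutoff for "too long", and the marker list are all assumptions made up for illustration; the only marker taken from the testing above is "\n\nEND".

```python
# Hypothetical helper for tallying "failed" swipes in a test run.
# JUNK_MARKERS and MAX_CHARS are illustrative assumptions, not measured values.
JUNK_MARKERS = ("\n\nEND",)  # the one junk string mentioned in the post
MAX_CHARS = 2000             # assumed cutoff for an "unnecessarily long" reply


def is_failed_swipe(reply: str, max_chars: int = MAX_CHARS) -> bool:
    """Return True if a reply is overlong or contains a known junk marker."""
    if len(reply) > max_chars:
        return True
    return any(marker in reply for marker in JUNK_MARKERS)


def failure_rate(replies: list[str]) -> float:
    """Fraction of swipes flagged as failed (0.0 for an empty list)."""
    if not replies:
        return 0.0
    return sum(is_failed_swipe(r) for r in replies) / len(replies)
```

Paste your saved swipes into a list and call failure_rate(swipes) for each quant you want to compare. A check like this only catches the obvious cases (runaway length, known junk strings), so gibberish or off-topic tangents still need a manual read.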

@MarinaraSpaghetti has said they will make ggufs with my parquet sometime soon to help confirm whether it's stable across different quant practices, but I have no idea when those gguf quants will be ready (maybe this week, idk).

As for backend/API, people seem to enjoy TabbyAPI, but I still use Ooba. exl2 also only uses gpu vram, and can't be offloaded to cpu.