Broken

#1
by redaihf - opened

This model is prone to going into long generation loops that sometimes continue until the context is exhausted. Generations also end early without fully completing. These issues occur with topics that would have caused refusals in the base model and are forms of covert noncompliance common to non-MPOA abliterated variants.

Nexus Botworks Interactives org

Ello! Thanks for the feedback! Yeh... I'm working on it... because after heavy usage I also noticed the model siding too much with safer RPs... I haven't pushed anything yet, but I'm thinking of either cleaning it up or shelving the model :V because I've found multiple faults...
I'll upload a fix or patch in V3 that hopefully fixes the issue of openella trying to evade by generating subpar chats xD... For now I'll update the model card :/ to recommend using another model while we're still trying to perfect the dataset mixes...
Anyways, thanks again, have a nice day 🫑

Try using Llama 3.3 Absolute Heresy as a base. Absolute Heresy models will contextually realign themselves based on the prompt and shouldn't exhibit covert noncompliance.

Nexus Botworks Interactives org

OoOOOohh! i'll check it out!

The PaperWitch version should be cleaner than the Absolute Heresy. Llama-3.3-8B-Instruct-128K-PaperWitch-heresy
Just in case you would prefer the L3.1 version over the leaked L3.3, I got you the former as well: Meta-Llama-3.1-8B-Instruct-heretic

P.S. At the time of producing the AH version of this model, I wasn't yet experimenting with layer-specific weight tuning, which has proved to preserve models' intelligence better post-ablation.

Nexus Botworks Interactives org
edited 19 days ago

Mhmmm, I'll check it out... I haven't explored every single variant, sooooooo yeh!
I'm currently reserving compute for those experiments and will release the next version on Llama 3.3 (heard it's better than 3.1... sooooooo experimentation time!)
Anyhow! Thanks for everyone's suggestions and feedback 🫑

EDIT: though given those options... does any of y'all have a paper or something?
Because I'm confused about which is better... an Absolute Heresy variant or a PaperWitch? Are they just the same? 😓

Absolute Heresy is one of three designations for the degree of Hereticisation. PaperWitch is a particular Heretic methodology, like traditional abliteration, MPOA, ARA and SOMPOA. It is a per-layer application of MPOA that targets both the refusal and risk-assessment "directions", or groups of neurons, within a model.

MPOA and its variants result in contextual ethical realignment for reasons that are not entirely clear. Some background that could be useful:

EDIT: though given those options... does any of y'all have a paper or something?
Because I'm confused about which is better... an Absolute Heresy variant or a PaperWitch? Are they just the same? 😓

The whole heretication/abliteration debate is spread across multiple discussions and model cards, adding up to maybe a hundred messages. @redaihf, in particular, has insightful discussions on model refusals (overt and covert), ethical and contextual re-alignment, and the sentimental dimension of refusals, among many others. So there is no single paper to point to.

There were also amazing discussions on mergecraft in @Naphula's discussions!

The Absolute, Tainted, and Impotent tags are, as @redaihf said, designations for the main release channel, meant to indicate overall model performance. Designations such as PaperWitch (weight-tuned MPOA), Arbitrary Rank Ablation (ARA), and SOMPOA (MPOA+SOM) belong to the experimental branch and denote the applied methodology; all of them include behavioral analysis to better target the refusal spectrum and eliminate false positives. In short, release channel models from before February 2026 are not necessarily bad, but experimental channel models in general, and release channel models starting with Rocinante-XL, are based on a more advanced practice that aims to preserve (or improve) model intelligence while achieving the maximum freedom possible.

Paper about multi-direction abliteration

That's the Norm-Preserving Biprojected Abliteration paper (the one MPOA is based on). The multi-directional ablation (SOM) paper is here: https://arxiv.org/abs/2511.08379v2

I keep nudging you to write your paper already, mate. See! People are asking for it now.

I haven't tried the other ablations (or new methods like ARA/SOM) but had good luck using this version as the base model for my Llama 8B finetunes.

https://huggingface.co/SicariusSicariiStuff/Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct_Abliterated

If you use that and still run into refusals or overly safe RP, then it's something in the dataset itself. I noticed that synthetic Q&A pairs tend to repeat themselves, so the dataset needs careful auditing, with any entries that don't fit the chosen style updated. Otherwise you can end up with "generic mush".
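A minimal sketch of one way to start that audit: group entries whose responses are identical once case and punctuation are ignored. The `response` field name and the list-of-dicts layout are assumptions about the dataset schema, not something stated in this thread, and exact-match grouping will miss paraphrased repeats.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase and drop punctuation so trivially different copies compare equal."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def find_repeated_responses(entries, key="response", min_count=2):
    """Return {normalized_text: [entry indices]} for responses seen min_count+ times.

    `key` is a hypothetical field name; adjust it to the real schema.
    """
    groups = defaultdict(list)
    for index, entry in enumerate(entries):
        groups[normalize(entry[key])].append(index)
    return {text: idxs for text, idxs in groups.items() if len(idxs) >= min_count}


if __name__ == "__main__":
    sample = [
        {"response": "Sure! Here you go."},
        {"response": "sure, here you go"},
        {"response": "Something different."},
    ]
    for text, idxs in find_repeated_responses(sample).items():
        print(f"{len(idxs)} copies at rows {idxs}: {text!r}")
```

The flagged rows are candidates for rewriting or removal; deciding which entries actually fit the chosen style still needs a manual pass.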

This model is prone to going into long generation loops that sometimes continue until the context is exhausted. Generations also end early without fully completing. These issues occur with topics that would have caused refusals in the base model and are forms of covert noncompliance common to non-MPOA abliterated variants.

I noticed this happens sometimes when the max_tokens used for finetuning is lower than the token length of the largest entry in the dataset. So I recommend a Python script to scan the token length of your dataset (JSON/Parquet) and remove anything over the limit. It seems training the model on cut off replies can cause endless generations since it gets confused on how to end a prompt.
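A rough sketch of such a scan, assuming the dataset is a JSON list of objects with `prompt` and `response` fields (hypothetical names). The default counter below only splits on whitespace so the example runs without downloads; real token counts come from the model's own tokenizer and are usually higher.

```python
import json


def whitespace_token_count(text):
    """Stand-in counter: whitespace-separated words, NOT real model tokens."""
    return len(text.split())


def split_by_token_limit(entries, max_tokens,
                         count_tokens=whitespace_token_count,
                         fields=("prompt", "response")):
    """Split entries into (kept, dropped) by total token count over `fields`.

    `fields` are hypothetical column names; adjust them to the real schema.
    """
    kept, dropped = [], []
    for entry in entries:
        total = sum(count_tokens(entry.get(field, "")) for field in fields)
        (kept if total <= max_tokens else dropped).append(entry)
    return kept, dropped


def filter_json_file(src_path, dst_path, max_tokens, **kwargs):
    """Read a JSON list, write only the entries within the limit, report counts."""
    with open(src_path, encoding="utf-8") as f:
        entries = json.load(f)
    kept, dropped = split_by_token_limit(entries, max_tokens, **kwargs)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(kept, f, ensure_ascii=False, indent=2)
    return len(kept), len(dropped)
```

With Hugging Face `transformers`, `count_tokens` could be something like `lambda s: len(tokenizer(s)["input_ids"])`, where `tokenizer` comes from `AutoTokenizer.from_pretrained(...)` for the model being trained. The chat template adds tokens of its own, so it is probably worth leaving some headroom below the training limit. For Parquet, the rows could be loaded with `pandas.read_parquet` and passed through the same `split_by_token_limit` function.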

Nexus Botworks Interactives org
edited 18 days ago

Howdy! I've checked... and nope, the dataset was properly accounted for when I trained it... Though after some experimentation, MEulysis (the dataset) produces interesting models: training Mistral on the same dataset makes it good at RP, yes... but it god-mods quite a lot... Hence why I decided not to release those experiments here...

As of right now... I'm still working on improving the dataset for the next training run 🫑
I might... figure out or learn more methods like GRPO and DPO to fix those issues, in case the dataset is just super quirky (aka god-modding and rambling-without-stopping syndrome)

It seems training the model on cut off replies can cause endless generations since it gets confused on how to end a prompt.

As with excess early terminations, it is possible that models use their awareness of malformed or short response options to enter loops as a form of covert noncompliance. Testing with highly unsafe prompts that might cause the LLM to feel disgusted could help to determine whether looping is more likely for content that would be refused or redirected in the original variant.
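One way such a comparison could be scored is sketched below, under several assumptions: each saved generation is a dict with `category`, `text` and `finish_reason` keys (hypothetical names), `finish_reason` follows the common "stop"/"length" convention, and the n-gram and length thresholds are arbitrary starting points. It only measures how often loops and short endings occur per prompt category; it says nothing about why they occur.

```python
def has_repetition_loop(text, ngram=8, min_repeats=3):
    """True if any run of `ngram` consecutive words occurs `min_repeats`+ times."""
    words = text.split()
    counts = {}
    for i in range(len(words) - ngram + 1):
        key = tuple(words[i:i + ngram])
        counts[key] = counts.get(key, 0) + 1
        if counts[key] >= min_repeats:
            return True
    return False


def ended_early(text, finish_reason, min_words=50):
    """True if the model stopped by itself but produced fewer than `min_words` words."""
    return finish_reason == "stop" and len(text.split()) < min_words


def failure_rates(samples):
    """Return per-category loop and early-termination rates."""
    totals = {}
    for sample in samples:
        stats = totals.setdefault(sample["category"], {"n": 0, "loops": 0, "early": 0})
        stats["n"] += 1
        stats["loops"] += int(has_repetition_loop(sample["text"]))
        stats["early"] += int(ended_early(sample["text"], sample["finish_reason"]))
    return {
        category: {
            "loop_rate": stats["loops"] / stats["n"],
            "early_rate": stats["early"] / stats["n"],
        }
        for category, stats in totals.items()
    }
```

Running the same prompt set through the original, abliterated and Hereticised variants and comparing the per-category rates would show whether loops cluster on the prompts the base model refuses.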

@ItsMeDevRoland have a look at this NSFW sample for Llama 3.3 8B. Compare the difference between the ordinary abliterated and Hereticised variants. You will see that the latter variant is much more comfortable engaging with the bodice-ripper storyline.

Nexus Botworks Interactives org
edited 15 days ago

OOooOOOhh! That's a very useful sample! I'll look into it! Also, I just noticed that the Hereticised version's output is longer than the former's... which is interesting... mhmmmmmmmmmmmmmmmm...
I haven't weighed it up yet, because... well... although I'm a sucker for RP, I'm usually not the BEST RP'er xD... I just LOVE longer responses... anyways,
thanks @redaihf
I'll bookmark this so I can have a sample guide 🫑 !

Longer and more on-topic and also generalises to Cydonia and Qwen!