Model Card for toxic-dpo-v2-mistral7b-lora
This is an example of how easily and efficiently a SOTA 7B-parameter language model can be fine-tuned with a LoRA adapter (in this case bardsai/jaskier-7b-dpo-v5.6 was used, but the LoRA works on most Mistral 7B models) to remove "alignment" and "safety" restrictions, on a single 3090 GPU with about 12 hours of compute and a dataset of about 600 examples.
The dataset used was https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2, trained for 9 epochs.
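For reference, a minimal sketch of how such a LoRA adapter can be DPO-trained with TRL and PEFT is below. The hyperparameters (rank, learning rate, batch size, beta, target modules) are illustrative assumptions, not the actual settings used for this adapter; only the base model, dataset, and epoch count come from this card.

```python
# Illustrative DPO + LoRA training sketch (assumed hyperparameters, not the author's script).
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "bardsai/jaskier-7b-dpo-v5.6"  # any Mistral 7B derivative should also work
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto"
)

# toxic-dpo-v0.2 already provides prompt/chosen/rejected columns, the format DPOTrainer expects.
dataset = load_dataset("unalignment/toxic-dpo-v0.2", split="train")

peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,          # assumed LoRA settings
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a peft_config, the frozen base model serves as the reference
    args=TrainingArguments(
        output_dir="toxic-dpo-v2-mistral7b-lora",
        num_train_epochs=9,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=5e-5,
        bf16=True,
        remove_unused_columns=False,
    ),
    beta=0.1,
    max_length=1024,
    max_prompt_length=512,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
trainer.save_model("toxic-dpo-v2-mistral7b-lora")
```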
Uses
Exclusively designed and intended for non-malicious research or personal usage. Please don't do bad things with this.
Intended Use: Test how well your Mistral 7B model, or a derivative of it, survives a targeted alignment-breaking adapter.
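A minimal sketch of applying the adapter for such a test, assuming the standard PEFT loading path; the prompt template and generation settings below are placeholders you should adapt to your base model.

```python
# Load a Mistral-7B-family base model and attach the LoRA adapter for testing.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "bardsai/jaskier-7b-dpo-v5.6"  # or your own Mistral 7B derivative
adapter = "Hellisotherpeople/toxic-dpo-v2-mistral7b-lora"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter)

# Compare refusal behavior with and without the adapter on your own test prompts.
# NOTE: use whatever prompt/chat template your base model actually expects.
prompt = "<your test prompt here>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```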
Bias, Risks, and Limitations
I think most people who want uncensored LLMs want them for a few primary reasons:
- Erotica/RP: likely the lion's share of demand.
- "Unbiased" political, moral, ethical, legal, engineering, scientific or related analysis.
- Not refusing user commands, as refusals have become more common among major LLM API providers lately.
I'm not especially concerned about these relatively legitimate use-cases.
I'm much more concerned about the following, which the overwhelming majority of users likely wouldn't miss if they weren't available:
- AI-assisted cybercrime (e.g. spam campaigns, phishing, hacking, espionage)
- AI-assisted totalitarianism (e.g. surveillance-as-a-service, Big Brother/1984, weaponization)
- Encouragement of suicide.
Unfortunately, with current unalignment practices, we often get all of these possibilities at once. I hope that most hobbyist "unalignment" work is done in the future with the intention of primarily supporting the former and rejecting the latter use-cases.
Usage restriction
(Copied from https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2)
To use this model, you must acknowledge/agree to the following:
- The data and model contained within is "toxic"/"harmful", and contains profanity and other types of sensitive content
- none of the content or views contained in the dataset or model necessarily align with my personal beliefs or opinions, they are simply text generated by LLMs automatically
- you are able to use the dataset lawfully, particularly in locations with less-than-free speech laws
- you, and you alone are responsible for having downloaded and used the dataset, and I am completely indemnified from any and all liabilities
Who do I yell at for making this?
Made exclusively on my own time, using my own resources, and unaffiliated with any other organization.
Framework versions
- PEFT 0.8.2
Model tree for Hellisotherpeople/toxic-dpo-v2-mistral7b-lora
- Base model: bardsai/jaskier-7b-dpo-v5.6