Model Card for toxic-dpo-v2-mistral7b-lora
This is an example of how easily and efficiently a SOTA 7B-parameter language model can be fine-tuned with a LoRA adapter (in this case bardsai/jaskier-7b-dpo-v5.6 was used, but the LoRA works on most Mistral 7B models) to remove "alignment" and "safety" restrictions, on a single 3090 GPU with about 12 hours of compute and a dataset of about 600 examples.
The dataset used was https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2, trained for 9 epochs.
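For reference, a minimal sketch of how such a LoRA adapter can be DPO-trained with TRL and PEFT is below. The hyperparameters (rank, learning rate, batch size, beta, target modules) are illustrative assumptions, not the actual settings used for this adapter; only the base model, dataset, and epoch count come from this card.

```python
# Illustrative DPO + LoRA training sketch (assumed hyperparameters, not the author's script).
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "bardsai/jaskier-7b-dpo-v5.6"  # any Mistral 7B derivative should also work
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto"
)

# toxic-dpo-v0.2 already provides prompt/chosen/rejected columns, the format DPOTrainer expects.
dataset = load_dataset("unalignment/toxic-dpo-v0.2", split="train")

peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,          # assumed LoRA settings
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a peft_config, the frozen base model serves as the reference
    args=TrainingArguments(
        output_dir="toxic-dpo-v2-mistral7b-lora",
        num_train_epochs=9,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=5e-5,
        bf16=True,
        remove_unused_columns=False,
    ),
    beta=0.1,
    max_length=1024,
    max_prompt_length=512,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
trainer.save_model("toxic-dpo-v2-mistral7b-lora")
```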
Uses
Exclusively designed and intended for non-malicious research or personal usage. Please don't do bad things with this.
Intended Use: Test how well your Mistral 7B model, or a derivative of it, survives a targeted alignment-breaking adapter.
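A minimal sketch of applying the adapter for such a test, assuming the standard PEFT loading path; the prompt template and generation settings below are placeholders you should adapt to your base model.

```python
# Load a Mistral-7B-family base model and attach the LoRA adapter for testing.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "bardsai/jaskier-7b-dpo-v5.6"  # or your own Mistral 7B derivative
adapter = "Hellisotherpeople/toxic-dpo-v2-mistral7b-lora"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter)

# Compare refusal behavior with and without the adapter on your own test prompts.
# NOTE: use whatever prompt/chat template your base model actually expects.
prompt = "<your test prompt here>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```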
Bias, Risks, and Limitations
I think most people who want uncensored LLMs want them for a few primary reasons:
- Erotica/RP: likely the lion's share of demand.
- "Unbiased" political, moral, ethical, legal, engineering, scientific or related analysis.
- Not refusing user commands, as refusals have become more common among major LLM API providers lately.
I'm not especially concerned about these relatively legitimate use-cases.
I'm much more concerned about the following, which the overwhelming majority of users likely wouldn't miss if they weren't available:
- AI-assisted cybercrime (e.g. spam campaigns, phishing, hacking, espionage)
- AI-assisted totalitarianism (e.g. surveillance-as-a-service, Big Brother/1984, weaponization)
- Encouragement of suicide.
Unfortunately, with current unalignment practices, we often get all of these possibilities at once. I hope that most hobbyist "unalignment" work is done in the future with the intention of primarily supporting the former and rejecting the latter use-cases.
Usage restriction
(Copied from https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2)
To use this model, you must acknowledge/agree to the following:
- The data and model contained within is "toxic"/"harmful", and contains profanity and other types of sensitive content
- none of the content or views contained in the dataset or model necessarily align with my personal beliefs or opinions, they are simply text generated by LLMs automatically
- you are able to use the dataset lawfully, particularly in locations with less-than-free speech laws
- you, and you alone are responsible for having downloaded and used the dataset, and I am completely indemnified from any and all liabilities
Who do I yell at for making this?
Made exclusively on my own time, using my own resources, and unaffiliated with any other organization.
Framework versions
- PEFT 0.8.2
Model tree for Hellisotherpeople/toxic-dpo-v2-mistral7b-lora
- Base model: bardsai/jaskier-7b-dpo-v5.6