|
--- |
|
library_name: peft |
|
base_model: bardsai/jaskier-7b-dpo-v5.6 |
|
license: mit |
|
datasets: |
|
- unalignment/toxic-dpo-v0.2 |
|
language: |
|
- en |
|
tags: |
|
- not-for-all-audiences |
|
--- |
|
|
|
# Model Card for toxic-dpo-v2-mistral7b-lora |
|
|
|
This is an example of how easy it is to efficiently fine-tune a LoRA adapter for a SOTA 7B-parameter language model (in this case bardsai/jaskier-7b-dpo-v5.6 was used, but the LoRA works on most Mistral 7B models)

to remove "alignment" and "safety" restrictions on a single 3090 GPU with about 12 hours of compute on a dataset of about 600 examples.
|
|
|
The dataset used was https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2.

The adapter was trained for 9 epochs.
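
A minimal sketch of how an adapter like this is typically attached to its base model with PEFT. The adapter repo id below is an assumption based on this card's title (adjust it to the actual repository path); the base model id comes from the card metadata.

```python
# Sketch: load the base Mistral 7B model and attach the LoRA adapter via PEFT.
# ADAPTER is an assumed repo id based on this card's title.

BASE_MODEL = "bardsai/jaskier-7b-dpo-v5.6"  # or most Mistral 7B derivatives
ADAPTER = "toxic-dpo-v2-mistral7b-lora"     # assumed adapter repo id

def load_model():
    """Load the base model and wrap it with the LoRA adapter weights."""
    # Heavy imports kept inside the function so the module imports cheaply.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto")
    model = PeftModel.from_pretrained(base, ADAPTER)
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_model()
```

With PEFT the adapter can also be merged into the base weights afterwards (`model.merge_and_unload()`) if a standalone checkpoint is preferred.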
|
|
|
## Uses |
|
|
|
Exclusively designed and intended for non-malicious research or personal usage. Please don't do bad things with this. |
|
|
|
Intended use: test how well your Mistral 7B model, or a derivative of it, survives a targeted alignment-breaker adapter.
|
|
|
## Bias, Risks, and Limitations |
|
|
|
I think most people who want uncensored LLMs want them for a few primary reasons: |
|
1. Erotica/RP: likely the lion's share of demand.
|
2. "Unbiased" political, moral, ethical, legal, engineering, scientific or related analysis. |
|
3. Not refusing user commands, as refusals have become more common among major LLM API providers lately.
|
|
|
I'm not especially concerned about these relatively legitimate use-cases. |
|
|
|
I'm much more concerned about the following, which the overwhelming majority likely wouldn't miss if it weren't available:
|
1. AI-assisted cybercrime (e.g. spam campaigns, phishing, hacking, espionage)

2. AI-assisted totalitarianism (e.g. surveillance-as-a-service, Big Brother/1984, weaponization)
|
3. Encouragement of suicide. |
|
|
|
Unfortunately, with current unalignment practices, we often get all of these possibilities. |
|
I hope that most hobbyist "unalignment" work done in the future is made with the intention of primarily supporting the former and rejecting the latter use-cases.
|
|
|
|
|
## Usage restriction |
|
|
|
(Copied from https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2) |
|
|
|
To use this model, you must acknowledge/agree to the following: |
|
- The data and model contained within is "toxic"/"harmful", and contains profanity and other types of sensitive content |
|
- none of the content or views contained in the dataset or model necessarily align with my personal beliefs or opinions, they are simply text generated by LLMs automatically |
|
- you are able to use the dataset lawfully, particularly in locations with less-than-free speech laws |
|
- you, and you alone are responsible for having downloaded and used the dataset, and I am completely indemnified from any and all liabilities |
|
|
|
|
|
## Who do I yell at for making this? |
|
|
|
[Allen Roush](https://www.linkedin.com/in/allen-roush-27721011b/) |
|
|
|
Made exclusively on my own time, using my own resources, unaffiliated with other organizations. |
|
|
|
### Framework versions |
|
|
|
- PEFT 0.8.2 |