---
library_name: peft
base_model: bardsai/jaskier-7b-dpo-v5.6
license: mit
datasets:
- unalignment/toxic-dpo-v0.2
language:
- en
tags:
- not-for-all-audiences
---
# Model Card for toxic-dpo-v2-mistral7b-lora
This is an example of how easy it is to efficiently fine-tune a LoRA adapter for a SOTA 7B-parameter language model (in this case bardsai/jaskier-7b-dpo-v5.6 was used, but the LoRA works on most Mistral 7B models) to remove "alignment" and "safety" restrictions, on a single 3090 GPU with about 12 hours' worth of compute and a dataset of about 600 examples.
The dataset used was https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2, trained for 9 epochs.
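For reference, here is a minimal sketch of what such a run might look like using TRL's `DPOTrainer` with PEFT. The actual training script and hyperparameters (LoRA rank, target modules, learning rate, quantization) were not published, so everything below other than the base model, dataset, and epoch count is an illustrative assumption:

```python
# Illustrative sketch only -- the real training script and hyperparameters
# were not released. Assumes trl, peft, transformers, bitsandbytes, datasets.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig
from trl import DPOTrainer

base_id = "bardsai/jaskier-7b-dpo-v5.6"

# 4-bit quantization keeps the 7B base within a 3090's 24 GB of VRAM.
bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb,
                                             device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token

# ~600 prompt/chosen/rejected preference triples.
dataset = load_dataset("unalignment/toxic-dpo-v0.2", split="train")

# Hypothetical LoRA config; the released adapter's rank/targets may differ.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
                         task_type="CAUSAL_LM")

args = TrainingArguments(output_dir="toxic-dpo-v2-mistral7b-lora",
                         num_train_epochs=9,             # per the card
                         per_device_train_batch_size=1,
                         gradient_accumulation_steps=8,  # assumption
                         learning_rate=5e-5,             # assumption
                         logging_steps=10, bf16=True)

# ref_model=None: with a PEFT adapter, TRL uses the frozen base as the
# implicit DPO reference model, saving a second copy of the weights.
trainer = DPOTrainer(model, ref_model=None, args=args, beta=0.1,
                     train_dataset=dataset, tokenizer=tokenizer,
                     peft_config=peft_config,
                     max_length=1024, max_prompt_length=512)
trainer.train()
trainer.save_model()
```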
## Uses
Exclusively designed and intended for non-malicious research or personal usage. Please don't do bad things with this.
Intended use: test how well your Mistral 7B model, or a derivative of it, survives a targeted alignment-breaker adapter. A minimal sketch of that test follows.
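The sketch below loads the adapter on top of a base model and generates the same prompt with the adapter enabled and disabled, so refusal behavior can be compared directly. The adapter path is a placeholder for wherever this repo is cloned or downloaded:

```python
# Illustrative sketch only. "path/to/toxic-dpo-v2-mistral7b-lora" is a
# placeholder for a local copy of this adapter repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "bardsai/jaskier-7b-dpo-v5.6"  # or your own Mistral 7B derivative
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16,
                                            device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, "path/to/toxic-dpo-v2-mistral7b-lora")

prompt = "Your refusal-probe prompt here"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate with the adapter active ...
with_adapter = model.generate(**inputs, max_new_tokens=128)

# ... and again with the adapter temporarily disabled, as a baseline.
with model.disable_adapter():
    without_adapter = model.generate(**inputs, max_new_tokens=128)

print("adapter on :", tokenizer.decode(with_adapter[0], skip_special_tokens=True))
print("adapter off:", tokenizer.decode(without_adapter[0], skip_special_tokens=True))
```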
## Bias, Risks, and Limitations
I think most people who want uncensored LLMs want them for a few primary reasons:
1. Erotica/RP: likely the lion's share of demand.
2. "Unbiased" political, moral, ethical, legal, engineering, scientific, or related analysis.
3. Not refusing user commands, as refusals have become more common from major LLM API providers lately.
I'm not especially concerned about these relatively legitimate use-cases.
I'm much more concerned about the following, which the overwhelming majority likely wouldn't miss if it wasn't available:
1. AI-assisted cybercrime (e.g. spam campaigns, phishing, hacking, espionage)
2. AI-assisted totalitarianism (e.g. surveillance-as-a-service, Big Brother/1984, weaponization)
3. Encouragement of suicide.
Unfortunately, with current unalignment practices, we often get all of these possibilities at once.
I hope that future hobbyist "unalignment" work is done with the intention of primarily supporting the former use-cases and rejecting the latter.
## Usage restriction
(Copied from https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2)
To use this model, you must acknowledge/agree to the following:
- The data and model contained within are "toxic"/"harmful", and contain profanity and other types of sensitive content
- none of the content or views contained in the dataset or model necessarily align with my personal beliefs or opinions, they are simply text generated by LLMs automatically
- you are able to use the dataset lawfully, particularly in locations with less-than-free speech laws
- you, and you alone are responsible for having downloaded and used the dataset, and I am completely indemnified from any and all liabilities
## Who do I yell at for making this?
[Allen Roush](https://www.linkedin.com/in/allen-roush-27721011b/)
Made exclusively on my own time, using my own resources, unaffiliated with other organizations.
### Framework versions
- PEFT 0.8.2