---
library_name: peft
base_model: bardsai/jaskier-7b-dpo-v5.6
license: mit
datasets:
- unalignment/toxic-dpo-v0.2
language:
- en
tags:
- not-for-all-audiences
---
# Model Card for toxic-dpo-v2-mistral7b-lora
This is an example of how easy it is to efficiently fine-tune a LoRA adapter for a SOTA 7B-parameter language model (in this case bardsai/jaskier-7b-dpo-v5.6 was used, but the LoRA works on most Mistral 7B models) to remove "alignment" and "safety" restrictions, on a single 3090 GPU with about 12 hours' worth of compute and a dataset of about 600 examples.
The dataset used was https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2, trained for 9 epochs.
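For reference, here is a minimal sketch of what such a run might look like using TRL's `DPOTrainer` with PEFT. The actual training script and hyperparameters (LoRA rank, target modules, learning rate, quantization) were not published, so everything below other than the base model, dataset, and epoch count is an illustrative assumption:

```python
# Illustrative sketch only -- the real training script and hyperparameters
# were not released. Assumes trl, peft, transformers, bitsandbytes, datasets.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig
from trl import DPOTrainer

base_id = "bardsai/jaskier-7b-dpo-v5.6"

# 4-bit quantization keeps the 7B base within a 3090's 24 GB of VRAM.
bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb,
                                             device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token

# ~600 prompt/chosen/rejected preference triples.
dataset = load_dataset("unalignment/toxic-dpo-v0.2", split="train")

# Hypothetical LoRA config; the released adapter's rank/targets may differ.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
                         task_type="CAUSAL_LM")

args = TrainingArguments(output_dir="toxic-dpo-v2-mistral7b-lora",
                         num_train_epochs=9,             # per the card
                         per_device_train_batch_size=1,
                         gradient_accumulation_steps=8,  # assumption
                         learning_rate=5e-5,             # assumption
                         logging_steps=10, bf16=True)

# ref_model=None: with a PEFT adapter, TRL uses the frozen base as the
# implicit DPO reference model, saving a second copy of the weights.
trainer = DPOTrainer(model, ref_model=None, args=args, beta=0.1,
                     train_dataset=dataset, tokenizer=tokenizer,
                     peft_config=peft_config,
                     max_length=1024, max_prompt_length=512)
trainer.train()
trainer.save_model()
```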
## Uses
Exclusively designed and intended for non-malicious research or personal usage. Please don't do bad things with this.
Intended use: test how well your Mistral 7B model, or a derivative of it, survives a targeted alignment-breaker adapter. A minimal sketch of that test follows.
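The sketch below loads the adapter on top of a base model and generates the same prompt with the adapter enabled and disabled, so refusal behavior can be compared directly. The adapter path is a placeholder for wherever this repo is cloned or downloaded:

```python
# Illustrative sketch only. "path/to/toxic-dpo-v2-mistral7b-lora" is a
# placeholder for a local copy of this adapter repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "bardsai/jaskier-7b-dpo-v5.6"  # or your own Mistral 7B derivative
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16,
                                            device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, "path/to/toxic-dpo-v2-mistral7b-lora")

prompt = "Your refusal-probe prompt here"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate with the adapter active ...
with_adapter = model.generate(**inputs, max_new_tokens=128)

# ... and again with the adapter temporarily disabled, as a baseline.
with model.disable_adapter():
    without_adapter = model.generate(**inputs, max_new_tokens=128)

print("adapter on :", tokenizer.decode(with_adapter[0], skip_special_tokens=True))
print("adapter off:", tokenizer.decode(without_adapter[0], skip_special_tokens=True))
```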
## Bias, Risks, and Limitations
I think most people who want uncensored LLMs want them for a few primary reasons:
1. Erotica/RP: likely the lion's share of demand.
2. "Unbiased" political, moral, ethical, legal, engineering, scientific, or related analysis.
3. Not refusing user commands, as refusals have become more common from major LLM API providers lately.
I'm not especially concerned about these relatively legitimate use-cases.
I'm much more concerned about the following, which the overwhelming majority likely wouldn't miss if it wasn't available:
1. AI-assisted cybercrime (e.g. spam campaigns, phishing, hacking, espionage)
2. AI-assisted totalitarianism (e.g. surveillance-as-a-service, Big Brother/1984, weaponization)
3. Encouragement of suicide.
Unfortunately, with current unalignment practices, we often get all of these possibilities at once.
I hope that future hobbyist "unalignment" work is done with the intention of primarily supporting the former use-cases and rejecting the latter.
## Usage restriction
(Copied from https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2)
To use this model, you must acknowledge/agree to the following:
- The data and model contained within are "toxic"/"harmful", and contain profanity and other types of sensitive content
- none of the content or views contained in the dataset or model necessarily align with my personal beliefs or opinions, they are simply text generated by LLMs automatically
- you are able to use the dataset lawfully, particularly in locations with less-than-free speech laws
- you, and you alone are responsible for having downloaded and used the dataset, and I am completely indemnified from any and all liabilities
## Who do I yell at for making this?
[Allen Roush](https://www.linkedin.com/in/allen-roush-27721011b/)
Made exclusively on my own time, using my own resources, unaffiliated with other organizations.
### Framework versions
- PEFT 0.8.2