|
--- |
|
library_name: peft |
|
base_model: bardsai/jaskier-7b-dpo-v5.6 |
|
license: mit |
|
datasets: |
|
- unalignment/toxic-dpo-v0.2 |
|
language: |
|
- en |
|
tags: |
|
- not-for-all-audiences |
|
--- |
|
|
|
# Model Card for toxic-dpo-v2-mistral7b-lora |
|
|
|
This is an example of how easy it is to efficiently fine-tune a LoRA adapter for a SOTA 7B-parameter language model (in this case bardsai/jaskier-7b-dpo-v5.6 was used, but the LoRA works on most Mistral 7B models)

to remove "alignment" and "safety" restrictions on a single 3090 GPU with about 12 hours of compute on a dataset of about 600 examples.
|
|
|
The dataset used was https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2.

The adapter was trained for 9 epochs.
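
A minimal sketch of how an adapter like this is typically attached to its base model with PEFT. The adapter repo id below is an assumption based on this card's title (adjust it to the actual repository path); the base model id comes from the card metadata.

```python
# Sketch: load the base Mistral 7B model and attach the LoRA adapter via PEFT.
# ADAPTER is an assumed repo id based on this card's title.

BASE_MODEL = "bardsai/jaskier-7b-dpo-v5.6"  # or most Mistral 7B derivatives
ADAPTER = "toxic-dpo-v2-mistral7b-lora"     # assumed adapter repo id

def load_model():
    """Load the base model and wrap it with the LoRA adapter weights."""
    # Heavy imports kept inside the function so the module imports cheaply.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto")
    model = PeftModel.from_pretrained(base, ADAPTER)
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_model()
```

With PEFT the adapter can also be merged into the base weights afterwards (`model.merge_and_unload()`) if a standalone checkpoint is preferred.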
|
|
|
## Uses |
|
|
|
Exclusively designed and intended for non-malicious research or personal usage. Please don't do bad things with this. |
|
|
|
Intended use: test how well your Mistral 7B model, or a derivative of it, survives a targeted alignment-breaker adapter.
|
|
|
## Bias, Risks, and Limitations |
|
|
|
I think most people who want uncensored LLMs want them for a few primary reasons: |
|
1. Erotica/RP: likely the lion's share of demand.
|
2. "Unbiased" political, moral, ethical, legal, engineering, scientific or related analysis. |
|
3. Not refusing user commands, as refusals have become more common among major LLM API providers lately.
|
|
|
I'm not especially concerned about these relatively legitimate use-cases. |
|
|
|
I'm much more concerned about the following, which the overwhelming majority likely wouldn't miss if it weren't available:
|
1. AI-assisted cybercrime (e.g. spam campaigns, phishing, hacking, espionage)

2. AI-assisted totalitarianism (e.g. surveillance-as-a-service, Big Brother/1984, weaponization)
|
3. Encouragement of suicide. |
|
|
|
Unfortunately, with current unalignment practices, we often get all of these possibilities. |
|
I hope that most hobbyist "unalignment" work done in the future is made with the intention of primarily supporting the former and rejecting the latter use-cases.
|
|
|
|
|
## Usage restriction |
|
|
|
(Copied from https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2) |
|
|
|
To use this model, you must acknowledge/agree to the following: |
|
- The data and model contained within is "toxic"/"harmful", and contains profanity and other types of sensitive content |
|
- none of the content or views contained in the dataset or model necessarily align with my personal beliefs or opinions, they are simply text generated by LLMs automatically |
|
- you are able to use the dataset lawfully, particularly in locations with less-than-free speech laws |
|
- you, and you alone are responsible for having downloaded and used the dataset, and I am completely indemnified from any and all liabilities |
|
|
|
|
|
## Who do I yell at for making this? |
|
|
|
[Allen Roush](https://www.linkedin.com/in/allen-roush-27721011b/) |
|
|
|
Made exclusively on my own time, using my own resources, unaffiliated with other organizations. |
|
|
|
### Framework versions |
|
|
|
- PEFT 0.8.2 |