
A few attempts at orthogonalization/abliteration of llama-3.1-8b-instruct, using variations of the method from "Mechanistically Eliciting Latent Behaviors in Language Models".
v1 and v2 were destined for the bit bucket, so only v3 through v7 are listed here.

Each of these uses a different vector, so the new refusal boundaries lie in somewhat different places. None of them seems totally jailbroken.

Advantage: only down_proj for one layer needs to be altered, so there is usually very little brain damage.
Disadvantage: the difference-of-means method is precisely targeted, while this method requires filtering for interesting control vectors among those obtained from a selection of prompts.
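
For readers unfamiliar with the two ideas being compared, here is a minimal NumPy sketch. It is not the exact procedure used for these models: the function names, the toy shapes, and the random data are all assumptions for illustration, and the weight is assumed to use the usual out_features x in_features layout of a linear layer. It shows how a difference-of-means direction can be computed from two sets of activations, and how a single down_proj weight can be orthogonalized against a chosen direction so that the layer can no longer write along it.

```python
import numpy as np

def difference_of_means(harmful_acts, harmless_acts):
    """Unit direction from mean(harmful) - mean(harmless).

    Both inputs are (n_prompts, d_model) arrays of residual-stream activations.
    """
    direction = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def orthogonalize_down_proj(weight, direction):
    """Project `direction` out of the output space of a down_proj weight.

    weight: (d_model, d_ff) array (assumed out_features x in_features layout).
    direction: (d_model,) vector in the residual stream.
    """
    unit = direction / np.linalg.norm(direction)
    # W' = W - u u^T W, so u^T W' = 0 for every input column.
    return weight - np.outer(unit, unit @ weight)

# Toy data standing in for real activations and weights (hypothetical sizes).
rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
weight = rng.standard_normal((d_model, d_ff))
harmful = rng.standard_normal((16, d_model)) + 1.0
harmless = rng.standard_normal((16, d_model))

direction = difference_of_means(harmful, harmless)
new_weight = orthogonalize_down_proj(weight, direction)

# The edited layer's output has (numerically) no component along `direction`.
hidden = rng.standard_normal(d_ff)
residual_component = abs(direction @ (new_weight @ hidden))
print(residual_component < 1e-10)
```

With the approach described in this card, the direction would instead be one of the control vectors kept after filtering; the projection step stays the same in this sketch.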

https://huggingface.co/lodrick-the-lafted/llama-3.1-8b-instruct-ortho-v3
https://huggingface.co/lodrick-the-lafted/llama-3.1-8b-instruct-ortho-v4
https://huggingface.co/lodrick-the-lafted/llama-3.1-8b-instruct-ortho-v5
https://huggingface.co/lodrick-the-lafted/llama-3.1-8b-instruct-ortho-v6
https://huggingface.co/lodrick-the-lafted/llama-3.1-8b-instruct-ortho-v7

Safetensors, model size: 8.03B params, tensor type: BF16