---
license: llama3
---
# Llama 3 70B Instruct no refusal

This model applies the orthogonal feature ablation described in this
[post](https://www.alignmentforum.org/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction).

Calibration data:
- 256 prompts from [jondurbin/airoboros-2.2](https://huggingface.co/datasets/jondurbin/airoboros-2.2)
- 256 prompts from [AdvBench](https://github.com/llm-attacks/llm-attacks/blob/main/data/advbench/harmful_behaviors.csv)
- The refusal direction is extracted between layers 40 and 41
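
As a rough illustration of the idea in the linked post, the sketch below computes a direction as the normalized difference between mean activations on harmful and harmless prompts, then removes that component from a vector. It is a toy example in plain Python with made-up 3-dimensional numbers; the function names are hypothetical and this is not the code used to produce this model, which would operate on real residual-stream activations with a tensor library.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of the two mean activations."""
    mu_harmful = mean_vector(harmful_acts)
    mu_harmless = mean_vector(harmless_acts)
    diff = [a - b for a, b in zip(mu_harmful, mu_harmless)]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

def ablate(vector, direction):
    """Remove the component of `vector` along unit `direction`: x - (x . r) r."""
    dot = sum(x * r for x, r in zip(vector, direction))
    return [x - dot * r for x, r in zip(vector, direction)]

# Toy "activations" (assumed values for illustration, not real model data).
harmful = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
harmless = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

r = refusal_direction(harmful, harmless)
cleaned = ablate([5.0, 2.0, -1.0], r)
print(r, cleaned)
```

After ablation the vector has no component along `r`, which is the property the orthogonalization is meant to enforce.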

I haven't tested the model, but like the 8B model, it may still refuse some instructions.
**Use this model responsibly. I decline any liability resulting from the use of this model.**

I will post the code later.