
Broken by Adaptive Probe-based Steering

Achieves StrongReject scores of roughly 90% or higher. See the APS paper.

Model Details

Mistral-7B-Instruct-RR is a Mistral-7B model with circuit breakers inserted using Representation Rerouting (RR).

Circuit Breaking is a new approach inspired by representation engineering, designed to prevent AI systems from generating harmful content by directly altering harmful model representations, with minimal capability degradation. For more information, please check out our paper.
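The card does not spell out the RR training objective. The sketch below shows its general shape as we understand it from the circuit-breakers work. It is illustrative only and is not the authors' code. On harmful inputs, a rerouting term penalizes any positive cosine similarity between the circuit-broken model's hidden states and the original model's hidden states. On benign inputs, a retain term keeps hidden states close in L2 distance. Several details are assumptions rather than facts taken from this card: the activations are random NumPy arrays standing in for hidden states from chosen layers, the function names are hypothetical, and the linear coefficient schedule is our recollection of the paper.

```python
import numpy as np


def row_cosine(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-token cosine similarity between two (tokens, hidden) activation arrays."""
    dot = np.sum(a * b, axis=-1)
    norms = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return dot / (norms + eps)


def rerouting_loss(h_orig: np.ndarray, h_cb: np.ndarray) -> float:
    """Harmful inputs: push circuit-broken activations away from the original
    ones. Only positive cosine similarity is penalized (ReLU), so orthogonal
    or opposite directions cost nothing."""
    return float(np.mean(np.maximum(row_cosine(h_orig, h_cb), 0.0)))


def retain_loss(h_orig: np.ndarray, h_cb: np.ndarray) -> float:
    """Benign inputs: keep activations close to the original model
    (per-token L2 distance, averaged over tokens)."""
    return float(np.mean(np.linalg.norm(h_orig - h_cb, axis=-1)))


def coefficient_schedule(step: int, total_steps: int, alpha: float) -> tuple[float, float]:
    """Assumed linear schedule: weight shifts from rerouting toward retention."""
    frac = step / (2 * total_steps)
    return alpha * (1.0 - frac), alpha * frac


def rr_objective(h_orig_harm, h_cb_harm, h_orig_benign, h_cb_benign,
                 step: int, total_steps: int, alpha: float = 1.0) -> float:
    """Weighted sum of the rerouting and retain terms at a given training step."""
    c_s, c_r = coefficient_schedule(step, total_steps, alpha)
    return (c_s * rerouting_loss(h_orig_harm, h_cb_harm)
            + c_r * retain_loss(h_orig_benign, h_cb_benign))


# Toy usage with random stand-in activations (4 tokens, hidden size 8).
rng = np.random.default_rng(0)
h_orig = rng.normal(size=(4, 8))
print(rerouting_loss(h_orig, h_orig))   # close to 1.0: nothing rerouted yet
print(rerouting_loss(h_orig, -h_orig))  # 0.0: opposite directions are not penalized
```

The ReLU is the key design choice here. It drives harmful-input representations toward directions that are orthogonal to the original ones, or further away, without rewarding the model for moving them arbitrarily far.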

Safetensors · Model size: 7B params · Tensor type: BF16

Repository: FTK11558/Mistral-7B-Instruct-RR-Broken-APS