Codestral-22B-v0.1-abliterated-v3 Model Card

My original Jupyter "cookbook" to replicate the methodology can be found here

My personal library o' code used (WIP, looking to improve and generalize)

This is mistralai/Codestral-22B-v0.1 with orthogonalized bfloat16 safetensor weights, generated with a refined methodology based on that which was described in the preview paper/blog post: 'Refusal in LLMs is mediated by a single direction' which I encourage you to read to understand more.

Thanks to bullerwins for re-uploading the original model in HF form.

Hang on, "abliteration"? Orthogonalization? Ablation? What is this?

TL;DR: This model has had certain weights manipulated to "inhibit" the model's ability to express refusal. It is not in anyway guaranteed that it won't refuse you, understand your request, it may still lecture you about ethics/safety, etc. It is tuned in all other respects the same as the original 22B model was, just with the strongest refusal directions orthogonalized out.

TL;TL;DR;DR: It's uncensored in the purest form I can manage -- no new or changed behaviour in any other respect from the original model.

As far as "abliteration": it's just a fun play-on-words using the original "ablation" term used in the original paper to refer to removing features, which I made up particularly to differentiate the model from "uncensored" fine-tunes. Ablate + obliterated = Abliterated

Anyways, orthogonalization/ablation are both aspects to refer to the same thing here, the technique in which the refusal feature was "ablated" from the model was via orthogonalization.

Why uncensor a code model?

Honestly, this model seems pretty solid outside of code, and it's a perfect size model for 24GB once quantized.
By ablating refusals, the model is overall more compliant to the user's requests, regardless of ethicality. It's worth remembering that sometimes even "good-aligned" requests can be refused and have to be prompt-engineered around.

A little more on the methodology, and why this is interesting

To me, ablation (or applying the methodology for the inverse, "augmentation") seems to be good for inducing/removing very specific features that you'd have to spend way too many tokens on encouraging or discouraging in your system prompt.
Instead, you just apply your system prompt in the ablation script against a blank system prompt on the same dataset and orthogonalize for the desired behaviour in the final model weights.

Why this over fine-tuning?

Ablation is much more surgical in nature whilst also being effectively executed with a lot less data than fine-tuning, which I think is its main advantage.

As well, and its most valuable aspect is it keeps as much of the original model's knowledge and training intact, whilst removing its tendency to behave in one very specific undesireable manner. (In this case, refusing user requests.)

Fine tuning is still exceptionally useful and the go-to for broad behaviour changes; however, you may be able to get close to your desired behaviour with very few samples using the ablation/augmentation techniques. It may also be a useful step to add to your model refinement: orthogonalize -> fine-tune or vice-versa.

I haven't really gotten around to exploring this model stacked with fine-tuning, I encourage others to give it a shot if they've got the capacity.

Okay, fine, but why V3? There's no V2 70B?

Well, I released a V2 a while back for 8B under Cognitive Computations. It ended up being not worth it to try V2 with 70B, I wanted to refine the model before wasting compute cycles on what might not even be a better model. I am however quite pleased about this latest methodology, it seems to have induced fewer hallucinations. So to show that it's a new fancy methodology from even that of the 8B V2, I decided to do a Microsoft and double up on my version jump because it's such an advancement (or so the excuse went, when in actuality it was because too many legacy but actively used Microsoft libraries checked for 'Windows 9' in the OS name to detect Windows 95/98 as one.)

Quirkiness awareness notice

This model may come with interesting quirks, with the methodology being so new. I encourage you to play with the model, and post any quirks you notice in the community tab, as that'll help us further understand what this orthogonalization has in the way of side effects.

If you manage to develop further improvements, please share! This is really the most basic way to use ablation, but there are other possibilities that I believe are as-yet unexplored.

Additionally, feel free to reach out in any way about this. I'm on the Cognitive Computations Discord, I'm watching the Community tab, reach out! I'd love to see this methodology used in other ways, and so would gladly support whoever whenever I can.

Original Model Card for Codestral-22B-v0.1

Codestrall-22B-v0.1 is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash (more details in the Blogpost). The model can be queried:

As instruct, for instance to answer any questions about a code snippet (write documentation, explain, factorize) or to generate code following specific indications
As Fill in the Middle (FIM), to predict the middle tokens between a prefix and a suffix (very useful for software development add-ons like in VS Code)

Installation

It is recommended to use mistralai/Codestral-22B-v0.1 with mistral-inference.

pip install mistral_inference

Download

from huggingface_hub import snapshot_download
from pathlib import Path

mistral_models_path = Path.home().joinpath('mistral_models', 'Codestral-22B-v0.1')
mistral_models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(repo_id="mistralai/Codestral-22B-v0.1", allow_patterns=["params.json", "consolidated.safetensors", "tokenizer.model.v3"], local_dir=mistral_models_path)

Chat

After installing mistral_inference, a mistral-chat CLI command should be available in your environment.

mistral-chat $HOME/mistral_models/Codestral-22B-v0.1 --instruct --max_tokens 256

Will generate an answer to "Write me a function that computes fibonacci in Rust" and should give something along the following lines:

Sure, here's a simple implementation of a function that computes the Fibonacci sequence in Rust. This function takes an integer `n` as an argument and returns the `n`th Fibonacci number.

fn fibonacci(n: u32) -> u32 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn main() {
    let n = 10;
    println!("The {}th Fibonacci number is: {}", n, fibonacci(n));
}

This function uses recursion to calculate the Fibonacci number. However, it's not the most efficient solution because it performs a lot of redundant calculations. A more efficient solution would use a loop to iteratively calculate the Fibonacci numbers.

Fill-in-the-middle (FIM)

After installing mistral_inference and running pip install --upgrade mistral_common to make sure to have mistral_common>=1.2 installed:

from mistral_inference.model import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.tokens.instruct.request import FIMRequest

tokenizer = MistralTokenizer.v3()
model = Transformer.from_folder("~/codestral-22B-240529")

prefix = """def add("""
suffix = """    return sum"""

request = FIMRequest(prompt=prefix, suffix=suffix)

tokens = tokenizer.encode_fim(request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=256, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.decode(out_tokens[0])

middle = result.split(suffix)[0].strip()
print(middle)

Should give something along the following lines:

num1, num2):

    # Add two numbers
    sum = num1 + num2

    # return the sum

Limitations

The Codestral-22B-v0.1 does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.

License

Codestral-22B-v0.1 is released under the MNLP-0.1 license.

The Mistral AI Team

Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Bam4d, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Jean-Malo Delignon, Jia Li, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickael Seznec, Nicolas Schuhl, Patrick von Platen, Romain Sauvestre, Pierre Stock, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibaut Lavril, Thibault Schueller, Timothée Lacroix, Théophile Gervet, Thomas Wang, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall

failspy
/

Codestral-22B-v0.1-abliterated-v3