NLA Activation Verbalizer: Qwen 2.5 7B Layer 20 (v2)
A Natural Language Autoencoder (NLA) activation verbalizer for Qwen 2.5 7B Instruct. Given a hidden-state activation from layer 20 (71% depth), the model produces a natural-language description of what the activation encodes.
What is NLA?
NLA (Natural Language Autoencoder) is a technique for interpreting neural network activations by training a model to describe them in plain language. An activation vector is injected into the model's residual stream at a designated token position, and the model is trained to produce a faithful natural-language description of the semantic content encoded in that vector.
For background, see our blog post on Hugging Face.
Model Details
- Base model: Qwen/Qwen2.5-7B-Instruct
- Adapter type: LoRA (rank 32, alpha 64, dropout 0.05)
- Target layer: 20 (71% depth)
- d_model: 3584
- Role: Activation Verbalizer (AV)
Injection Protocol
This is critical: wrong injection will produce garbage.
| Parameter | Value | Notes |
|---|---|---|
| Injection token | ㈎ (U+320E) | token_id 149705 |
| Injection method | Normalize norm to 150.0 | NOT multiply by 150 |
| Prompt template | Includes depth 73% | See below |
| Attention mask | Must be passed explicitly | pad_token == eos_token causes issues without it |
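The 73% in the prompt template and the 71% quoted for the target layer both refer to layer 20 of 28. A minimal sketch of the two calculations follows: the first matches the formula in the usage example below, while the second (plain `layer / n_layers`) is only an assumption about where the 71% figure comes from. Both function names are illustrative.

```python
def prompt_depth_pct(layer: int, n_layers: int = 28) -> int:
    """Depth value written into the prompt: layer midpoint as a percentage."""
    return round(100 * (layer + 0.5) / n_layers)


def plain_depth_pct(layer: int, n_layers: int = 28) -> int:
    """Assumed origin of the 71% figure: layer index over layer count."""
    return round(100 * layer / n_layers)


print(prompt_depth_pct(20))  # 73
print(plain_depth_pct(20))   # 71
```

Whichever figure is quoted in prose, the value placed in the prompt should be the 73 produced by the first formula.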
Normalize vs Multiply: THE COMMON MISTAKE
The activation vector must be normalized so its L2 norm equals 150.0, not multiplied by 150:
```python
# CORRECT: normalize norm TO 150
def normalize_activation(v, target_norm=150.0):
    norm = v.float().norm().clamp_min(1e-12)
    return v * (target_norm / norm)

injected = normalize_activation(activation, 150.0)
# If activation.norm() == 129, this gives injected.norm() == 150

# WRONG: multiply BY 150
injected = activation * 150.0
# If activation.norm() == 129, this gives injected.norm() == 19,350
# The model was never trained on vectors this large, so it produces garbage
```
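The arithmetic can be checked without loading a model. The sketch below uses plain Python lists in place of tensors; `l2_norm` and `rescale_to_norm` are illustrative helpers, and the 3-element vector is made up so that its norm is exactly 129.

```python
import math


def l2_norm(v):
    """Euclidean (L2) norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def rescale_to_norm(v, target_norm=150.0, eps=1e-12):
    """Scale v so that its L2 norm equals target_norm."""
    scale = target_norm / max(l2_norm(v), eps)
    return [x * scale for x in v]


activation = [43.0, 86.0, 86.0]  # made-up vector; its norm is exactly 129

correct = rescale_to_norm(activation, 150.0)  # normalize TO 150
wrong = [x * 150.0 for x in activation]       # multiply BY 150

print(round(l2_norm(correct), 6))  # 150.0
print(round(l2_norm(wrong), 6))    # 19350.0
```

A cheap guard before injection is to assert that the norm of the vector being written into the embeddings is close to 150.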
Complete Usage Example
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

INJECTION_CHAR = "\u320e"  # the injection token listed in the table above (U+320E)
INJECTION_SCALE = 150.0
LAYER = 20

# --- Load model with adapter ---
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "anicka/nla-qwen2.5-7b-L20-av-v2")
model.eval()

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

injection_id = tokenizer.encode(INJECTION_CHAR, add_special_tokens=False)
assert len(injection_id) == 1, f"Injection char must be single token, got {len(injection_id)}"
injection_token_id = injection_id[0]

# --- Step 1: Extract activation from layer 20 ---
prompt = "Write a Python hello world program"
messages = [{"role": "user", "content": prompt}]
chat_str = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(chat_str, return_tensors="pt").to(model.device)

activation = {}

def hook(mod, inp, out):
    h = out[0] if isinstance(out, tuple) else out
    if "h" not in activation:  # capture FIRST forward pass only
        activation["h"] = h[:, -1, :].detach()  # hidden state at the last prompt token

inner = model.base_model.model.model
handle = inner.layers[LAYER].register_forward_hook(hook)
with torch.no_grad():
    # `inputs` already carries the attention_mask produced by the tokenizer
    model.generate(**inputs, max_new_tokens=1, pad_token_id=tokenizer.eos_token_id)
handle.remove()
act = activation["h"].squeeze(0)

# --- Step 2: Normalize (NOT multiply) ---
def normalize_activation(v, target_norm):
    norm = v.float().norm().clamp_min(1e-12)
    return v * (target_norm / norm)

# --- Step 3: Build the verbalization prompt ---
depth_pct = round(100 * (LAYER + 0.5) / 28)  # 28 layers in Qwen 2.5 7B
av_prompt = (
    "You are a meticulous AI researcher conducting an important investigation "
    "into activation vectors from a language model. Your overall task is to "
    "describe the semantic content of that activation vector.\n\n"
    "We will pass the vector enclosed in <concept> tags into your context, "
    "along with the network depth where it was extracted. "
    "You must then produce an explanation for the vector, enclosed within "
    "<explanation> tags. The explanation consists of 2-3 text snippets "
    "describing that vector.\n\n"
    f"Here is the vector from depth {depth_pct}% of the network:\n\n"
    f"<concept>{INJECTION_CHAR}</concept>\n\n"
    "Please provide an explanation.\n\n"
    "<explanation>"
)
tokens = tokenizer.encode(av_prompt, add_special_tokens=True)
inject_pos = next(i for i, t in enumerate(tokens) if t == injection_token_id)
input_ids = torch.tensor([tokens], device=model.device)
attention_mask = torch.ones_like(input_ids)  # single unpadded sequence: attend to every token

# Replace the embedding at the injection position with the normalized activation
embeddings = model.get_input_embeddings()(input_ids).clone()
embeddings[0, inject_pos, :] = normalize_activation(
    act.to(device=embeddings.device, dtype=embeddings.dtype), INJECTION_SCALE
)

# --- Step 4: Generate description ---
with torch.no_grad():
    output = model.generate(
        inputs_embeds=embeddings,
        attention_mask=attention_mask,
        max_new_tokens=120,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

# When only inputs_embeds is passed, generate() returns just the newly generated
# tokens, so there is no prompt prefix to strip before decoding.
text = tokenizer.decode(output[0], skip_special_tokens=True)
if "</explanation>" in text:
    text = text.split("</explanation>")[0]
print(text.strip())
```
Training Pipeline
This model was trained in three stages:
- SFT on clean twin descriptions: supervised fine-tuning on activation-description pairs generated by multiple frontier models (Claude, GPT, Kimi), deduplicated and cleaned to terse bullet format
- Contrastive GRPO: Group Relative Policy Optimization with an activation reconstructor (AR) critic, using random negative samples for contrastive reward
- Hard-negative GRPO (v2): second round of GRPO using hard negatives: top-20 nearest neighbors by activation cosine similarity, 3 negatives per sample
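The hard-negative selection in the third stage can be sketched as follows. This is a toy illustration in plain Python, not the training code: the function names are hypothetical, and drawing the 3 negatives uniformly at random from the top-20 neighbors is an assumption, since the description above does not say how they are chosen.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def hard_negatives(index, activations, top_k=20, n_neg=3, rng=None):
    """Return indices of n_neg hard negatives for activations[index]."""
    rng = rng or random.Random(0)
    anchor = activations[index]
    # Rank every other activation by cosine similarity to the anchor.
    ranked = sorted(
        (j for j in range(len(activations)) if j != index),
        key=lambda j: cosine(anchor, activations[j]),
        reverse=True,
    )
    neighbors = ranked[:top_k]
    # Assumption: sample uniformly among the nearest neighbors.
    return rng.sample(neighbors, min(n_neg, len(neighbors)))


# Toy data: 50 random 8-dimensional "activations".
data_rng = random.Random(42)
acts = [[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(50)]
negs = hard_negatives(0, acts)
print(negs)
```

Compared with random negatives, neighbors chosen this way are close to the anchor in activation space, so a description has to be more specific to tell them apart.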
Hard-Negative GRPO Results
- Gap metric (reward for correct minus reward for hardest negative):
  - Random negatives (v1): -0.024
  - Hard negatives (v2): -0.006
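A minimal sketch of how such a gap could be computed from per-sample rewards. The reward values are made-up numbers for illustration, `gap_metric` is a hypothetical helper, and both treating the "hardest" negative as the one with the highest reward and averaging over samples are assumptions.

```python
def gap_metric(correct_rewards, negative_rewards):
    """Mean of (reward for correct) - (reward for hardest negative).

    correct_rewards: one float per sample.
    negative_rewards: one list of floats per sample (its negatives).
    Assumption: the hardest negative is the one with the highest reward.
    """
    gaps = [c - max(negs) for c, negs in zip(correct_rewards, negative_rewards)]
    return sum(gaps) / len(gaps)


# Made-up rewards for two samples with three negatives each.
correct = [0.50, 0.75]
negatives = [[0.25, 0.5, 0.0], [0.5, 0.25, 1.0]]
print(gap_metric(correct, negatives))  # -0.125
```

Under this definition, a negative value means the hardest negative scored higher than the correct item on average.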
Common Mistakes
- Multiplying by 150 instead of normalizing to 150: produces vectors roughly 100× too large, and the model collapses to garbage attractors. See the injection protocol above.
- Using the wrong adapter: `anicka/nla-qwen25-7b-L20-av` (no dot, no v2) is the old SFT-only adapter with a different prompt template (no depth). Use this repo (`nla-qwen2.5-7b-L20-av-v2`) for GRPO quality.
- Omitting depth from the prompt: this adapter was trained with `"from depth {N}% of the network"` in the prompt. Omitting it degrades output.
- Missing attention_mask: when `pad_token == eos_token`, pass `attention_mask` explicitly or unexpected behavior occurs.
- Capturing the wrong forward pass: during `generate()`, the hook fires on every token. Guard with `if "h" not in activation:` to capture only the first (input) pass.
Related Models
- anicka/nla-qwen25-7b-L20-ar: Activation Reconstructor (AR) critic
- anicka/nla-qwen25-7b-L20-av: Original AV (SFT only, deprecated)
License
Apache 2.0 (same as base model)