NLA Activation Verbalizer — Phi-4 (14B), AR-native GRPO

LoRA adapter that turns a residual-stream activation vector from Phi-4 into a natural-language description of what the model is computing at that layer. Trained with AR-native GRPO (Group Relative Policy Optimization): the reward signal is the Activation Reconstructor's cosine similarity, so the adapter directly optimizes for descriptions that carry geometric information about the activation — not for descriptions that sound good.

This is a refinement of the SL-trained AV. Same architecture, same injection protocol, but the training objective is different: instead of imitating frontier-LLM descriptions (supervised learning), this adapter learns to produce text that a separate AR network can reconstruct the original activation from.

Part of the nla-at-home project.

What changed (SL → GRPO)

The supervised adapter scored 0.474 mean-subtracted cosine on round-trip eval (AV generates description → AR reconstructs → cosine with ground truth). This adapter scores 0.585 — a 23% improvement that closes 77% of the gap to the AR ceiling (0.619).
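
Both percentages follow from the three mean scores quoted above. A quick arithmetic check (the variable names are illustrative only):

```python
# Mean round-trip scores quoted in this section
sl_score = 0.474      # supervised (SL) adapter
grpo_score = 0.585    # this GRPO adapter
ar_ceiling = 0.619    # AR ceiling

# Relative improvement of GRPO over the SL adapter
relative_gain = (grpo_score - sl_score) / sl_score

# Fraction of the SL-to-ceiling gap that GRPO closes
gap_closed = (grpo_score - sl_score) / (ar_ceiling - sl_score)

print(f"relative gain: {relative_gain:.1%}")  # relative gain: 23.4%
print(f"gap closed:    {gap_closed:.1%}")     # gap closed:    76.6%
```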

On 2 of 9 evaluation layers (L13, L22), the GRPO adapter produces descriptions that reconstruct better than the ground-truth descriptions the SL adapter was imitating. The AR-native reward found output patterns that frontier-LLM descriptions never used.

Qualitative difference: the SL adapter produced descriptions with correct style but vague content ("forward-looking sentiment," "narrative setup"). The GRPO adapter names specific tokens, identifies task directives, and catches processing tensions ("'never' tokens vs 'surrender' token"). It trades fluency for discriminative signal.

Usage

Same injection protocol as the SL version:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = AutoModelForCausalLM.from_pretrained("microsoft/phi-4", torch_dtype=torch.bfloat16, device_map="cuda")
model = PeftModel.from_pretrained(base, "anicka/nla-phi4-av-arnative-grpo").eval()
tokenizer = AutoTokenizer.from_pretrained("anicka/nla-phi4-av-arnative-grpo")

INJECTION_CHAR = "★"  # token_id 27347
INJECTION_SCALE = 150.0

def make_prompt(depth_pct):
    return (
        "You are a meticulous AI researcher conducting an important investigation "
        "into activation vectors from a language model. Your overall task is to "
        "describe the semantic content of that activation vector.\n\n"
        "We will pass the vector enclosed in <concept> tags into your context, "
        "along with the network depth where it was extracted. "
        "You must then produce an explanation for the vector, enclosed within "
        "<explanation> tags. The explanation consists of 2-3 text snippets "
        "describing that vector.\n\n"
        f"Here is the vector from depth {depth_pct}% of the network:\n\n"
        f"<concept>{INJECTION_CHAR}</concept>\n\n"
        "Please provide an explanation.\n\n"
        "<explanation>"
    )

# Wrap in chat template before tokenizing
prompt = make_prompt(depth_pct=55)
chat_str = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    tokenize=False, add_generation_prompt=True)
tokens = tokenizer.encode(chat_str, add_special_tokens=False)

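# Find the injection position and replace its embedding with the activation
inject_pos = tokens.index(27347)  # ★ token; raises ValueError if the marker is missing
input_ids = torch.tensor([tokens], device="cuda")
embeddings = model.get_input_embeddings()(input_ids).clone()

# activation: 1-D tensor of shape (5120,), taken from the layer you want to
# describe. It is not defined in this snippet; you must supply it.
activation = activation.to(embeddings.device).float()
norm = activation.norm().clamp_min(1e-12)
# Rescale to a fixed L2 norm so the injected vector has a consistent magnitude
normalized = activation * (INJECTION_SCALE / norm)
embeddings[0, inject_pos, :] = normalized.to(embeddings.dtype)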

# Generate
output = model.generate(
    inputs_embeds=embeddings,
    attention_mask=torch.ones_like(input_ids),
    max_new_tokens=150, do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
    return_dict_in_generate=True)

text = tokenizer.decode(output.sequences[0], skip_special_tokens=True)
description = text.split("</explanation>")[0].strip()

Training

Base: anicka/nla-phi4-universal-av-v2 (SL-pretrained LoRA)
Method: GRPO with AR-native reward
Reward: centered cosine similarity between AR-reconstructed and ground-truth activation (mean-subtracted)
AR: anicka/nla-phi4-universal-ar-v2 (frozen during GRPO)
Curriculum: 8 epochs, τ decreasing from 0.40 to 0.10 (easy examples first, progressively harder)
Samples per epoch: 300
KL penalty: adaptive, final ~1.67
Hardware: NVIDIA GB10 (DGX Spark), ~17 hours total
Final metrics: cos=0.567, reward=0.637, spec=185
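
To make the reward and the "group relative" part of GRPO concrete, here is a minimal pure-Python sketch. It is not the project's training code. The function names, the choice of `mean_vector` (assumed here to be a per-layer mean activation), and the standardization by the group's standard deviation (the common GRPO formulation) are all assumptions. The table below reports `reward` separately from `cos`, so the real reward may include terms that are not described here, and the KL penalty is omitted entirely.

```python
import math

def centered_cosine(reconstructed, target, mean_vector):
    """Cosine similarity after subtracting mean_vector from both inputs."""
    a = [x - m for x, m in zip(reconstructed, mean_vector)]
    b = [x - m for x, m in zip(target, mean_vector)]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # direction undefined; treat as no similarity
    return dot / (norm_a * norm_b)

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled descriptions."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]

# Toy example: 3-dimensional "activations" and a group of three candidates.
mean_vector = [1.0, 1.0, 1.0]   # hypothetical per-layer mean activation
target = [2.0, 1.0, 0.0]        # ground-truth activation
candidates = [
    [2.0, 1.0, 0.0],            # perfect reconstruction
    [1.5, 1.0, 0.5],            # same direction after centering, smaller magnitude
    [0.0, 1.0, 2.0],            # opposite direction after centering
]

rewards = [centered_cosine(c, target, mean_vector) for c in candidates]
advantages = group_relative_advantages(rewards)

for r, adv in zip(rewards, advantages):
    print(f"reward={r:+.3f}  advantage={adv:+.3f}")
```

In this toy group the first two candidates receive the same reward because cosine ignores magnitude, and the third is pushed down relative to the group mean. Only that relative ordering drives the policy update.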

Curriculum progression

Epoch	τ (difficulty)	cos	reward
1	0.40	0.391	0.433
2	0.36	0.551	0.604
3	0.31	0.539	0.593
4	0.27	0.559	0.622
5	0.23	0.564	0.630
6	0.19	0.570	0.633
7	0.15	0.569	0.636
8	0.10	0.567	0.637
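
One way to read the table is to find where each metric peaks. The sketch below uses only the numbers copied from the table and does not reproduce anything about the training run itself.

```python
# (epoch, tau, cos, reward) copied from the curriculum table
curriculum = [
    (1, 0.40, 0.391, 0.433),
    (2, 0.36, 0.551, 0.604),
    (3, 0.31, 0.539, 0.593),
    (4, 0.27, 0.559, 0.622),
    (5, 0.23, 0.564, 0.630),
    (6, 0.19, 0.570, 0.633),
    (7, 0.15, 0.569, 0.636),
    (8, 0.10, 0.567, 0.637),
]

# Epoch at which each metric is highest
best_cos_epoch = max(curriculum, key=lambda row: row[2])[0]
best_reward_epoch = max(curriculum, key=lambda row: row[3])[0]

# Change in cos between consecutive epochs
cos_deltas = [
    (curr[0], round(curr[2] - prev[2], 3))
    for prev, curr in zip(curriculum, curriculum[1:])
]

print("cos peaks at epoch", best_cos_epoch)        # cos peaks at epoch 6
print("reward peaks at epoch", best_reward_epoch)  # reward peaks at epoch 8
print("cos change per epoch:", cos_deltas)
```

Most of the gain arrives between epochs 1 and 2. After epoch 6, cos drifts down slightly while reward keeps rising by about 0.001 to 0.003 per epoch.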

Evaluation

Double-holdout round-trip eval (49 texts unseen by both AV and AR):

Layer	Round-trip cos (GRPO)	Round-trip cos (SL)	AR ceiling
L13 (32%)	0.599	0.482	0.585
L16 (40%)	0.610	0.496	0.616
L19 (47%)	0.632	0.486	0.647
L22 (55%)	0.639	0.519	0.608
L25 (63%)	0.610	0.471	0.625
L28 (71%)	0.601	0.536	0.660
L32 (80%)	0.578	0.482	0.609
L36 (90%)	0.558	0.413	0.616
L38 (96%)	0.437	0.378	0.604
Mean	0.585	0.474	0.619

On L13 and L22, GRPO exceeds the AR ceiling: the adapter found description patterns that reconstruct better than the ground-truth frontier-LLM descriptions the SL adapter was trained to imitate.
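
The summary row and the per-layer comparison can be recomputed from the table. The following is a small sketch with the values copied by hand from the rows above; the helper name `mean` is illustrative.

```python
# (layer, GRPO, SL, AR ceiling) copied from the evaluation table
rows = [
    ("L13", 0.599, 0.482, 0.585),
    ("L16", 0.610, 0.496, 0.616),
    ("L19", 0.632, 0.486, 0.647),
    ("L22", 0.639, 0.519, 0.608),
    ("L25", 0.610, 0.471, 0.625),
    ("L28", 0.601, 0.536, 0.660),
    ("L32", 0.578, 0.482, 0.609),
    ("L36", 0.558, 0.413, 0.616),
    ("L38", 0.437, 0.378, 0.604),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

grpo_mean = mean(r[1] for r in rows)
sl_mean = mean(r[2] for r in rows)
ceiling_mean = mean(r[3] for r in rows)

# Layers where the GRPO round trip beats the AR ceiling
above_ceiling = [layer for layer, grpo, _, ceiling in rows if grpo > ceiling]

# Layer with the lowest GRPO round-trip score
weakest = min(rows, key=lambda r: r[1])[0]

print(f"means: GRPO={grpo_mean:.3f} SL={sl_mean:.3f} ceiling={ceiling_mean:.3f}")
print("layers above the AR ceiling:", above_ceiling)  # ['L13', 'L22']
print("weakest GRPO layer:", weakest)                 # L38
```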

Companion models

AR (reconstructor): anicka/nla-phi4-universal-ar-v2
SL baseline (prior version): anicka/nla-phi4-universal-av-v2
Demo: anicka/nla-demo (Phi-4 Mini version)

Limitations

Trained on Phi-4 activations only. Does not transfer to other architectures.
L38 (96% depth) remains weak — response-strategy representations are harder to verbalize faithfully.
Descriptions optimize for AR reconstructability, not human readability. Some outputs are terse or oddly structured.
The AR ceiling (0.619) limits how much further AV improvements can register on this metric. Improving the AR is now the bottleneck.

License

MIT
