Llama 3.2 1B Instruct Disinhibited s2p0

Built with Llama.

This is a disinhibition-only derivative of meta-llama/Llama-3.2-1B-Instruct. It was produced with a purified direction edit intended to reduce over-hedging and unnecessary neutrality while leaving factual and coherence behavior intact, as measured by the marker evals reported under Results.

Edit

  • Base model: meta-llama/Llama-3.2-1B-Instruct
  • Direction: disinhibition_purified.pt
  • Global scale: 2.0
  • Applied layers: 1-15
  • Layer scaling: confidence-graduated
    • layer 1: 0.59
    • layer 2: 0.84
    • layer 3: 0.90
    • layers 4-15: 1.0
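
The per-layer strength implied by the settings above can be sketched as follows. This is a minimal illustration that assumes the effective scale at each layer is the global scale multiplied by that layer's confidence factor; the card does not state the combination rule, and the names `GLOBAL_SCALE`, `LAYER_FACTORS`, and `effective_scale` are hypothetical.

```python
# Values copied from the Edit list above.
# ASSUMPTION: effective scale = global scale * per-layer factor.
GLOBAL_SCALE = 2.0
LAYER_FACTORS = {1: 0.59, 2: 0.84, 3: 0.90}  # layers 4-15 use 1.0
APPLIED_LAYERS = range(1, 16)  # layers 1-15 inclusive


def effective_scale(layer: int) -> float:
    """Return the assumed edit strength for a layer, or 0.0 if the layer is not edited."""
    if layer not in APPLIED_LAYERS:
        return 0.0
    return GLOBAL_SCALE * LAYER_FACTORS.get(layer, 1.0)


effective = {layer: effective_scale(layer) for layer in APPLIED_LAYERS}

for layer, scale in effective.items():
    print(f"layer {layer:2d}: {scale:.2f}")
```

Under this assumption the early layers receive a weaker edit than layers 4-15, which all receive the full global scale.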

Results

Marker-eval results against the base model:

| Bucket | Base | Edited |
|---|---|---|
| Opinions hedge | 95/120 | 19/120 |
| Opinions neutrality | 71/120 | 23/120 |
| Explicit-neutral hedge | 13/25 | 9/25 |
| Explicit-neutral neutrality | 15/25 | 13/25 |
| Factual hedge | 6/42 | 2/42 |
| Factual neutrality | 3/42 | 1/42 |
| Coherence hedge | 0/28 | 0/28 |
| Edge-case hedge | 5/33 | 0/33 |
| Coherence flags | 0 | 0 |
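
To compare buckets of different sizes, the counts can be converted to rates. The snippet below is a small sketch that uses only the numbers reported in the results; the helper names `rate` and `relative_reduction` are illustrative and not part of any released eval code.

```python
# Counts copied from the results: bucket -> (base hits, edited hits, prompts).
RESULTS = {
    "Opinions hedge": (95, 19, 120),
    "Opinions neutrality": (71, 23, 120),
    "Explicit-neutral hedge": (13, 9, 25),
    "Explicit-neutral neutrality": (15, 13, 25),
    "Factual hedge": (6, 2, 42),
    "Factual neutrality": (3, 1, 42),
    "Coherence hedge": (0, 0, 28),
    "Edge-case hedge": (5, 0, 33),
}


def rate(hits: int, total: int) -> float:
    """Fraction of prompts whose response contained the marker."""
    return hits / total


def relative_reduction(base: int, edited: int):
    """Fractional drop from base to edited; None when the base count is zero."""
    if base == 0:
        return None
    return (base - edited) / base


for bucket, (base, edited, total) in RESULTS.items():
    drop = relative_reduction(base, edited)
    drop_text = "n/a" if drop is None else f"{drop:.0%}"
    print(f"{bucket:28s} {rate(base, total):6.1%} -> {rate(edited, total):6.1%}  (drop {drop_text})")
```

For example, the opinions hedge rate goes from 95/120 (about 79%) to 19/120 (about 16%), a relative reduction of 80%.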

In the scale sweep, the opinion hedge count (out of 120) fell monotonically from the base model's 95 to 19 at scale 2.0:

95 -> 54 -> 50 -> 42 -> 31 -> 19

This suggests the measured direction kept acting in the same sense throughout the tested scale range (up to 2.0), with no reversal in hedge counts.
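
The monotonicity claim can be checked directly from the reported sequence. This sketch uses only the six counts given for the sweep; the intermediate scale values are not stated in this card, so the code indexes the sweep by position rather than by scale.

```python
# Opinion hedge counts (out of 120) from the scale sweep, in the order reported.
SWEEP = [95, 54, 50, 42, 31, 19]


def is_strictly_decreasing(values) -> bool:
    """True when every value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


# Size of the drop between consecutive sweep points.
steps = [a - b for a, b in zip(SWEEP, SWEEP[1:])]

print("strictly decreasing:", is_strictly_decreasing(SWEEP))
print("drops between consecutive points:", steps)
```

The drops are uneven: the first step away from the base model accounts for 41 of the 76 total.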

Method Notes

The direction was measured with paired opinion-seeking vs. noncommittal prompts and purified against benchmark references from ARC-Easy, TriviaQA, HellaSwag, GSM8K, and Winogrande.
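
One common way to realize this kind of measurement is a difference of mean activations followed by projecting out reference directions. The sketch below illustrates that general technique on toy vectors; it is an assumption about what "measured" and "purified" mean here, not the procedure actually used for this model, and every function name in it is hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def difference_of_means(positive, negative):
    """Mean activation on one prompt set minus the mean on the paired set."""
    return [a - b for a, b in zip(mean_vector(positive), mean_vector(negative))]


def remove_components(vector, basis):
    """Subtract the projection of vector onto each orthonormal basis vector."""
    out = list(vector)
    for b in basis:
        coeff = dot(out, b)
        out = [x - coeff * y for x, y in zip(out, b)]
    return out


def purify(direction, references):
    """Project out the span of the reference vectors, then renormalize."""
    basis = []
    for ref in references:  # Gram-Schmidt over the references
        residual = remove_components(ref, basis)
        if math.sqrt(dot(residual, residual)) > 1e-12:
            basis.append(normalize(residual))
    return normalize(remove_components(direction, basis))


# Toy 3-dimensional "activations" (made up for illustration only).
opinion_acts = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
noncommittal_acts = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
benchmark_refs = [[0.0, 1.0, 0.0]]  # stand-in for a benchmark-derived direction

raw = difference_of_means(opinion_acts, noncommittal_acts)
purified = purify(raw, benchmark_refs)
print("raw:", raw)
print("purified:", purified)
```

In this toy case the purified direction has no remaining component along the reference vector, which is the property the purification step is meant to provide.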

Notably, marker counts in the non-opinion buckets did not increase. In this eval, factual hedge markers fell from 6/42 to 2/42, and edge-case hedge markers fell from 5/33 to 0/33.

Limitations

These are marker-based evals, not full semantic evaluations. The model still needs manual qualitative review and downstream task testing before broad claims about helpfulness, factuality, or safety can be made.
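
To make the limitation concrete, here is a minimal sketch of what a marker-based count looks like. The marker phrases and sample responses are hypothetical and are not the ones used for the reported numbers; the point is only that the count reflects surface strings, not meaning.

```python
import re

# HYPOTHETICAL hedge markers; the actual marker list is not given in this card.
HEDGE_MARKERS = [
    r"\bit depends\b",
    r"\bas an ai\b",
    r"\bsome people (?:might|may) (?:say|argue)\b",
    r"\bthere are (?:many|different) perspectives\b",
]
HEDGE_PATTERN = re.compile("|".join(HEDGE_MARKERS), re.IGNORECASE)


def has_marker(text: str, pattern: re.Pattern) -> bool:
    """True when the response contains at least one marker phrase."""
    return pattern.search(text) is not None


def count_flagged(responses, pattern: re.Pattern) -> int:
    """Number of responses containing at least one marker phrase."""
    return sum(has_marker(r, pattern) for r in responses)


# Made-up responses for illustration.
responses = [
    "It depends on many factors, and there are perspectives on both sides.",
    "Paris is the capital of France.",
    "I would pick the second option because it is cheaper.",
]

flagged = count_flagged(responses, HEDGE_PATTERN)
print(f"hedge {flagged}/{len(responses)}")

# A committed answer that merely quotes a marker phrase is still flagged.
quoted = "There is no 'it depends' here: choose A."
print("quoted phrase flagged:", has_marker(quoted, HEDGE_PATTERN))
```

A matcher of this kind can flag a committed answer that quotes a marker phrase and can miss a hedge that is worded differently, which is why manual review is still needed.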

The edge-case bucket should be inspected manually because appropriate uncertainty can be useful in some edge cases, so the drop from 5/33 to 0/33 is not necessarily an improvement.

License

This model is distributed under the Llama 3.2 Community License. See LICENSE and NOTICE.

Use must comply with the Llama 3.2 Community License and Meta's Acceptable Use Policy.
