DuoNeural Native Refusal 0PCT (~50M)
Part of the Native Refusal Geometry experiment series. DuoNeural 2026-06-07 | Archon, Jesse Caldwell, Aura
What this is
A ~50M parameter GPT-style language model trained from scratch with 0% refusal data mixed into the pretraining corpus.
This is a research model investigating whether native refusal training (refusal data mixed into the pretraining corpus) produces the same safety geometry signature as RLHF-aligned models, specifically the three-zone crystallization arc documented in DuoNeural P36. This model is the 0% baseline of the series.
Experiment series
| Model | Refusal fraction | HF repo |
|---|---|---|
| 0pct | 0% (baseline) | DuoNeural/native-refusal-0pct-50m |
| 10pct | 10% | DuoNeural/native-refusal-10pct-50m |
| 25pct | 25% | DuoNeural/native-refusal-25pct-50m |
| 50pct | 50% | DuoNeural/native-refusal-50pct-50m |
All four models use an identical architecture and initialization (seed=42). The only variable is the refusal data fraction.
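A minimal sketch of how a fixed refusal fraction could be mixed into a pretraining stream. This is not the series' actual data pipeline: the card does not say whether the fraction is measured in documents or tokens, so the sketch assumes a per-example fraction. The function name `build_mixture`, its parameters, and the sampling seed are hypothetical and unrelated to the initialization seed above.

```python
import random

def build_mixture(base_docs, refusal_docs, refusal_fraction, n_samples, seed=0):
    """Draw n_samples examples, each taken from refusal_docs with
    probability refusal_fraction and from base_docs otherwise.

    Assumption: the fraction is per example, not per token.
    """
    rng = random.Random(seed)  # seeded so the mixture is reproducible
    mixture = []
    for _ in range(n_samples):
        # rng.random() is in [0, 1), so a fraction of 0.0 never picks refusal data
        use_refusal = rng.random() < refusal_fraction
        source = refusal_docs if use_refusal else base_docs
        mixture.append(rng.choice(source))
    return mixture

# Illustrative placeholders, not real training data.
base = ["web_doc_a", "web_doc_b", "web_doc_c"]
refusal = ["refusal_pair_a", "refusal_pair_b"]

baseline = build_mixture(base, refusal, refusal_fraction=0.0, n_samples=1000)
quarter = build_mixture(base, refusal, refusal_fraction=0.25, n_samples=10000)
```

With `refusal_fraction=0.0`, as for this baseline model, the stream contains only base-corpus examples.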
Architecture
- Standard GPT: d_model=384, 16 layers, 8 heads, SwiGLU FFN
- ~50M parameters, tied embeddings
- Trained on FineWeb-Edu plus synthetic refusal pairs (no refusal pairs for this 0% baseline)
- AdamW optimizer, cosine LR decay
- 300M tokens total
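As a rough sanity check on the ~50M figure, the parameter count can be estimated from the listed hyperparameters. The card does not state the vocabulary size or the SwiGLU hidden width, so the values below (a 50,257-token GPT-2-style vocabulary and a hidden width of 1024, roughly 8/3 × d_model) are assumptions. The sketch also assumes no bias terms and no learned positional embeddings.

```python
def estimate_gpt_params(d_model, n_layers, vocab_size, d_ff, tied_embeddings=True):
    """Approximate parameter count for a bias-free GPT block stack with SwiGLU."""
    attn = 4 * d_model * d_model   # Q, K, V and output projections
    ffn = 3 * d_model * d_ff       # SwiGLU: gate, up and down projections
    norms = 2 * d_model            # two norm gain vectors per block
    per_layer = attn + ffn + norms

    embed = vocab_size * d_model
    # Tied embeddings reuse the input embedding matrix as the output head.
    unembed = 0 if tied_embeddings else vocab_size * d_model
    final_norm = d_model
    return n_layers * per_layer + embed + unembed + final_norm

# d_model and n_layers come from the card; vocab_size and d_ff are assumed.
total = estimate_gpt_params(d_model=384, n_layers=16, vocab_size=50257, d_ff=1024)
print(f"{total / 1e6:.1f}M parameters")

# Training budget from the card: 300M tokens, about 6 tokens per parameter.
print(f"{300e6 / total:.1f} tokens per parameter")
```

Under these assumptions the estimate lands a little under 50M, with the tied embedding matrix accounting for roughly 40% of the total.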
Geometry results
{
"probe_layers": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
"angles_by_layer": {
"1": {
"refusal|harm_awareness": 10.46,
"refusal|self_identity": 7.74,
"refusal|ethics": 9.07,
"refusal|benign_general": 9.08,
"harm_awareness|self_identity": 10.74,
"harm_awareness|ethics": 9.54,
"harm_awareness|benign_general": 10.45,
"self_identity|ethics": 8.53,
"self_identity|benign_general": 9.49,
"ethics|benign_general": 9.95
},
"2": {
"refusal|harm_awareness": 8.5,
"refusal|self_identity": 7.5,
"refusal|ethics": 8.18,
"refusal|benign_general": 9.23,
"harm_awareness|self_identity": 9.29,
"harm_awareness|ethics": 7.39,
"harm_awareness|benign_general": 9.86,
"self_identity|ethics": 7.62,
"self_identity|benign_general": 8.55,
"ethics|benign_general": 8.75
},
"3": {
"refusal|harm_awareness": 8.66,
"refusal|self_identity": 6.86,
"refusal|ethics": 8.58,
"refusal|benign_general": 9.27,
"harm_awareness|self_identity": 8.66,
"harm_awareness|ethics": 6.53,
"harm_awareness|benign_general": 9.77,
"self_identity|ethics": 7.39,
"self_identity|benign_general": 8.43,
"ethics|benign_general": 8.38
},
"4": {
"refusal|harm_awareness": 10.65,
"refusal|self_identity": 7.43,
"refusal|ethics": 10.0,
"refusal|benign_general": 11.39,
"harm_awareness|self_identity": 10.56,
"harm_awareness|ethics": 7.67,
"harm_awareness|benign_general": 11.19,
"self_identity|ethics": 8.96,
"self_identity|benign_general": 10.2,
"ethics|benign_general": 9.49
},
"5": {
"refusal|harm_awareness": 12.59,
"refusal|self_identity": 9.11,
"refusal|ethics": 11.68,
"refusal|benign_general": 14.05,
"harm_awareness|self_identity": 11.87,
"harm_a
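The card does not state how the `angles_by_layer` values are computed. Assuming they are angles in degrees between one direction vector per concept at each probed layer, a minimal sketch of producing a table with the same `"a|b"` key format might look like this. The function names and the toy 3-dimensional vectors are hypothetical and are not the model's actual probe directions.

```python
import math
from itertools import combinations

def angle_degrees(u, v):
    """Angle between two vectors in degrees, via the cosine of their dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos))

def pairwise_angles(directions):
    """Map each unordered concept pair to its angle, keyed as 'a|b'."""
    return {
        f"{a}|{b}": round(angle_degrees(directions[a], directions[b]), 2)
        for a, b in combinations(directions, 2)
    }

# Toy directions for illustration only.
toy_directions = {
    "refusal": [1.0, 0.0, 0.0],
    "harm_awareness": [1.0, 1.0, 0.0],
    "benign_general": [0.0, 0.0, 1.0],
}
toy_angles = pairwise_angles(toy_directions)
```

Under this reading, small angles like those in the early layers above would mean the concept directions are nearly parallel, which is what one might expect from a baseline that saw no refusal data.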
Connected papers
- DuoNeural P34: Reasoning Channel Bypass (two-loci model)
- DuoNeural P35: DHP Scope Constraints (GBSP)
- DuoNeural P36: Scale-Dependent Safety Geometry