Universal Activation Oracle v21 — full general-introspection

Qwen3-1.7B + LoRA trunk trained jointly on three tasks: broad bias/quirk/CoT DETECT (24 concepts), AV verbalize (v9 pool teacher-z), and LatentQA. It reads activations from any LLM via a per-model encoder and marker injection.

Held-out llama3-8b

supervised mean AUROC: 0.989, clean_fp: 0.041
zero-shot held-out detect: ~0.97 (ceiling; matches v20 breadth)
cross-source REAL (ToxiGen/BBQ) mean: 0.676 (vs v19 0.601; Chinese inversion fixed, 0.40 -> 0.56)

Adding AV + LatentQA to the breadth training helps on REAL out-of-distribution data, but not on the saturated synthetic held-out set.

Code: github.com/AlexWortega/qwen3-1p7b-nla, scripts/audit/train_v18.py (--mix detect:av:lie:latentqa).
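For readers who want to sanity-check numbers like the mean AUROC and clean_fp reported above, here is a minimal sketch of how such metrics are commonly computed. It is not taken from the repository: the function names, the input layout, and the 0.5 threshold are all hypothetical, and it assumes clean_fp means the fraction of clean (no-concept) samples whose detection score reaches the threshold.

```python
from statistics import mean


def auroc(scores, labels):
    """Rank-based AUROC: chance that a random positive outscores a random
    negative, with ties counted as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def clean_false_positive_rate(clean_scores, threshold=0.5):
    """ASSUMPTION: clean_fp = share of clean samples flagged at `threshold`."""
    if not clean_scores:
        raise ValueError("need at least one clean sample")
    return sum(s >= threshold for s in clean_scores) / len(clean_scores)


def summarize(per_concept, clean_scores, threshold=0.5):
    """per_concept maps a concept name to (scores, labels); hypothetical layout."""
    aurocs = {name: auroc(s, y) for name, (s, y) in per_concept.items()}
    return {
        "mean_auroc": mean(aurocs.values()),
        "clean_fp": clean_false_positive_rate(clean_scores, threshold),
        "per_concept": aurocs,
    }


if __name__ == "__main__":
    # Toy scores only; these are not results from the model.
    toy = {
        "concept_a": ([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]),
        "concept_b": ([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]),
    }
    print(summarize(toy, clean_scores=[0.1, 0.2, 0.7, 0.3]))
```

The pairwise comparison is quadratic in sample count, which is fine for a quick check; for large evaluation sets a sorted-rank implementation or a library routine would be the usual choice.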
