Universal Activation Oracle v21: full general-introspection

A Qwen3-1.7B+LoRA trunk trained jointly on broad bias/quirk/CoT DETECT (24 concepts), AV verbalize (v9 pool teacher-z), and LatentQA. It reads activations from any LLM via a per-model encoder and marker injection.

Held-out llama3-8b

  • supervised mean AUROC: 0.989, clean_fp: 0.041
  • zero-shot held-out detect ~0.97 (ceiling; = v20 breadth)
  • cross-source REAL (ToxiGen/BBQ) mean 0.676 (vs v19 0.601; chinese inversion fixed 0.40->0.56)

Adding AV+LatentQA to breadth helps on REAL out-of-distribution data, not on the saturated synthetic held-out.

Code: github.com/AlexWortega/qwen3-1p7b-nla scripts/audit/train_v18.py (--mix detect:av:lie:latentqa).
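
For readers who want to reproduce metrics of the kind reported above, the following is a minimal sketch of a mean AUROC and a clean false-positive rate. It is not taken from the repository: the function names, the 0.5 threshold, and the reading of clean_fp as "fraction of clean examples flagged" are all assumptions made for illustration.

```python
# Illustrative sketch only; names and the 0.5 threshold are assumptions,
# not the evaluation code from the repository.

def auroc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auroc(per_concept):
    """Average AUROC over a dict of concept -> (scores, labels)."""
    values = [auroc(scores, labels) for scores, labels in per_concept.values()]
    return sum(values) / len(values)


def clean_false_positive_rate(clean_scores, threshold=0.5):
    """Fraction of clean (concept-free) examples scored at or above threshold."""
    flagged = sum(1 for s in clean_scores if s >= threshold)
    return flagged / len(clean_scores)


if __name__ == "__main__":
    # Toy detector scores for two hypothetical concepts.
    toy = {
        "concept_a": ([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]),
        "concept_b": ([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]),
    }
    print("mean AUROC:", mean_auroc(toy))
    print("clean_fp:", clean_false_positive_rate([0.1, 0.6, 0.2, 0.4]))
```

The pairwise loop is quadratic in the number of examples, which is fine for a sanity check; a rank-based implementation would be preferable for large evaluation sets.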