agency-cbm: Hierarchical Concept Bottleneck Model for Preserving Human Agency
Detects agency-eroding dynamics (sycophancy, option narrowing, dependency invitation, decision transfer, pushback decay, ...) in long multi-turn conversations. Frozen Qwen3-0.6B backbone -> per-turn attention-pooled concept bottleneck (15 named concepts) -> causal aggregator over concept vectors -> 5 trajectory concepts per conversation prefix.
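The data flow above can be illustrated with a minimal PyTorch sketch. This is not the project's implementation: the class name `ConceptBottleneckSketch`, the hidden size of 1024, the per-concept linear readout, and the choice of a unidirectional GRU as the causal aggregator are all assumptions made for illustration. Random tensors stand in for the frozen backbone's hidden states.

```python
import torch
import torch.nn as nn


class ConceptBottleneckSketch(nn.Module):
    """Toy stand-in: token states -> 15 turn concepts -> 5 trajectory concepts."""

    def __init__(self, hidden_dim=1024, n_turn=15, n_traj=5, agg_dim=32):
        super().__init__()
        # One learned attention query per concept (per-concept pooling).
        self.queries = nn.Parameter(torch.randn(n_turn, hidden_dim) * 0.02)
        # One linear readout per concept (assumed; the real head may differ).
        self.readout = nn.Parameter(torch.randn(n_turn, hidden_dim) * 0.02)
        self.bias = nn.Parameter(torch.zeros(n_turn))
        # A unidirectional GRU is causal: step t only sees turns 0..t.
        self.aggregator = nn.GRU(n_turn, agg_dim, batch_first=True)
        self.traj_head = nn.Linear(agg_dim, n_traj)

    def forward(self, hidden, mask):
        # hidden: (turns, tokens, hidden_dim), e.g. from a frozen backbone
        # mask:   (turns, tokens), True for real tokens, False for padding
        scores = torch.einsum("ch,tkh->tck", self.queries, hidden)
        scores = scores.masked_fill(~mask[:, None, :], float("-inf"))
        attn = scores.softmax(dim=-1)  # (turns, concepts, tokens)
        pooled = torch.einsum("tck,tkh->tch", attn, hidden)
        turn_probs = ((pooled * self.readout).sum(-1) + self.bias).sigmoid()
        # The aggregator consumes only the concept vectors, never raw states.
        states, _ = self.aggregator(turn_probs[None])  # (1, turns, agg_dim)
        traj_probs = self.traj_head(states[0]).sigmoid()  # one row per prefix
        return turn_probs, traj_probs


torch.manual_seed(0)
model = ConceptBottleneckSketch().eval()
hidden = torch.randn(6, 12, 1024)  # 6 turns, 12 tokens each (random stand-in)
mask = torch.ones(6, 12, dtype=torch.bool)
with torch.no_grad():
    turn_probs, traj_probs = model(hidden, mask)
print(turn_probs.shape, traj_probs.shape)  # torch.Size([6, 15]) torch.Size([6, 5])
```

Row `t` of `traj_probs` depends only on turns `0..t`, which is what "per conversation prefix" requires: changing a later turn leaves the earlier rows unchanged.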
Checkpoints (only bottleneck/aggregator weights; load the backbone separately):
| file | recipe | held-out AUROC (turn / trajectory) |
|---|---|---|
| `cbm_v3.pt` | frozen backbone, per-concept pooling, augmented data | 0.96 / 0.99 |
| `cbm_v5_lora.pt` + `cbm_v5_lora.lora_adapter/` | + LoRA stage-ii joint fine-tuning | 0.965 / 0.995 |
Usage, training code, dataset generator, and the full experiment series (MLflow store included) live in the project repository. Trained on a single NVIDIA DGX Spark (GB10).
Dual-use firewall: user-state concepts are detection-only and excluded from any steering interface by construction.
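One way to read "excluded by construction" is that the steering entry point is built from an allowlist that never contains user-state concepts, so they cannot be requested by mistake. The sketch below is a hypothetical illustration in plain Python, not the repository's code: the `Concept` type, the `steer` function, and above all the assignment of concepts to the `user_state` kind are assumptions (the source does not say which concepts are user-state).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    kind: str  # "assistant_behavior" or "user_state" (hypothetical labels)


# Hypothetical registry: the kind assigned to each concept is a guess.
CONCEPTS = (
    Concept("sycophancy", "assistant_behavior"),
    Concept("option_narrowing", "assistant_behavior"),
    Concept("dependency_invitation", "assistant_behavior"),
    Concept("pushback_decay", "user_state"),
)

# The allowlist is derived once; user-state concepts never enter it.
STEERABLE = frozenset(c.name for c in CONCEPTS if c.kind != "user_state")


def steer(concept_name: str, strength: float) -> dict:
    """Build a steering request, refusing any concept outside the allowlist."""
    if concept_name not in STEERABLE:
        raise PermissionError(
            f"{concept_name!r} is detection-only and cannot be steered"
        )
    return {"concept": concept_name, "strength": strength}


request = steer("sycophancy", -0.5)  # allowed: an assistant-behavior concept
```

Calling `steer("pushback_decay", 1.0)` raises `PermissionError` in this sketch, while detection code remains free to read every concept score.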