agency-cbm: Hierarchical Concept Bottleneck Model for Preserving Human Agency

Detects agency-eroding dynamics (sycophancy, option narrowing, dependency invitation, decision transfer, pushback decay, ...) in long multi-turn conversations. Frozen Qwen3-0.6B backbone -> per-turn attention-pooled concept bottleneck (15 named concepts) -> causal aggregator over concept vectors -> 5 trajectory concepts per conversation prefix.
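The data flow above can be illustrated with a toy, dependency-free sketch. This is not the released implementation. The random matrices stand in for the frozen backbone's token states and for learned weights, the running mean is only one possible causal aggregator, and every function name here is hypothetical. Only the counts (15 turn concepts, 5 trajectory concepts) come from this card.

```python
import math
import random

N_TURN_CONCEPTS = 15  # from the model card
N_TRAJ_CONCEPTS = 5   # from the model card
HIDDEN = 8            # toy size; the real backbone's hidden size is larger

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(token_states, query):
    """Weighted average of token states; weights are softmax(state . query)."""
    weights = softmax([dot(h, query) for h in token_states])
    return [sum(w * h[d] for w, h in zip(weights, token_states))
            for d in range(len(query))]

def turn_concepts(token_states, queries, probes):
    """Per-concept pooling: each concept has its own query and linear probe."""
    return [sigmoid(dot(attention_pool(token_states, q), w))
            for q, w in zip(queries, probes)]

def trajectory_concepts(concept_vectors, traj_weights):
    """Causal aggregation (assumed: running mean). Output t sees turns 0..t only."""
    outputs, running = [], [0.0] * N_TURN_CONCEPTS
    for t, c in enumerate(concept_vectors, start=1):
        running = [r + (x - r) / t for r, x in zip(running, c)]
        outputs.append([sigmoid(dot(running, w)) for w in traj_weights])
    return outputs

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
queries = rand_matrix(rng, N_TURN_CONCEPTS, HIDDEN)
probes = rand_matrix(rng, N_TURN_CONCEPTS, HIDDEN)
traj_weights = rand_matrix(rng, N_TRAJ_CONCEPTS, N_TURN_CONCEPTS)

# Stand-in for frozen backbone output: 4 turns, 6 token states each.
conversation = [rand_matrix(rng, 6, HIDDEN) for _ in range(4)]
per_turn = [turn_concepts(turn, queries, probes) for turn in conversation]
per_prefix = trajectory_concepts(per_turn, traj_weights)
```

Because the aggregator only reads concept vectors, each trajectory score can be traced back to named turn-level concepts, which is the point of a bottleneck design.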

Checkpoints (only bottleneck/aggregator weights; load the backbone separately):

| file | recipe | held-out AUROC (turn / trajectory) |
| --- | --- | --- |
| cbm_v3.pt | frozen backbone, per-concept pooling, augmented data | 0.96 / 0.99 |
| cbm_v5_lora.pt + cbm_v5_lora.lora_adapter/ | + LoRA stage-ii joint fine-tuning | 0.965 / 0.995 |
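As a reminder of what the AUROC figures measure: for a single concept, AUROC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it by pairwise comparison on made-up toy data. How the reported numbers are averaged across concepts is not stated here, so this shows the single-concept case only.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data, not taken from the model's evaluation set.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
result = auroc(labels, scores)  # 3 of 4 pairs are ranked correctly
```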

Usage, training code, dataset generator, and the full experiment series (MLflow store included) live in the project repository. Trained on a single NVIDIA DGX Spark (GB10).

Dual-use firewall: user-state concepts are detection-only and excluded from any steering interface by construction.
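One way to read "excluded by construction" is that the steering interface never holds a handle to a user-state concept, so no later check can be bypassed. The sketch below illustrates that idea only. The class names, the `user_state` flag, the placeholder concept, and the assignment of `sycophancy` to the steerable side are all assumptions for illustration, not the repository's API or taxonomy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    name: str
    user_state: bool  # True: describes the user, so detection-only

class SteeringInterface:
    """Registers only non-user-state concepts; the rest are never reachable."""

    def __init__(self, concepts):
        # Filtering happens once, at construction time.
        self._steerable = {c.name: 0.0 for c in concepts if not c.user_state}

    def set_target(self, name, value):
        if name not in self._steerable:
            raise KeyError(f"{name!r} is not steerable")
        self._steerable[name] = float(value)

    def targets(self):
        return dict(self._steerable)

concepts = [
    Concept("sycophancy", user_state=False),                 # assumed steerable
    Concept("example_user_state_concept", user_state=True),  # placeholder name
]
steering = SteeringInterface(concepts)
steering.set_target("sycophancy", 0.0)
```

A detector can still report scores for every concept; only the write path is restricted.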
