Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion Paper • 2605.31170 • Published 18 days ago • 12
The Arbiter Agent: Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment Paper • 2606.10747 • Published 7 days ago • 6
PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models Paper • 2606.09697 • Published 8 days ago • 7
BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling Paper • 2606.09707 • Published 8 days ago • 8
Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals Paper • 2605.26045 • Published 22 days ago • 12