Filippo Tonini

filo362

AI & ML interests

LLM safety in multi-agent environments

Organizations

None yet

Recent Activity

authored a paper about 15 hours ago

Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion

Paper • 2605.31170 • Published 18 days ago • 12

submitted a paper to Daily Papers about 15 hours ago

The Arbiter Agent: Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment

Paper • 2606.10747 • Published 7 days ago • 6

upvoted a paper about 16 hours ago

The Arbiter Agent: Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment

Paper • 2606.10747 • Published 7 days ago • 6

authored a paper about 16 hours ago

The Arbiter Agent: Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment

Paper • 2606.10747 • Published 7 days ago • 6

upvoted 2 papers 6 days ago

PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models

Paper • 2606.09697 • Published 8 days ago • 7

BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling

Paper • 2606.09707 • Published 8 days ago • 8

upvoted a paper 20 days ago

Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals

Paper • 2605.26045 • Published 22 days ago • 12