arxiv:2501.18837
Meg Tong
meg-tong
AI & ML interests: None yet
Recent Activity
Authored a paper (1 day ago): Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
Authored a paper (about 1 year ago): Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Authored a paper (about 1 year ago): Steering Llama 2 via Contrastive Activation Addition
Organizations: None yet
Models: None public yet