arxiv:2605.27355
Dongyoon Hahm
Hahmdong
AI & ML interests
AI Safety
Recent Activity
submitted a paper 1 day ago
Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases
authored a paper 2 days ago
Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases
Organizations
None yet