Steven Basart

xksteven · http://stevenbas.art

Recent Activity

upvoted a paper 2 days ago

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

new activity about 1 month ago

cais/MASK: Update README.md

new activity 3 months ago

cais/hle: Please gate the dataset

xksteven's activity

upvoted a paper 2 days ago

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Paper • 2402.04249 • Published Feb 6, 2024 • 5

upvoted 2 papers 11 months ago

What matters when building vision-language models?

Paper • 2405.02246 • Published May 3, 2024 • 104

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

Paper • 2304.03279 • Published Apr 6, 2023 • 2