TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published 4 days ago
SciDr at SDU-2020: IDEAS -- Identifying and Disambiguating Everyday Acronyms for Scientific Domain Paper • 2102.08818 • Published Feb 17, 2021
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models Paper • 2406.18510 • Published Jun 26
PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models Paper • 2405.09373 • Published May 15
SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents Paper • 2403.08715 • Published Mar 13
WebArena: A Realistic Web Environment for Building Autonomous Agents Paper • 2307.13854 • Published Jul 25, 2023