Exploiting Instruction-Following Retrievers for Malicious Information Retrieval Paper • 2503.08644 • Published 25 days ago • 16
SafeArena: Evaluating the Safety of Autonomous Web Agents Paper • 2503.04957 • Published 30 days ago • 18
Social Bias Probing: Fairness Benchmarking for Language Models Paper • 2311.09090 • Published Nov 15, 2023 • 2
Societal Alignment Frameworks Can Improve LLM Alignment Paper • 2503.00069 • Published Feb 27 • 16 • 2
Survey of Cultural Awareness in Language Models: Text and Beyond Paper • 2411.00860 • Published Oct 30, 2024 • 23