Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs • Paper • 2412.16974 • Published Dec 22, 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training • Paper • 2403.09611 • Published Mar 14, 2024
Black-Box Access is Insufficient for Rigorous AI Audits • Paper • 2401.14446 • Published Jan 25, 2024