Scalable Extraction of Training Data from (Production) Language Models — arXiv:2311.17035, published Nov 28, 2023
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training — arXiv:2401.05566, published Jan 10, 2024
Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models — arXiv:2307.14539, published Jul 26, 2023
How Easy Is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts — arXiv:2402.13220, published Feb 20, 2024