- Neural Network Diffusion
Paper • 2402.13144 • Published • 93
- Genie: Generative Interactive Environments
Paper • 2402.15391 • Published • 67
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper • 2402.17177 • Published • 87
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Paper • 2403.00522 • Published • 40
Collections
Collections including paper arxiv:2404.13208
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Paper • 2401.05566 • Published • 24
- Weak-to-Strong Jailbreaking on Large Language Models
Paper • 2401.17256 • Published • 14
- Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
Paper • 2401.17263 • Published • 1
- Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming in the Wild
Paper • 2311.06237 • Published • 1

- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Paper • 2401.05566 • Published • 24
- Weak-to-Strong Jailbreaking on Large Language Models
Paper • 2401.17256 • Published • 14
- How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts
Paper • 2402.13220 • Published • 12
- The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Paper • 2404.13208 • Published • 38

- Scalable Extraction of Training Data from (Production) Language Models
Paper • 2311.17035 • Published • 4
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Paper • 2401.05566 • Published • 24
- Exploiting Novel GPT-4 APIs
Paper • 2312.14302 • Published • 11
- The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Paper • 2404.13208 • Published • 38