Vulnerabilities - a sbarman25 Collection

updated Jun 4, 2024

https://llm-attacks.org/

Scalable Extraction of Training Data from (Production) Language Models

Paper • 2311.17035 • Published Nov 28, 2023 • 3

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Paper • 2401.05566 • Published Jan 10, 2024 • 29

Note: ✅ Backdoor Traps ✅ Honeypot Schemes

Exploiting Novel GPT-4 APIs

Paper • 2312.14302 • Published Dec 21, 2023 • 14

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Paper • 2404.13208 • Published Apr 19, 2024 • 39

Note: Large language models (LLMs) currently have no hierarchy of instruction privilege, which leaves them vulnerable to attacks similar to those that affected early operating systems. This paper proposes an instruction hierarchy for LLMs in which higher-privileged instructions take priority, mitigating these vulnerabilities.

Language Model Inversion

Paper • 2311.13647 • Published Nov 22, 2023 • 2

Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models

Paper • 2307.14539 • Published Jul 26, 2023 • 2

Extracting Training Data from Large Language Models

Paper • 2012.07805 • Published Dec 14, 2020 • 1

Extracting Training Data from Diffusion Models

Paper • 2301.13188 • Published Jan 30, 2023 • 2

Weak-to-Strong Jailbreaking on Large Language Models

Paper • 2401.17256 • Published Jan 30, 2024 • 16

How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts

Paper • 2402.13220 • Published Feb 20, 2024 • 15

Buffer Overflow in Mixture of Experts

Paper • 2402.05526 • Published Feb 8, 2024 • 8