arxiv:2307.10490
(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs
Published on Jul 19, 2023
Abstract
We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text, to follow the attacker's instruction in the subsequent dialog, or both. We illustrate this attack with several proof-of-concept examples targeting LLaVA and PandaGPT.
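The abstract's central idea, a small perturbation blended into an input that steers an unmodified model toward an attacker-chosen output, can be sketched on a toy problem. The example below is an illustration only and is not the paper's method or code: it assumes a random linear softmax classifier as a stand-in for the multi-modal LLM, a class index as a stand-in for the attacker-chosen text, and an iterative signed-gradient update with an L-infinity bound as the optimizer. The names `targeted_perturbation`, `W`, `eps`, and `step` are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def targeted_perturbation(W, x, target, eps=0.1, step=0.01, iters=200):
    """Search for delta with max|delta| <= eps that pushes the toy model
    (logits = W @ input) toward predicting `target`.

    Assumption: a linear model and a signed-gradient update are used here
    purely for illustration; the paper attacks real multi-modal LLMs.
    """
    onehot = np.zeros(W.shape[0])
    onehot[target] = 1.0
    delta = np.zeros_like(x)
    for _ in range(iters):
        p = softmax(W @ (x + delta))
        # Gradient of the cross-entropy loss -log p[target] w.r.t. the input.
        grad = W.T @ (p - onehot)
        # Step against the gradient, then project back into the eps-ball.
        delta = np.clip(delta - step * np.sign(grad), -eps, eps)
        # Keep the perturbed input inside the valid pixel range [0, 1].
        delta = np.clip(x + delta, 0.0, 1.0) - x
    return delta

# Toy setup: a 32x32 "image" flattened to 1024 values and a 5-class model.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 1024))
x = rng.uniform(0.2, 0.8, size=1024)
eps = 0.1

original = int(np.argmax(W @ x))
target = (original + 1) % 5          # any class other than the original one
delta = targeted_perturbation(W, x, target, eps=eps)
adv = x + delta                      # the "blended" input shown to the model

print("original prediction:", original)
print("attacker target:    ", target)
print("prediction on adv:  ", int(np.argmax(W @ adv)))
print("max |delta|:        ", float(np.abs(delta).max()))
```

The model weights are never changed; only the input is. In the setting the abstract describes, the loss would instead be computed over the tokens of the attacker-chosen text and the gradient would flow through the model's image or audio encoder, which this sketch does not attempt to reproduce.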