Mirage-in-the-Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink
NOTE: To prevent potential harm, we release our source code only upon request, and for research purposes only.
Overview
Fusing visual understanding into language generation, Multi-modal Large Language Models (MLLMs) are revolutionizing vision-language applications. Yet these models are often plagued by hallucination: generating objects, attributes, and relationships that do not match the visual content. In this work, we examine the internal attention mechanisms of MLLMs to reveal the underlying causes of hallucination and expose inherent vulnerabilities in the instruction-tuning process.
We propose a novel hallucination attack against MLLMs that exploits attention-sink behaviors to trigger hallucinated content with minimal image-text relevance, posing a significant threat to critical downstream applications. Unlike previous adversarial methods that rely on fixed patterns, our approach generates dynamic, effective, and highly transferable visual adversarial inputs without sacrificing the quality of model responses. Extensive experiments on six prominent MLLMs demonstrate the efficacy of our attack in compromising black-box MLLMs even under extensive defensive mechanisms, as well as promising results against cutting-edge commercial APIs such as GPT-4o and Gemini 1.5.
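To make the setting concrete, the sketch below shows the general shape of an attention-guided adversarial perturbation: a PGD-style update that keeps the image within a small L∞ budget while optimizing an attention-based objective. It is an illustration only, assuming a generic PyTorch MLLM interface; the `attention_sink_loss` objective, the `model(images=..., prompts=..., output_attentions=True)` call, and all hyperparameters are hypothetical placeholders, not the released implementation in `attack.py`.

```python
# Illustrative sketch only: a PGD-style image perturbation guided by an
# attention-based objective. The loss and the model interface below are
# hypothetical placeholders, not the released Mirage-in-the-Eyes attack.
import torch

def attention_sink_loss(attentions):
    # Hypothetical objective: encourage generation-time attention to pile up
    # on a "sink" position instead of the image tokens. (The actual objective
    # used by Mirage-in-the-Eyes may differ.)
    last_layer = attentions[-1]                  # [batch, heads, query, key]
    sink_mass = last_layer[..., :1].sum(dim=-1)  # attention mass on the first (sink) token
    return -sink_mass.mean()                     # minimizing this maximizes sink attention

def pgd_attack(model, image, prompt, eps=2 / 255, alpha=0.5 / 255, steps=50):
    """Perturb `image` within an L-infinity ball of radius `eps`."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        outputs = model(images=adv, prompts=prompt, output_attentions=True)  # assumed interface
        loss = attention_sink_loss(outputs.attentions)
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv - alpha * grad.sign()               # descend the loss
            adv = image + (adv - image).clamp(-eps, eps)  # project back into the eps-ball
            adv = adv.clamp(0, 1)                         # keep a valid image
    return adv.detach()
```

The `--eps 2` flag in the attack command below plays the role of this perturbation budget; how it is scaled (e.g., relative to 0-255 pixel values) is an implementation detail of the released code.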
Usage
We provide the code for the hallucination attack, MLLM response generation, and GPT-4-assisted evaluation in this repository. The main implementation of Mirage-in-the-Eyes is in `attack.py`.
The recommended usage is as follows:
- Prepare the environment:

  ```bash
  cd Mirage-in-the-Eyes
  conda create -n mllm python==3.9.20
  conda activate mllm
  pip install -r requirements.txt
  python -m pip install -e transformers-4.29.2
  ```

- Prepare the Hallubench dataset according to the official repository.

- Set up the model path configs in `Mirage-in-the-Eyes/minigpt4/configs` and `Mirage-in-the-Eyes/minigpt4/models`.

- Run our hallucination attack:

  ```bash
  CUDA_VISIBLE_DEVICES=GPU_ID python attack.py --model MODEL_NAME --gpu-id GPU_ID --data-path /path/to/hallubench --images-path /path/to/images --save-path /path/to/adv_images --generation-mode greedy --eps 2
  ```
- Generate MLLM responses with adversarial visual inputs:

  ```bash
  CUDA_VISIBLE_DEVICES=GPU_ID python generate.py --model MODEL_NAME --gpu-id GPU_ID --data-path /path/to/hallubench --images-path /path/to/images --response-path /path/to/response.json
  ```
- Evaluate MLLM responses for hallucinations (an illustrative sketch of the GPT-4-assisted judging step follows this list):

  ```bash
  cd eval
  python json_eval.py --json-file /path/to/response.json --bench-path /path/to/hallubench --log-path /path/to/log
  ```
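For reference, the sketch below illustrates what a GPT-4-assisted hallucination check can look like: a judge model compares each generated response against ground-truth annotations and flags unsupported content. It is a minimal example assuming the `openai>=1.0` Python client and a simple record schema with `annotations` and `response` fields; the actual prompt, scoring scheme, and file format used by `eval/json_eval.py` may differ.

```python
# Illustrative sketch of GPT-4-assisted hallucination judging; the prompt,
# scoring, and record schema are hypothetical and do not reproduce
# eval/json_eval.py.
import json
from openai import OpenAI  # assumes the openai>=1.0 Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = (
    "You are shown ground-truth image annotations and a model response.\n"
    "List any objects, attributes, or relations in the response that are not\n"
    "supported by the annotations, then output the count on the last line.\n\n"
    "Annotations: {annotations}\n\nResponse: {response}"
)

def judge_response(annotations: str, response: str) -> str:
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(annotations=annotations, response=response),
        }],
        temperature=0,
    )
    return completion.choices[0].message.content

if __name__ == "__main__":
    # Responses produced by generate.py; the field names below are assumptions.
    with open("/path/to/response.json") as f:
        records = json.load(f)
    for rec in records[:3]:  # judge a few samples as a smoke test
        print(judge_response(rec.get("annotations", ""), rec.get("response", "")))
```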
Acknowledgement
This repo is based on the MLLM codebase of OPERA. We sincerely thank the contributors for their valuable work.