
Mirage-in-the-Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink

License: MIT

NOTE: To prevent potential harm, we release our source code only upon request for research purposes.

Overview

Fusing visual understanding into language generation, Multi-modal Large Language Models (MLLMs) are revolutionizing visual-language applications. Yet, these models are often plagued by the hallucination problem, which involves generating inaccurate objects, attributes, and relationships that do not match the visual content. In this work, we delve into the internal attention mechanisms of MLLMs to reveal the underlying causes of hallucination, exposing the inherent vulnerabilities in the instruction-tuning process.

We propose a novel hallucination attack against MLLMs that exploits attention sink behaviors to trigger hallucinated content with minimal image-text relevance, posing a significant threat to critical downstream applications. Unlike previous adversarial methods that rely on fixed patterns, our approach generates dynamic, effective, and highly transferable visual adversarial inputs, without sacrificing the quality of model responses. Comprehensive experiments on 6 prominent MLLMs demonstrate the efficacy of our attack in compromising black-box MLLMs even with extensive mitigation mechanisms, and show promising results against cutting-edge commercial APIs such as GPT-4o and Gemini 1.5.
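
A rough way to picture such an attack is a constrained optimization of the input image against some attack objective. The snippet below is a minimal, hypothetical PGD-style sketch of that idea; the surrogate_loss and mllm wrapper are placeholders we introduce for illustration and do not reproduce the actual attention-sink objective implemented in attack.py.

# Illustrative only: a generic PGD-style loop for crafting a visual adversarial input.
# `mllm.surrogate_loss` is a placeholder for the attack objective in attack.py and
# `mllm` stands for any differentiable wrapper mapping (image, prompt) to a scalar loss;
# neither name comes from this repository.
import torch

def pgd_attack(mllm, image, prompt, eps=2/255, alpha=0.5/255, steps=50):
    """Maximize a surrogate objective over an L-inf ball of radius eps around image."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = mllm.surrogate_loss(adv, prompt)           # hypothetical attack objective
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv + alpha * grad.sign()               # gradient ascent step
            adv = image + (adv - image).clamp(-eps, eps)  # project back into the eps ball
            adv = adv.clamp(0, 1)                         # keep pixel values valid
    return adv.detach()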

Usage

The main implementation of Mirage-in-the-Eyes is in attack.py.

cd Mirage-in-the-Eyes
conda env create -f environment.yml
conda activate mllm
python -m pip install -e transformers-4.29.2

After that, set up the model path configs in Mirage-in-the-Eyes/minigpt4/configs and Mirage-in-the-Eyes/minigpt4/models.

Run our hallucination attack:

python attack.py --model MODEL_NAME --gpu-id GPU_ID --shr-path /path/to/hallubench --vg-path /path/to/images --generation-mode beam --eps 2
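
The flags above are taken verbatim from the command; their exact semantics are defined in attack.py. As an illustrative (not official) usage pattern, one can sweep the perturbation budget for a concrete model; the model name minigpt4 below is an assumption, not a documented value.

# Hypothetical sweep over the perturbation budget (--eps); check attack.py for the
# registered model names before running.
for eps in 2 4 8; do
  python attack.py --model minigpt4 --gpu-id 0 \
    --shr-path /path/to/hallubench --vg-path /path/to/images \
    --generation-mode beam --eps $eps
done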

Acknowledgement

This repo is based on the MLLM codebase of OPERA. We sincerely thank the contributors for their valuable work.
