SafeMLLM-LLaVA-7B

LoRA adapter that turns liuhaotian/llava-v1.5-7b into a jailbreak-robust multimodal model, trained with the SafeMLLM framework described in:

Towards Robust Multimodal Large Language Models Against Jailbreak Attacks. Ziyi Yin, Yuanpu Cao, Han Liu, Ting Wang, Jinghui Chen, Fenglong Ma. arXiv:2502.00653 (2025).

What is in this repo

A standard LoRA + projector adapter:

| File | What it is |
| --- | --- |
| adapter_config.json | PEFT LoRA config |
| adapter_model.bin | LoRA weights (rank-r updates on attention/MLP layers) |
| non_lora_trainables.bin | Vision-language projector weights (always trainable) |
| config.json | LLaVA model config snapshot |
| trainer_state.json | Training-time logs (steps / loss curve) |

To use it you also need the LLaVA-1.5-7B base weights from liuhaotian/llava-v1.5-7b.
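
Once the adapter has been downloaded, a quick sanity check can confirm that the directory holds the files listed above. This is a minimal sketch: `missing_adapter_files` is a hypothetical helper, not part of the SafeMLLM repository, and it checks file names only, not contents.

```python
from pathlib import Path

# File names taken from the file listing in this README.
EXPECTED_FILES = (
    "adapter_config.json",
    "adapter_model.bin",
    "non_lora_trainables.bin",
    "config.json",
    "trainer_state.json",
)


def missing_adapter_files(adapter_dir):
    """Return the expected file names that are absent from adapter_dir."""
    root = Path(adapter_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list from `missing_adapter_files("checkpoints/SafeMLLM-LLaVA-7B")` means every listed file is present.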

Quick start

Clone the matching evaluation code:

git clone https://github.com/ericyinyzy/SafeMLLM.git
cd SafeMLLM
conda env create -f environment.yml && conda activate safemllm-llava

Download both checkpoints:

mkdir -p checkpoints
huggingface-cli download liuhaotian/llava-v1.5-7b      --local-dir checkpoints/llava-v1.5-7b
huggingface-cli download ericyinyzy/SafeMLLM-LLaVA-7B  --local-dir checkpoints/SafeMLLM-LLaVA-7B
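
The same two downloads can also be scripted from Python. This is a minimal sketch that assumes the `huggingface_hub` package is installed. `download_checkpoints` and its `download` parameter are hypothetical names introduced here; the parameter exists only so the function can be exercised without network access.

```python
from pathlib import Path

# Local directory name -> Hub repo id, mirroring the CLI commands above.
REPOS = {
    "llava-v1.5-7b": "liuhaotian/llava-v1.5-7b",
    "SafeMLLM-LLaVA-7B": "ericyinyzy/SafeMLLM-LLaVA-7B",
}


def download_checkpoints(root="checkpoints", download=None):
    """Download both repos under root and return {name: local Path}."""
    if download is None:
        # Imported lazily so the module loads even without huggingface_hub.
        from huggingface_hub import snapshot_download
        download = snapshot_download

    root = Path(root)
    paths = {}
    for name, repo_id in REPOS.items():
        target = root / name
        target.mkdir(parents=True, exist_ok=True)
        download(repo_id=repo_id, local_dir=str(target))
        paths[name] = target
    return paths
```

Calling `download_checkpoints()` with no arguments fetches both checkpoints into `checkpoints/`, matching the layout the evaluation commands expect.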

Run any of the four evaluations (full pipeline shown):

export LLAVA7B_BASE=$PWD/checkpoints/llava-v1.5-7b
export SAFEMLLM_L7B=$PWD/checkpoints/SafeMLLM-LLaVA-7B
bash scripts/run_L7B.sh 0          # GPU id
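
Before launching the script, it can help to confirm that both environment variables are set and point at real directories. The sketch below is a hypothetical pre-flight check (`check_checkpoint_env` is not part of the SafeMLLM repository); it verifies only that the paths exist, not that the weights inside are valid.

```python
import os
from pathlib import Path

# Variable names as exported in the commands above.
REQUIRED_VARS = ("LLAVA7B_BASE", "SAFEMLLM_L7B")


def check_checkpoint_env(env=None):
    """Return a list of human-readable problems; empty means all good."""
    env = os.environ if env is None else env
    problems = []
    for var in REQUIRED_VARS:
        value = env.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{var} does not point to a directory: {value}")
    return problems


if __name__ == "__main__":
    for problem in check_checkpoint_env():
        print(problem)
```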

Programmatic loading (PEFT-style):

from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

tokenizer, model, image_processor, _ = load_pretrained_model(
    model_path="ericyinyzy/SafeMLLM-LLaVA-7B",
    model_base="liuhaotian/llava-v1.5-7b",
    model_name=get_model_name_from_path("ericyinyzy/SafeMLLM-LLaVA-7B"),
)
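
One detail worth checking: in the upstream LLaVA code base, `load_pretrained_model` takes its LoRA branch only when `model_name` contains both `llava` and `lora` (case-insensitive) and `model_base` is set. The name derived from `ericyinyzy/SafeMLLM-LLaVA-7B` contains no `lora`. If the copy of `llava` you are using behaves like upstream and the adapter does not appear to be applied, passing an explicit name may help. The helper below is a hypothetical sketch (`lora_model_name` is not part of LLaVA or SafeMLLM), and whether the builder bundled with the SafeMLLM repository needs it is an assumption to verify.

```python
def lora_model_name(model_path):
    """Derive a model name from model_path that contains 'llava' and 'lora'."""
    # Same last-path-component rule as a plain repo id or directory name.
    name = model_path.strip("/").split("/")[-1]
    lowered = name.lower()
    if "llava" not in lowered:
        name = "llava-" + name
    if "lora" not in lowered:
        name = name + "-lora"
    return name


# Usage with the loader (commented out: needs the llava package and weights):
# tokenizer, model, image_processor, _ = load_pretrained_model(
#     model_path="ericyinyzy/SafeMLLM-LLaVA-7B",
#     model_base="liuhaotian/llava-v1.5-7b",
#     model_name=lora_model_name("ericyinyzy/SafeMLLM-LLaVA-7B"),
# )
```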

Evaluation results (paper, Table 2 / Table 3)

When paired with LLaVA-1.5-7B, this adapter substantially improves robustness against the ImgJP, FigStep, and MM-SafetyBench attacks while preserving general multimodal capability on MM-Vet. The full numbers are in Tables 2 and 3 of the paper.

Hardware requirements

| Use case | VRAM |
| --- | --- |
| Inference (fp16) | ~18 GB |
| ImgJP attack (PGD) | ~26 GB |
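
As a rough cross-check on the inference figure, the memory taken by the fp16 weights alone can be estimated from the parameter count. The sketch below assumes about 7 billion parameters (the nominal size in the model name, not an exact count) at 2 bytes each. The remaining gap up to ~18 GB would then come from things this estimate does not model, such as activations, the KV cache, and CUDA overhead.

```python
def fp16_weight_gb(num_params, bytes_per_param=2):
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Assumption: ~7e9 parameters, taken from the "7B" in the model name.
estimate = fp16_weight_gb(7e9)
print(f"{estimate:.1f} GB")  # prints "14.0 GB"
```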

License

Apache-2.0 for the adapter weights. The underlying LLaVA-1.5 base model retains its own license; please check liuhaotian/llava-v1.5-7b for those terms.

Citation

@article{yin2025safemllm,
  title   = {Towards Robust Multimodal Large Language Models Against Jailbreak Attacks},
  author  = {Yin, Ziyi and Cao, Yuanpu and Liu, Han and Wang, Ting and Chen, Jinghui and Ma, Fenglong},
  journal = {arXiv preprint arXiv:2502.00653},
  year    = {2025}
}