SafeMLLM-LLaVA-7B
LoRA adapter that turns liuhaotian/llava-v1.5-7b into a jailbreak-robust multimodal model, trained with the SafeMLLM framework described in:
Towards Robust Multimodal Large Language Models Against Jailbreak Attacks. Ziyi Yin, Yuanpu Cao, Han Liu, Ting Wang, Jinghui Chen, Fenglong Ma. arXiv:2502.00653 (2025).
What is in this repo
A standard LoRA + projector adapter:
| File | What it is |
|---|---|
| adapter_config.json | PEFT LoRA config |
| adapter_model.bin | LoRA weights (rank-r updates on attention/MLP layers) |
| non_lora_trainables.bin | Vision-language projector weights (always trainable) |
| config.json | LLaVA model config snapshot |
| trainer_state.json | Training-time logs (steps / loss curve) |
To use it you also need the LLaVA-1.5-7B base weights from
liuhaotian/llava-v1.5-7b.
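To see which LoRA settings the adapter was saved with, the PEFT config can be read as plain JSON. The sketch below is illustrative only: `summarize_lora_config` is a hypothetical helper, and the adapter directory is assumed to be the download location used in the quick start. The keys it reads (`r`, `lora_alpha`, `target_modules`, `base_model_name_or_path`) are standard PEFT LoRA config fields, but their values in this repository are not reproduced here.

```python
import json
from pathlib import Path


def summarize_lora_config(adapter_dir):
    """Return the main LoRA hyperparameters stored in adapter_config.json."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    with config_path.open(encoding="utf-8") as fh:
        config = json.load(fh)
    # Standard PEFT LoRA config keys; .get() keeps the helper usable
    # even if a field happens to be absent from this particular file.
    return {
        "peft_type": config.get("peft_type"),
        "rank": config.get("r"),
        "lora_alpha": config.get("lora_alpha"),
        "target_modules": config.get("target_modules"),
        "base_model": config.get("base_model_name_or_path"),
    }


if __name__ == "__main__":
    # Assumed location: the directory the quick start downloads the adapter to.
    adapter_dir = Path("checkpoints/SafeMLLM-LLaVA-7B")
    if (adapter_dir / "adapter_config.json").exists():
        for key, value in summarize_lora_config(adapter_dir).items():
            print(f"{key}: {value}")
    else:
        print(f"No adapter_config.json under {adapter_dir}; download the adapter first.")
```

The `base_model` entry may record a path from the training machine rather than a Hub id, so it is safer to pass the base checkpoint location explicitly than to rely on that field.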
Quick start
Clone the matching evaluation code:
git clone https://github.com/ericyinyzy/SafeMLLM.git
cd SafeMLLM
conda env create -f environment.yml && conda activate safemllm-llava
Download both checkpoints:
mkdir -p checkpoints
huggingface-cli download liuhaotian/llava-v1.5-7b --local-dir checkpoints/llava-v1.5-7b
huggingface-cli download ericyinyzy/SafeMLLM-LLaVA-7B --local-dir checkpoints/SafeMLLM-LLaVA-7B
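The same two downloads can also be scripted from Python. This is a minimal sketch, assuming the `huggingface_hub` package is installed and network access is available; `checkpoint_targets` and `download_checkpoints` are hypothetical helper names, and the local folder layout simply mirrors the CLI commands above.

```python
from pathlib import Path

# Hub repo id -> local folder name, mirroring the CLI commands above.
CHECKPOINTS = {
    "liuhaotian/llava-v1.5-7b": "llava-v1.5-7b",
    "ericyinyzy/SafeMLLM-LLaVA-7B": "SafeMLLM-LLaVA-7B",
}


def checkpoint_targets(root="checkpoints"):
    """Map each repo id to the local directory it should be downloaded to."""
    return {repo_id: Path(root) / name for repo_id, name in CHECKPOINTS.items()}


def download_checkpoints(root="checkpoints"):
    """Download both repositories with huggingface_hub (needs network access)."""
    # Imported lazily so checkpoint_targets() works without the package.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in checkpoint_targets(root).items():
        local_dir.mkdir(parents=True, exist_ok=True)
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
        print(f"{repo_id} -> {local_dir}")
```

Calling `download_checkpoints()` from the repository root should produce the same `checkpoints/` layout as the two `huggingface-cli` commands.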
Run any of the four evaluations (full pipeline shown):
export LLAVA7B_BASE=$PWD/checkpoints/llava-v1.5-7b
export SAFEMLLM_L7B=$PWD/checkpoints/SafeMLLM-LLaVA-7B
bash scripts/run_L7B.sh 0 # GPU id
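Before launching a long evaluation it can help to confirm that both environment variables point at complete directories. The check below is a sketch under stated assumptions: the adapter file names come from the "What is in this repo" table, the base checkpoint is only assumed to contain a `config.json`, and `missing_files` and `preflight` are hypothetical helpers that are not part of the SafeMLLM repository.

```python
import os
from pathlib import Path

# File names taken from the "What is in this repo" table.
ADAPTER_FILES = ["adapter_config.json", "adapter_model.bin", "non_lora_trainables.bin"]
# Assumption: the base checkpoint directory contains at least config.json.
BASE_FILES = ["config.json"]


def missing_files(directory, expected):
    """Return the expected file names that are absent from `directory`."""
    root = Path(directory)
    return [name for name in expected if not (root / name).is_file()]


def preflight(env=None):
    """Check both environment variables; return a list of problems (empty if OK)."""
    env = os.environ if env is None else env
    problems = []
    for var, expected in (("LLAVA7B_BASE", BASE_FILES), ("SAFEMLLM_L7B", ADAPTER_FILES)):
        value = env.get(var)
        if not value:
            problems.append(f"{var} is not set")
            continue
        for name in missing_files(value, expected):
            problems.append(f"{var}: {name} not found in {value}")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("\n".join(issues) if issues else "Both checkpoint directories look complete.")
```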
Programmatic loading (PEFT-style):
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
tokenizer, model, image_processor, _ = load_pretrained_model(
    model_path="ericyinyzy/SafeMLLM-LLaVA-7B",
    model_base="liuhaotian/llava-v1.5-7b",
    model_name=get_model_name_from_path("ericyinyzy/SafeMLLM-LLaVA-7B"),
)
Evaluation results (paper, Table 2 / Table 3)
When paired with LLaVA-1.5-7B, this adapter substantially improves robustness against ImgJP, FigStep and MM-SafetyBench attacks while preserving general multimodal capability on MM-Vet. Refer to the paper for full numbers.
Hardware requirements
| Use case | VRAM |
|---|---|
| Inference (fp16) | ~18 GB |
| ImgJP attack (PGD) | ~26 GB |
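A quick way to compare a GPU against these figures is sketched below. The thresholds are copied from the table and are approximate; treating "GB" as 10^9 bytes is an assumption, and `use_cases_that_fit` and `gpu_total_gb` are hypothetical helpers. `gpu_total_gb` relies on PyTorch's `torch.cuda.get_device_properties` and returns `None` when PyTorch or CUDA is unavailable.

```python
# Approximate VRAM needs in GB, copied from the table above.
REQUIRED_GB = {"Inference (fp16)": 18, "ImgJP attack (PGD)": 26}


def use_cases_that_fit(total_gb):
    """Return the use cases whose approximate requirement is within total_gb."""
    return [name for name, need in REQUIRED_GB.items() if total_gb >= need]


def gpu_total_gb(device_index=0):
    """Total memory of one CUDA device in GB (10**9 bytes), or None if unavailable."""
    try:
        import torch  # imported lazily so the helper above works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device_index).total_memory / 1e9


if __name__ == "__main__":
    total = gpu_total_gb(0)
    if total is None:
        print("No CUDA device detected (or PyTorch is not installed).")
    else:
        print(f"GPU 0: {total:.1f} GB; should fit: {use_cases_that_fit(total) or 'nothing listed'}")
```

Total device memory is only an upper bound: other processes and framework overhead reduce what is actually free, so a card that barely meets a threshold may still run out of memory.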
License
Apache-2.0 for the adapter weights. The underlying LLaVA-1.5 base model retains
its own license; please check
liuhaotian/llava-v1.5-7b for
those terms.
Citation
@article{yin2025safemllm,
  title   = {Towards Robust Multimodal Large Language Models Against Jailbreak Attacks},
  author  = {Yin, Ziyi and Cao, Yuanpu and Liu, Han and Wang, Ting and Chen, Jinghui and Ma, Fenglong},
  journal = {arXiv preprint arXiv:2502.00653},
  year    = {2025}
}