TranX-Adapter:
Bridging Artifacts and Semantics within MLLMs for Robust AI-generated Image Detection
Shouhong Ding2, Pan Zhou3, Yong Luo1
News
[2026.06.03] TranX-Adapter code is available! We have also open-sourced the trained models, along with the corresponding training and evaluation data.
[2026.05.01] Our paper was accepted to ICML 2026!
[2026.02.25] We released the arXiv paper.
TL;DR
While prior work improves AIGI detection by combining artifact and semantic features in MLLMs, we find that artifact features often suffer from high intra-feature similarity, causing uniform attention and ineffective fusion. To address this attention dilution problem, we propose TranX-Adapter, a lightweight fusion module that combines task-aware optimal-transport fusion and cross-attention-based X-Fusion to enable bidirectional interaction between artifact and semantic features.
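The attention dilution effect mentioned above can be illustrated with a toy example. The sketch below is not the TranX-Adapter implementation: the vectors, the dimension, and all function names are made up for illustration. It only shows that softmax attention over nearly identical keys is close to uniform, while attention over distinct keys is sharply peaked.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    # Scaled dot-product attention weights for a single query.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)

def entropy(weights):
    # Shannon entropy in nats; log(len(weights)) means fully uniform.
    return -sum(w * math.log(w) for w in weights if w > 0)

# Hypothetical "semantic-like" keys: mutually distinct directions.
diverse_keys = [
    [4.0, 0.0, 0.0, 0.0],
    [0.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0],
    [0.0, 0.0, 0.0, 4.0],
]
# Hypothetical "artifact-like" keys: almost identical to each other.
similar_keys = [[2.0, 2.0, 2.0, 2.0 + 0.01 * i] for i in range(4)]

query = [4.0, 1.0, 0.0, 2.0]
w_diverse = attention_weights(query, diverse_keys)
w_similar = attention_weights(query, similar_keys)

print("diverse keys:", [round(w, 3) for w in w_diverse], "entropy", round(entropy(w_diverse), 3))
print("similar keys:", [round(w, 3) for w in w_similar], "entropy", round(entropy(w_similar), 3))
```

With highly similar keys, every token receives almost the same weight, so the attention output is close to a plain average and carries little token-specific information.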
Installation
Clone this repository and navigate into the codebase
    git clone https://github.com/DreamMr/TranX-Adapter.git
    cd TranX-Adapter
Install Packages
    bash install.sh
When the environment is created successfully, you will see:
    Conda environment name tranxadapter has been created. Now you can run "conda activate tranxadapter"
Preparation
Download the datasets
Training Data: GenImage, RRDataset, BFree (SD2.1_selfconditioned_origBG.zip (ai) and COCO_real_512.zip (real)).
Download the above data to ./Dataset, with the structure as follows:
Dataset/
├── TranXAdapter-Dataset/
│   ├── training/
│   │   ├── GenImage_Sdv1d4.jsonl
│   │   ├── GenImageAll.jsonl
│   │   ├── RRDataset.jsonl
│   │   ├── BFP.jsonl
│   ├── evaluation/
│   │   ├── Chamelon.tsv
│   │   ├── GenImage.tsv
│   │   ├── RR.tsv
├── GenImage/
│   ├── ADM
│   ├── test
│   ├── ...
├── Chameleon/test/
│   ├── 0_real
│   ├── 1_fake
├── RRDataset_final/
│   ├── original
│   ├── redigital
│   ├── ...
├── RRDataset_original_train_val
│   ├── train
│   ├── val
├── BFP
│   ├── ai
│   ├── real
Process Data
Run the following command to replace the image paths in the JSONL/CSV files with absolute paths:
    python preprocess_data.py
Note: You need to copy the MD5 values corresponding to the CSV files into DATASET_MD5 in ./VLMEvalKit/vlmeval/dataset/aigc_detection.py.
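If the MD5 values are not already at hand, they can be computed with Python's standard hashlib module. The following is a minimal sketch, not part of the repository: the helper names `file_md5` and `md5_table` are hypothetical, and the default `*.tsv` pattern is an assumption based on the evaluation files shown in the directory tree above.

```python
import hashlib
from pathlib import Path

def file_md5(path, chunk_size=1 << 20):
    # Hash the file in chunks so large files do not have to fit in memory.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def md5_table(directory, pattern="*.tsv"):
    # Map each matching file's stem (name without extension) to its MD5 hex digest.
    # The pattern is an assumption; adjust it to the files you actually evaluate on.
    return {p.stem: file_md5(p) for p in sorted(Path(directory).glob(pattern))}
```

For example, `md5_table("./Dataset/TranXAdapter-Dataset/evaluation")` returns a dictionary whose values can be copied into DATASET_MD5. Because rewriting the image paths changes the file contents, the hashes should presumably be computed after running preprocess_data.py.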
Training
Merge TranX-Adapter into the MLLM
First, TranX-Adapter needs to be merged into the MLLM so that it can be loaded directly with from_pretrained(). We provide merge scripts (./llavanpr/merge_model.py and ./qwen3vlnpr/merge_model.py) as well as the merged models: DreamMr/TranXAdapter-LLaVA-next-mistral7B-v0, DreamMr/TranXAdapter-Qwen3VL2B-v0, and DreamMr/TranXAdapter-Qwen3VL4B-v0.
Start training
Take training Qwen3VL-2B on GenImage Sdv1.4 as an example:
    cd ms-swift/scripts/training
    bash train_qwen3vl_Chameleon.sh
i. If you want to train on RRDataset, you need to set the input image resolution to 512x512 (./ms-swift/swift/llm/template/template/qwen.py line 637 and ./ms-swift/swift/llm/template/templatellava.py line 192).
ii. We found that if the model is trained directly on GenImage Sdv1.4, the MLLM tends to overfit to the input image resolution. Therefore, we recommend training with real and fake images that have the same resolution. We use the BiasFree part (SD2.1_selfconditioned_origBG.zip and COCO_real_512.zip) to prevent the model from overfitting to image resolution. We recommend downloading the data from the official link.
iii. We found that MLLM training converges quickly and also overfits rapidly. Therefore, we recommend using a checkpoint from the middle of training.
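One way to follow the recommendation in item iii is to list the saved checkpoints by training step and start from the one in the middle. The sketch below is an assumption-laden helper, not part of the repository: it assumes checkpoints are saved as `checkpoint-<step>` directories inside the training output directory, and the function names are hypothetical. "Middle" is taken literally as the median saved step; in practice it is safer to evaluate a few neighbouring checkpoints as well.

```python
import re
from pathlib import Path

# Assumed naming convention: one directory per saved step, e.g. "checkpoint-300".
CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")

def list_checkpoints(output_dir):
    # Return checkpoint directories sorted by their numeric training step.
    found = []
    for child in Path(output_dir).iterdir():
        match = CHECKPOINT_RE.match(child.name)
        if match and child.is_dir():
            found.append((int(match.group(1)), child))
    found.sort(key=lambda item: item[0])
    return [path for _, path in found]

def pick_middle_checkpoint(output_dir):
    # Pick the checkpoint at the median position among the saved steps.
    checkpoints = list_checkpoints(output_dir)
    if not checkpoints:
        raise FileNotFoundError(f"no checkpoint-<step> directories in {output_dir}")
    return checkpoints[len(checkpoints) // 2]
```

Sorting by the parsed integer step rather than by name matters here, since a plain string sort would place `checkpoint-1000` before `checkpoint-200`.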
Evaluation
Modify the LMUData in ./VLMEvalKit/scripts/run_task.sh
Set LMUData to the absolute path of Dataset.
Modify DATASET_URL and DATASET_MD5 in ./VLMEvalKit/vlmeval/dataset/aigc_detection.py.
Replace DATASET_URL with the absolute path of the CSV file, and fill in DATASET_MD5 with the MD5 value computed earlier.
Run code
    cd VLMEvalKit/scripts
    bash run_task.sh
Contact
- Wenbin Wang: wangwenbin97@whu.edu.cn
Citation
If you use TranX-Adapter in your research, please cite our work:
@inproceedings{wang2026tranx,
title={TranX-Adapter: Bridging Artifacts and Semantics within MLLMs for Robust AI-generated Image Detection},
author={Wang, Wenbin and Huang, Yuge and Xu, Jianqing and Yu, Yue and Yan, Jiangtao and Ding, Shouhong and Zhou, Pan and Luo, Yong},
booktitle={Forty-third International Conference on Machine Learning},
url={https://arxiv.org/abs/2602.21716}
}
Acknowledgement