Update README.md

2023a5e verified 25 days ago

2.45 kB

license: apache-2.0
pipeline_tag: image-classification
tags:
  - computer-vision
  - image-classification
  - pytorch
library_name: pytorch

Official repository for the paper "Simplicity Prevails: The Emergence of Generalizable AIGI Detection in Visual Foundation Models"(https://arxiv.org/pdf/2602.01738)

If you have any questions, please feel free to open a discussion in the Community tab. For direct inquiries, you can also reach out to us via email at 2450042008@mails.szu.edu.cn.

VFM Baselines Release

This directory contains the 7 vision foundation model baselines used in the paper:

MetaCLIP-Linear
MetaCLIP2-Linear
SigLIP-Linear
SigLIP2-Linear
PE-CLIP-Linear
DINOv2-Linear
DINOv3-Linear

models.py: unified model-loading code for all 7 baselines
test_vfm_baselines.py: unified evaluation script
weights/: released checkpoints
core/vision_encoder/: vendored PE vision encoder code required by PE-CLIP-Linear

Model Names

The unified loader and test script accept these names:

metacliplin
metaclip2lin
sigliplin
siglip2lin
pelin
dinov2lin
dinov3lin

The paper names such as MetaCLIP-Linear and DINOv3-Linear are also accepted.

Usage

Evaluate a single model:

python test_vfm_baselines.py \
  --model sigliplin \
  --real-dir /path/to/0_real \
  --fake-dir /path/to/1_fake \
  --max-samples 100

Evaluate all 7 models:

python test_vfm_baselines.py \
  --model all \
  --real-dir /path/to/0_real \
  --fake-dir /path/to/1_fake \
  --max-samples 100

Optional arguments:

--checkpoint: override the default checkpoint for single-model evaluation
--batch-size: batch size for evaluation
--num-workers: dataloader workers
--device: explicit device such as cuda:0 or cpu
--save-json: save results to a JSON file

Dependencies

The release code expects these Python packages:

torch
torchvision
transformers
scikit-learn
Pillow
timm
einops
ftfy
regex
huggingface_hub

Notes

The clip-family and DINO-family baselines instantiate the backbone from Hugging Face model configs and then load the released checkpoint.
PE-CLIP-Linear uses the vendored core/vision_encoder code in this directory.
The checkpoints in weights/ are arranged locally for packaging convenience. For public release, they can be uploaded as the same filenames.

Lunahera
/

simplicityprevails

VFM Baselines Release

Contents

Model Names

Usage

Dependencies

Notes