SDG Detector — GRPO Stage (Merged Checkpoint)

arXiv

This repository provides the merged full checkpoint of the SDG detector after Stage-2 GRPO. It is already merged with the Stage-1 SFT checkpoint (P1n3/sdg-detector-sft) and can be loaded directly with transformers.

Important: this is not a PEFT/LoRA adapter. Do not load it with PeftModel.from_pretrained, and do not merge it into Qwen/Qwen3-VL-4B-Instruct or P1n3/sdg-detector-sft.

Training Summary

field value
initialization P1n3/sdg-detector-sft
training stage GRPO
reward composite: 0.6×DIoU + 0.25×DescCos + 0.15×ImpAcc
epochs 2
learning rate 5.0e-6
rollouts/prompt 8
precision bf16
hardware 16 × A100-80G (2 nodes × 8)

Quick Start

from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

ckpt = "P1n3/sdg-detector-grpo"  # merged full checkpoint

processor = AutoProcessor.from_pretrained(ckpt)
model = AutoModelForImageTextToText.from_pretrained(
    ckpt,
    dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

For SGLang/OpenAI-compatible serving in BoxFlow-GRPO:

python -m sglang.launch_server \
  --model-path P1n3/sdg-detector-grpo \
  --served-model-name sdg-detector \
  --port 17142 --tp 4 --api-key flowgrpo --trust-remote-code

Output Format

The detector predicts structured defect sets:

<think>
... reasoning about image quality and caption alignment ...
</think>
<answer>
[
  {
    "box_2d": [x0, y0, x1, y1],
    "label": "artifact" or "misalignment",
    "description": "...",
    "importance": 1-100
  }
]
</answer>

License

cc-by-nc-4.0. This checkpoint is derived from P1n3/sdg-detector-sft (itself a derivative of Qwen/Qwen3-VL-4B-Instruct, Apache-2.0). Research and non-commercial use only.

Citation

@article{zhang2026and,
  title={Where, What, Why, and Importance: Structured Defect Grounding for Text-to-Image Feedback},
  author={Zhang, Huaisong and Yu, Hao and Zhang, Yuxuan and Wang, Jiahe and Chen, Xinrui and Cao, Haoxiang and Lu, Feng and Zhang, Wendong and Yu, Changqian and Yuan, Chun},
  journal={arXiv preprint arXiv:2606.06113},
  year={2026}
}
Downloads last month
-
Safetensors
Model size
4B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for P1n3/sdg-detector-grpo

Finetuned
(1)
this model

Collection including P1n3/sdg-detector-grpo

Paper for P1n3/sdg-detector-grpo