BoxFlow-GRPO LoRA on FLUX.1-dev (Anonymous Submission)

This LoRA adapter was trained with BoxFlow-GRPO (NeurIPS 2026 submission, Section 4.2), a GRPO variant that converts SDG (Structured Defect Grounding) detector predictions into spatially localized reward maps and uses them to align FLUX.1-dev, reducing both visual artifacts and text-image misalignments. It is the adapter behind the "Ours" row of Table 4.
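
To make the idea of a spatially localized reward map concrete, here is a minimal sketch of rasterizing detector boxes into a per-pixel map. It is not the reward implementation from the supplementary archive: the `Defect` record, its field names, the use of the detector score as the penalty, and the rule for overlapping boxes are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Defect:
    """Hypothetical detection record; field names are illustrative only."""
    x0: int
    y0: int
    x1: int  # exclusive right edge
    y1: int  # exclusive bottom edge
    score: float  # assumed detector confidence in [0, 1]


def reward_map(defects: List[Defect], height: int, width: int) -> List[List[float]]:
    """Rasterize boxes into a map that is 0 on clean pixels and negative inside boxes."""
    rmap = [[0.0] * width for _ in range(height)]
    for d in defects:
        # Clip each box to the image bounds before filling it.
        for y in range(max(d.y0, 0), min(d.y1, height)):
            for x in range(max(d.x0, 0), min(d.x1, width)):
                # Assumption: where boxes overlap, keep the strongest penalty.
                rmap[y][x] = min(rmap[y][x], -d.score)
    return rmap


# Two overlapping boxes on a tiny 4x4 canvas.
rmap = reward_map(
    [Defect(1, 1, 3, 3, 0.8), Defect(2, 2, 4, 4, 0.5)], height=4, width=4
)
```

Pixels outside every box keep a reward of zero, so the training signal in this sketch is concentrated on the regions the detector flagged rather than spread over the whole image.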

Training summary

Field               Value
base model          black-forest-labs/FLUX.1-dev
trainer             dense-grpo (Flow-Factory)
reward              CombinedUR2BBoxReward (UR2 scalar + SDG-bbox spatial)
α (artifact)        0.5
α (misalignment)    0.05
LoRA rank / α       r=64, α=128
target modules      attn.{q,k,v,to_out,add_q,add_k,add_v,to_add_out}, ff.*
resolution          512×512
inference steps     10 (ODE-SDE hybrid, SDE window [0,5])
guidance scale      3.5
group size          16
epochs (saved)      570 (max 600)
precision           bf16
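
Two of the settings above can be illustrated with a short sketch: the group size of 16 and the SDE window [0,5] over 10 steps. The code below shows the standard GRPO group-relative advantage (each reward normalized by its group's mean and standard deviation) and one possible reading of the SDE window. It is an assumption that the dense-grpo trainer normalizes this way, that it uses the population standard deviation, and that the window is half-open (steps 0 to 4 stochastic, steps 5 to 9 deterministic); the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standard GRPO normalization: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


def sde_step_mask(num_steps: int, window: Tuple[int, int]) -> List[bool]:
    """True for steps sampled with the SDE, False for ODE steps.

    Assumption: the window is half-open, [start, end).
    """
    start, end = window
    return [start <= i < end for i in range(num_steps)]


# One group of 16 samples for the same prompt, with made-up scalar rewards.
rewards = [0.1 * i for i in range(16)]
adv = group_relative_advantages(rewards)

# 10 inference steps with SDE window [0, 5].
mask = sde_step_mask(10, (0, 5))
```

Because the advantages are centered within each group, samples are only compared against other samples for the same prompt, which is what makes the group size a meaningful hyperparameter.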

Quick start

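import torch
from diffusers import FluxPipeline
from peft import PeftModel

# Load the base model in bf16, the precision used for training.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16,
).to("cuda")

# Wrap the transformer with the LoRA adapter weights.
pipe.transformer = PeftModel.from_pretrained(
    pipe.transformer, "<anonymous>/boxflow-grpo-flux-lora",
)

# Sampling settings mirror the training summary (10 steps, guidance 3.5, 512×512).
img = pipe(
    "A photo of a corgi astronaut on Mars",
    num_inference_steps=10,
    guidance_scale=3.5,
    height=512,
    width=512,
).images[0]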

Training config and reward implementation live in the supplementary archive under boxflow_grpo/flux1_dev_exp1_CD.yaml and boxflow_grpo/src/flow_factory/rewards/.

License

This LoRA inherits the FLUX.1-dev Non-Commercial License of its base model. Non-commercial research use only.

Citation

@article{zhang2026and,
  title={Where, What, Why, and Importance: Structured Defect Grounding for Text-to-Image Feedback},
  author={Zhang, Huaisong and Yu, Hao and Zhang, Yuxuan and Wang, Jiahe and Chen, Xinrui and Cao, Haoxiang and Lu, Feng and Zhang, Wendong and Yu, Changqian and Yuan, Chun},
  journal={arXiv preprint arXiv:2606.06113},
  year={2026}
}