HPSv3++: Scaling Reward Models Across the Full Spectrum of Diffusion Model Capabilities

HPSv3++ is a capability-aware and RL-iteration-aware text-to-image (T2I) reward model, built on the Qwen/Qwen3-VL-8B-Instruct backbone with a Capability Encoder, a FiLM conditioning head, and a three-layer RankNet reward head.

A Capability Encoder implicitly infers the generative ability of the model that produced an image, while the RL iteration step is supplied as an explicit condition. The two conditions are combined through FiLM, so a single reward model produces calibrated preference scores across generators of differing capability and at different stages of RL optimization.
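To make the conditioning concrete, here is a minimal sketch of FiLM-style modulation (a per-feature scale and shift predicted from the condition) in plain Python. It is not the released implementation. Concatenating the capability embedding with the iteration scalar, using single linear projections, and all names (`linear`, `film_modulate`, `w_gamma`, `b_gamma`, `w_beta`, `b_beta`) are assumptions made for illustration.

```python
from typing import List, Sequence


def linear(weights: Sequence[Sequence[float]], bias: Sequence[float],
           x: Sequence[float]) -> List[float]:
    """Plain dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film_modulate(features: Sequence[float],
                  capability_embedding: Sequence[float],
                  rl_iteration: float,
                  w_gamma: Sequence[Sequence[float]], b_gamma: Sequence[float],
                  w_beta: Sequence[Sequence[float]], b_beta: Sequence[float]) -> List[float]:
    """Apply FiLM: out[i] = gamma[i] * features[i] + beta[i].

    ASSUMPTION: the condition vector is the capability embedding with the
    normalized RL iteration appended. The model card does not say how the
    two conditions are actually combined.
    """
    condition = list(capability_embedding) + [rl_iteration]
    gamma = linear(w_gamma, b_gamma, condition)  # per-feature scale
    beta = linear(w_beta, b_beta, condition)     # per-feature shift
    return [g * h + b for h, g, b in zip(features, gamma, beta)]


if __name__ == "__main__":
    # Toy numbers only: 2 features, a 1-dimensional capability embedding.
    out = film_modulate(
        features=[1.0, 2.0],
        capability_embedding=[0.5],
        rl_iteration=1.0,
        w_gamma=[[2.0, 0.0], [0.0, 1.0]], b_gamma=[0.0, 0.0],
        w_beta=[[0.0, 1.0], [0.0, 0.0]], b_beta=[0.0, 0.0],
    )
    print(out)
```

With zero projection weights, a scale bias of 1, and a shift bias of 0, the function returns the features unchanged, which is a convenient sanity check for any FiLM layer.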

The training/evaluation dataset, HPDv3++, is released separately: Junjun2333/HPDv3-PlusPlus.

Files

File	Description
`hpsv3++.pth`	Final HPSv3++ reward-model weights (17.6 GB)
`config.json`	Model configuration

Conditioning at inference

Model capability is inferred implicitly from the image; you do not pass it in.
RL iteration is passed explicitly as a normalized scalar in [0, 1]:
- General preference scoring / ranking: use 0.0 (the pre-RL setting).
- Reward inside T2I RL fine-tuning: ramp the iteration condition linearly from 0.3 to 1.0 over training (the setting used in the paper).
Use the mean (mu) output as the scalar reward.
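The two settings above can be sketched as a small helper. This is an illustrative sketch, not code from the release: `rl_iteration_condition`, `rank_by_reward`, and the `score_fn` callable are hypothetical names, and `score_fn(item, condition)` stands in for whatever call returns the model's mean (mu) output for one prompt-image pair. Mapping the first training step to 0.3 and the last to 1.0 is also an assumption about how "linearly over training" is discretized.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

# Condition for general preference scoring / ranking (pre-RL setting).
GENERAL_SCORING_CONDITION = 0.0


def rl_iteration_condition(step: int, total_steps: int,
                           start: float = 0.3, end: float = 1.0) -> float:
    """Linearly ramp the iteration condition from `start` to `end`.

    ASSUMPTION: step 0 maps to `start` and step total_steps - 1 maps to `end`.
    """
    if total_steps < 1:
        raise ValueError("total_steps must be at least 1")
    if total_steps == 1:
        return start
    # Clamp so out-of-range steps still give a value in [start, end].
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start * (1.0 - frac) + end * frac


def rank_by_reward(items: Sequence[T],
                   score_fn: Callable[[T, float], float],
                   condition: float = GENERAL_SCORING_CONDITION) -> List[T]:
    """Sort items from highest to lowest reward.

    `score_fn` is a hypothetical wrapper around the reward model that
    returns the mean (mu) output for one item under the given condition.
    """
    scored = [(score_fn(item, condition), item) for item in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored]


if __name__ == "__main__":
    for step in (0, 50, 100):
        print(step, rl_iteration_condition(step, total_steps=101))
```

Inside an RL loop, one would call `rl_iteration_condition(step, total_steps)` once per update and pass the result to the reward model alongside each generated image; for offline ranking, the default condition of 0.0 applies.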

Citation

@misc{hpsv3pp,
  title  = {HPSv3++: Scaling Reward Models Across the Full Spectrum of Diffusion Model Capabilities},
  author = {HPSv3++ Team},
  year   = {2026}
}
