HPSv3++: Scaling Reward Models Across the Full Spectrum of Diffusion Model Capabilities
HPSv3++ is a capability-aware and RL-iteration-aware text-to-image (T2I) reward model, built on the Qwen/Qwen3-VL-8B-Instruct backbone with a Capability Encoder, a FiLM conditioning head, and a three-layer RankNet reward head.
The Capability Encoder implicitly infers the generative ability of the model that produced an image, while the RL iteration step is supplied as an explicit condition. The two signals are combined through FiLM modulation, so that a single reward model produces calibrated preference scores across generators of differing capability and at different stages of RL optimization.
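To make the FiLM conditioning concrete, here is a minimal sketch of feature-wise linear modulation in plain Python: a condition vector is projected to a per-channel scale (`gamma`) and shift (`beta`), which are then applied to the features. The layout of the condition vector (a capability embedding concatenated with the iteration scalar), the dimensions, and all weights are assumptions made for illustration; the actual HPSv3++ head may be structured differently.

```python
# Illustrative FiLM (feature-wise linear modulation) in plain Python.
# All names, sizes, and weights are hypothetical; none are taken from
# the HPSv3++ implementation.

def linear(weights, bias, x):
    """Return weights @ x + bias for a row-major list-of-lists matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def film(features, condition, w_gamma, b_gamma, w_beta, b_beta):
    """Scale and shift each feature channel using the condition vector."""
    gamma = linear(w_gamma, b_gamma, condition)  # per-channel scale
    beta = linear(w_beta, b_beta, condition)     # per-channel shift
    return [g * f + b for g, f, b in zip(gamma, features, beta)]

# Assumed layout: a 2-d capability embedding concatenated with the
# normalized RL-iteration scalar.
capability_embedding = [0.5, -1.0]
rl_iteration = 0.3
condition = capability_embedding + [rl_iteration]

features = [1.0, 2.0]  # two made-up feature channels
w_gamma = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
b_gamma = [1.0, 1.0]
w_beta = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
b_beta = [0.0, 0.5]

modulated = film(features, condition, w_gamma, b_gamma, w_beta, b_beta)
```

With zero projection weights, `gamma = 1`, and `beta = 0`, the layer reduces to the identity, which is a common way to initialize FiLM so that conditioning starts as a no-op.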
The training/evaluation dataset, HPDv3++, is released separately: Junjun2333/HPDv3-PlusPlus.
Files
| File | Description |
|---|---|
| `hpsv3++.pth` | Final HPSv3++ reward-model weights (17.6 GB) |
| `config.json` | Model configuration |
Conditioning at inference
- Model capability is inferred implicitly from the image; you do not pass it in.
- RL iteration is passed explicitly as a normalized scalar in `[0, 1]`.
  - General preference scoring / ranking: use `0.0` (pre-RL setting).
  - As the reward inside T2I RL fine-tuning: ramp the iteration condition linearly from `0.3` to `1.0` over training (the setting used in the paper).
- Use the mean (`mu`) output as the scalar reward.
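The conditioning rules above can be sketched as two small helpers. The function names, the `total_steps` parameter, and the choice to clamp progress to `[0, 1]` are assumptions for illustration; only the values `0.0`, `0.3`, and `1.0` and the linear ramp come from this card. How the scalar is actually passed to the model depends on the released inference code, which is not shown here.

```python
# Hypothetical helpers for choosing the RL-iteration condition and for
# ranking images by the mean (mu) reward. Names are illustrative only.

PRE_RL_CONDITION = 0.0  # value for general preference scoring / ranking

def iteration_condition(step, total_steps, start=0.3, end=1.0):
    """Linearly ramp the condition from `start` to `end` over training."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so steps outside [0, total_steps] stay within [start, end].
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress

def rank_by_reward(mu_scores):
    """Return image indices ordered from highest to lowest mu."""
    return sorted(range(len(mu_scores)),
                  key=lambda i: mu_scores[i], reverse=True)
```

For example, halfway through a 1000-step RL run, `iteration_condition(500, 1000)` is approximately `0.65`, while offline ranking of a batch of images would use `PRE_RL_CONDITION` for every image.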
Citation
@misc{hpsv3pp,
title = {HPSv3++: Scaling Reward Models Across the Full Spectrum of Diffusion Model Capabilities},
author = {HPSv3++ Team},
year = {2026}
}