
Reinforcement learning training with DDPO

You can fine-tune Stable Diffusion to maximize a reward function with reinforcement learning, using the 🤗 TRL library together with 🤗 Diffusers. This relies on the Denoising Diffusion Policy Optimization (DDPO) algorithm, introduced by Black et al. in Training Diffusion Models with Reinforcement Learning, which 🤗 TRL implements in the DDPOTrainer.
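The core of a DDPO update can be sketched without any framework. DDPO treats each denoising step x_t → x_{t-1} as an action drawn from a Gaussian policy and rewards only the final image. The snippet below is a minimal illustration that assumes the importance-sampled, clipped variant of DDPO (a PPO-style surrogate loss over per-step log-probabilities) and simple batch-wide reward standardization as the advantage estimate. The function names and default values (`clip_range`, `adv_clip_max`) are hypothetical and for illustration only; this is not the DDPOTrainer API. In a real setup the log-probabilities come from the sampler's Gaussian denoising steps, and the loss is backpropagated through the UNet.

```python
import math
from statistics import mean, pstdev

# Hypothetical helpers for illustration; not part of TRL or Diffusers.


def normalize_rewards(rewards, eps=1e-8):
    """Standardize raw rewards across the batch so they can serve as advantages."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def gaussian_log_prob(x, mu, sigma):
    """Log-density of x under an isotropic Gaussian N(mu, sigma^2 I).

    Each denoising step x_t -> x_{t-1} is treated as an action sampled from a
    Gaussian whose mean is predicted by the denoiser; this is its log-probability.
    """
    return sum(
        -((xi - mi) ** 2) / (2 * sigma**2) - math.log(sigma) - 0.5 * math.log(2 * math.pi)
        for xi, mi in zip(x, mu)
    )


def ddpo_clipped_loss(log_probs, old_log_probs, advantages, clip_range=1e-4, adv_clip_max=5.0):
    """PPO-style clipped surrogate loss, averaged over all (trajectory, step) pairs.

    log_probs[i][t]     : log p(x_{t-1} | x_t, c) under the policy being trained
    old_log_probs[i][t] : the same quantity under the policy that sampled trajectory i
    advantages[i]       : one advantage per trajectory, shared by all of its steps
    """
    losses = []
    for steps, old_steps, adv in zip(log_probs, old_log_probs, advantages):
        adv = max(-adv_clip_max, min(adv_clip_max, adv))  # bound extreme advantages
        for lp, old_lp in zip(steps, old_steps):
            ratio = math.exp(lp - old_lp)  # importance weight for this step
            clipped = max(1.0 - clip_range, min(1.0 + clip_range, ratio))
            # Keep the pessimistic (larger) of the two losses, as in PPO.
            losses.append(max(-adv * ratio, -adv * clipped))
    return mean(losses)


# Usage: two trajectories of two denoising steps each, evaluated on-policy.
rewards = [1.0, 3.0]
advantages = normalize_rewards(rewards)
old = [[-1.0, -2.0], [-1.5, -0.5]]
# The ratio is 1 everywhere, so the loss reduces to the mean of -advantage.
print(ddpo_clipped_loss(old, old, advantages))
```

Clipping the ratio to a narrow band keeps each update close to the policy that generated the samples, which is what allows the same batch of trajectories to be reused for several gradient steps.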

For more information, check out the DDPOTrainer API reference and the Finetune Stable Diffusion Models with DDPO via TRL blog post.
