Diffusers documentation

Reinforcement learning training with DDPO

You are viewing v0.30.2 version. A newer version v0.31.0 is available.
Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

Reinforcement learning training with DDPO

You can fine-tune Stable Diffusion on a reward function via reinforcement learning with the 🤗 TRL library and 🤗 Diffusers. This is done with the Denoising Diffusion Policy Optimization (DDPO) algorithm introduced by Black et al. in Training Diffusion Models with Reinforcement Learning, which is implemented in 🤗 TRL with the DDPOTrainer.

For more information, check out the DDPOTrainer API reference and the Finetune Stable Diffusion Models with DDPO via TRL blog post.

< > Update on GitHub