---
license: creativeml-openrail-m
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
inference:
  parameters:
    num_inference_steps: 50
    guidance_scale: 5.0
    eta: 1.0
widget:
- text: "a horse playing chess"
  example_title: horse + chess
- text: "a lion washing dishes"
  example_title: lion + dishes
- text: "a goat riding a bike"
  example_title: goat + bike
---

# ddpo-alignment

This model was finetuned from [Stable Diffusion v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) using [DDPO](https://arxiv.org/abs/2305.13301) and a reward function that uses [LLaVA](https://llava-vl.github.io/) to measure prompt-image alignment. See [the project website](https://rl-diffusion.github.io/) for more details.

The model was finetuned for 200 iterations with a batch size of 256 samples per iteration. During finetuning, we used prompts of the form: "_a(n) \<animal\> \<activity\>_". The animal and activity were drawn from the lists below, so prompts built from those lists should give the best results. We also observed limited generalization to other prompts.

Activities:
- washing dishes
- playing chess
- riding a bike

Animals:
- cat
- dog
- horse
- monkey
- rabbit
- zebra
- spider
- bird
- sheep
- deer
- cow
- goat
- lion
- tiger
- bear
- raccoon
- fox
- wolf
- lizard
- beetle
- ant
- butterfly
- fish
- shark
- whale
- dolphin
- squirrel
- mouse
- rat
- snake
- turtle
- frog
- chicken
- duck
- goose
- bee
- pig
- turkey
- fly
- llama
- camel
- bat
- gorilla
- hedgehog
- kangaroo
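
The following is a minimal sketch, not an official usage snippet, of how a prompt in the training format might be built and passed to `diffusers` with the inference parameters listed in the metadata (50 steps, guidance scale 5.0, eta 1.0). The helper names `article_for`, `build_prompt`, and `generate` are hypothetical, and `model_id` is left as an argument because this card does not state the repository id. In `diffusers`, `eta` only affects `DDIMScheduler`, so the sketch switches to that scheduler; that switch is an assumption about the intended sampler.

```python
ACTIVITIES = ("washing dishes", "playing chess", "riding a bike")


def article_for(noun: str) -> str:
    """Pick "an" before a vowel letter, otherwise "a" (a simple spelling heuristic)."""
    return "an" if noun.lower().startswith(("a", "e", "i", "o", "u")) else "a"


def build_prompt(animal: str, activity: str) -> str:
    """Build a prompt of the form "a(n) <animal> <activity>"."""
    return f"{article_for(animal)} {animal} {activity}"


def generate(prompt: str, model_id: str, seed: int = 0):
    """Generate one image for `prompt`.

    `model_id` is the Hub id or local path of this model (not stated in the card).
    Heavy imports live inside the function so the prompt helpers above can be
    used without torch or diffusers installed.
    """
    import torch
    from diffusers import DDIMScheduler, StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id)
    # Assumption: use DDIM so that `eta` has an effect; other schedulers ignore it.
    pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
    pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

    generator = torch.Generator(device="cpu").manual_seed(seed)
    result = pipe(
        prompt,
        num_inference_steps=50,
        guidance_scale=5.0,
        eta=1.0,
        generator=generator,
    )
    return result.images[0]
```

For example, `build_prompt("horse", ACTIVITIES[1])` yields the first widget prompt, and `generate(build_prompt("goat", "riding a bike"), model_id=...)` would return a PIL image once a valid model id is supplied.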