---
license: creativeml-openrail-m
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
inference:
  parameters:
    num_inference_steps: 50
    guidance_scale: 5.0
---
# ddpo-alignment
This model was finetuned from [Stable Diffusion v1-5](https:/runwayml/stable-diffusion-v1-5) using [DDPO](https://arxiv.org/abs/2305.13301) and a reward function that uses [LLaVA](https://llava-vl.github.io/) to measure prompt-image alignment. See [the project website](https://rl-diffusion.github.io/) for more details.
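
The sketch below shows one way to sample from this checkpoint with `diffusers`, using the sampler settings from the `inference.parameters` front matter (50 steps, guidance scale 5.0). Two assumptions are not stated in this card: the Hub repo id is taken to be `kvablack/ddpo-alignment`, and a CUDA GPU with fp16 support is assumed to be available.

```python
# Minimal sketch of sampling from this checkpoint with diffusers.
# Assumptions (not confirmed by this card): the Hub repo id is
# "kvablack/ddpo-alignment" and a CUDA GPU is available for fp16 inference.

# Sampler settings copied from the `inference.parameters` front matter.
INFERENCE_KWARGS = {"num_inference_steps": 50, "guidance_scale": 5.0}


def generate_image(prompt: str, repo_id: str = "kvablack/ddpo-alignment"):
    """Load the finetuned pipeline and return one PIL image for `prompt`."""
    # Imported lazily so this module can be imported without torch/diffusers.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")
    result = pipe(prompt, **INFERENCE_KWARGS)
    return result.images[0]
```

For example, `generate_image("a dog riding a bike").save("dog.png")` would write a single sample to disk, following the prompt format described below.
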

The model was finetuned for 120 iterations with a batch size of 256 samples per iteration. During finetuning, we used prompts of the form "_a(n) \<animal\> \<activity\>_", with the animal and activity drawn from the lists below. Prompts built from these lists should give the best results, although we also observed limited generalization to other prompts.

Activities:
- washing dishes
- playing chess
- riding a bike

Animals:
- cat
- dog
- horse
- monkey
- rabbit
- zebra
- spider
- bird
- sheep
- deer
- cow
- goat
- lion
- tiger
- bear
- raccoon
- fox
- wolf
- lizard
- beetle
- ant
- butterfly
- fish
- shark
- whale
- dolphin
- squirrel
- mouse
- rat
- snake
- turtle
- frog
- chicken
- duck
- goose
- bee
- pig
- turkey
- fly
- llama
- camel
- bat
- gorilla
- hedgehog
- kangaroo
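
The prompt template and the two lists above can be expanded into concrete prompts with a small helper. This is a hypothetical sketch, not the authors' training code: the helper names are illustrative, and the "a"/"an" choice uses a simple first-letter vowel heuristic, which is correct for every animal listed here but not for English nouns in general.

```python
import itertools

# Lists copied from this model card.
ACTIVITIES = ["washing dishes", "playing chess", "riding a bike"]
ANIMALS = [
    "cat", "dog", "horse", "monkey", "rabbit", "zebra", "spider", "bird",
    "sheep", "deer", "cow", "goat", "lion", "tiger", "bear", "raccoon",
    "fox", "wolf", "lizard", "beetle", "ant", "butterfly", "fish", "shark",
    "whale", "dolphin", "squirrel", "mouse", "rat", "snake", "turtle", "frog",
    "chicken", "duck", "goose", "bee", "pig", "turkey", "fly", "llama",
    "camel", "bat", "gorilla", "hedgehog", "kangaroo",
]


def article(noun: str) -> str:
    """Heuristic: "an" if the noun starts with a vowel letter, else "a"."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def make_prompt(animal: str, activity: str) -> str:
    """Fill the "a(n) <animal> <activity>" template."""
    return f"{article(animal)} {animal} {activity}"


def all_prompts() -> list[str]:
    """Every animal/activity combination, animal-major order."""
    return [make_prompt(a, act) for a, act in itertools.product(ANIMALS, ACTIVITIES)]
```

With 45 animals and 3 activities this yields 135 prompts, e.g. "a cat washing dishes" or "an ant playing chess".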