---
license: creativeml-openrail-m
language:
  - en
library_name: diffusers
pipeline_tag: text-to-image
tags:
  - stable-diffusion
  - stable-diffusion-diffusers
  - text-to-image
inference:
  parameters:
    num_inference_steps: 50
    guidance_scale: 5
    eta: 1
widget:
  - text: a horse playing chess
    example_title: horse + chess
  - text: a lion washing dishes
    example_title: lion + dishes
  - text: a goat riding a bike
    example_title: goat + bike
---

ddpo-alignment

This model was finetuned from Stable Diffusion v1-4 using DDPO (denoising diffusion policy optimization) with a reward function that uses LLaVA to measure prompt-image alignment. See the project website for more details.
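A minimal usage sketch with Diffusers follows. It assumes the weights are published in Diffusers format under the Hub id `kvablack/ddpo-alignment`; that id is inferred from the page this card came from and is not stated in the card itself. The sketch reuses the inference defaults from the card's metadata (50 steps, guidance scale 5, eta 1). In Diffusers, `eta` only has an effect with DDIM-style schedulers, so the sketch swaps in `DDIMScheduler`; that swap is an assumption, not something the card specifies. The helper name `generate` is hypothetical.

```python
# Inference defaults mirrored from the card's `inference.parameters` metadata.
INFERENCE_KWARGS = {"num_inference_steps": 50, "guidance_scale": 5.0, "eta": 1.0}

# Assumed Hub repository id (inferred, not stated in the card).
REPO_ID = "kvablack/ddpo-alignment"


def generate(prompt: str, repo_id: str = REPO_ID, device: str = "cuda"):
    """Hypothetical helper: load the finetuned pipeline and sample one image.

    The heavy imports live inside the function, so defining it does not
    require torch or diffusers to be installed.
    """
    import torch
    from diffusers import DDIMScheduler, StableDiffusionPipeline

    # Half precision on GPU, full precision elsewhere.
    dtype = torch.float16 if device.startswith("cuda") else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(repo_id, torch_dtype=dtype)
    # eta is only used by DDIM-style schedulers, so swap one in (assumption).
    pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
    pipe = pipe.to(device)
    return pipe(prompt, **INFERENCE_KWARGS).images[0]
```

For example, `generate("a horse playing chess").save("horse_chess.png")` would sample one of the widget prompts and save it as a PIL image.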

The model was finetuned for 200 iterations with a batch size of 256 samples per iteration. During finetuning, we used prompts of the form "a(n) <animal> <activity>", with the animal and activity drawn from the following lists, so prompts built from these lists should give the best results. We also observed some generalization to other prompts, though it was limited.

Activities:

  • washing dishes
  • playing chess
  • riding a bike

Animals:

  • cat
  • dog
  • horse
  • monkey
  • rabbit
  • zebra
  • spider
  • bird
  • sheep
  • deer
  • cow
  • goat
  • lion
  • tiger
  • bear
  • raccoon
  • fox
  • wolf
  • lizard
  • beetle
  • ant
  • butterfly
  • fish
  • shark
  • whale
  • dolphin
  • squirrel
  • mouse
  • rat
  • snake
  • turtle
  • frog
  • chicken
  • duck
  • goose
  • bee
  • pig
  • turkey
  • fly
  • llama
  • camel
  • bat
  • gorilla
  • hedgehog
  • kangaroo
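The prompt template described earlier can be filled in programmatically from these lists. This is a small sketch, not the authors' training code: the helper names `make_prompt`, `all_prompts`, and `sample_prompt` are hypothetical, and choosing "an" before a vowel-initial animal name is an assumed reading of "a(n)" (in these lists it only affects "ant").

```python
import random

# Activities and animals copied from the lists in this model card.
ACTIVITIES = ["washing dishes", "playing chess", "riding a bike"]
ANIMALS = [
    "cat", "dog", "horse", "monkey", "rabbit", "zebra", "spider", "bird",
    "sheep", "deer", "cow", "goat", "lion", "tiger", "bear", "raccoon",
    "fox", "wolf", "lizard", "beetle", "ant", "butterfly", "fish", "shark",
    "whale", "dolphin", "squirrel", "mouse", "rat", "snake", "turtle", "frog",
    "chicken", "duck", "goose", "bee", "pig", "turkey", "fly", "llama",
    "camel", "bat", "gorilla", "hedgehog", "kangaroo",
]


def make_prompt(animal: str, activity: str) -> str:
    """Fill the "a(n) <animal> <activity>" template.

    Uses "an" before a vowel-initial animal name (an assumed rule).
    """
    article = "an" if animal[0].lower() in "aeiou" else "a"
    return f"{article} {animal} {activity}"


def all_prompts() -> list[str]:
    """Every animal/activity combination from the two lists."""
    return [make_prompt(animal, activity) for animal in ANIMALS for activity in ACTIVITIES]


def sample_prompt(rng: random.Random) -> str:
    """Draw one random prompt from the finetuning distribution."""
    return make_prompt(rng.choice(ANIMALS), rng.choice(ACTIVITIES))
```

For example, `make_prompt("goat", "riding a bike")` gives the widget prompt "a goat riding a bike", and `all_prompts()` enumerates all 135 combinations (45 animals × 3 activities).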