A guide to tuning parameters

#7
by atarashansky - opened

Anyone have a solid understanding of what the different parameters are doing? I have an intuitive understanding of scale in theory but I haven't been able to fully rationalize its effects on the resulting image. The best I can say is that turning the guidance up will sacrifice details in favor of highlighting coarser features. Turning the guidance down will include more details but it starts getting chaotic.

What about the number of inference steps? Is this something that asymptotes (i.e. above a certain point there's no difference between 100 vs 500 vs 5000, etc)?

Also, I noticed something strange: comparing #steps=50 vs #steps=51, the first iteration produces nearly identical samples, but the following iterations diverge as if randomized. Why is that?

Any other parameters that are worth tuning?

The best way to explore the parameters is to stick with one prompt and one random number seed, and vary the one parameter that you want to understand better. Then you can directly see how changing the parameter affects that specific image.

Here is a parameter sweep for one prompt and seed:

https://docs.google.com/spreadsheets/d/1SYQhyJaKkkY0cmPd0WQvPwEX188l5FZxzukkC7IQDw4/edit#gid=0
s = steps
cfg = scale (I think)

My vague understanding is that scale dictates how closely the image generation is controlled by the text prompt. I have not played with it at all.
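That matches how classifier-free guidance works under the hood: at each step the model makes two noise predictions, one with the prompt and one without, and the scale extrapolates between them. A minimal sketch (the function name and toy numbers are mine, not from any particular repo):

```python
import numpy as np

def apply_guidance(noise_uncond, noise_cond, scale):
    """Classifier-free guidance: start from the unconditional prediction
    and move toward (or past) the text-conditioned one.
    scale=0 ignores the prompt, scale=1 reproduces the conditional
    prediction exactly, and larger values extrapolate beyond it."""
    return noise_uncond + scale * (noise_cond - noise_uncond)

# Toy 1-value "noise predictions" that disagree slightly.
uncond = np.array([0.10])
cond = np.array([0.30])

for scale in (1.0, 7.5, 15.0):
    print(scale, apply_guidance(uncond, cond, scale))
```

This also gives some intuition for the chaos at low scale (the prompt barely steers the denoising) and the loss of fine detail at high scale (the prediction is pushed hard along one direction every step).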

Steps refers to how many "un-diffusion" steps are taken towards something that matches the prompt. The biggest effect of steps will be in the very first few (1, 2, 3, 4, 5). Then the image usually "settles" into roughly the way it will look somewhere between 10 and 20 steps. In many cases there is a point of diminishing returns in the number of steps that are taken. But every once in a while you can see visible changes with higher step values. I don't think this is anything that can be predicted ahead of time for a given prompt.
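On the 50-vs-51 question above: the step count just decides how the sampler subsamples the ~1000 training timesteps. With the same seed you get the same starting latent, and both schedules begin at (nearly) the maximum-noise timestep, so the first iteration is almost identical; the schedules then land on different timesteps and the trajectories drift apart. A sketch of the common even-spacing idea (real schedulers differ in rounding and offset details):

```python
import numpy as np

def make_schedule(num_steps, num_train_timesteps=1000):
    """Evenly subsample the training timesteps, highest-noise first.
    This is the generic linspace version; actual samplers (DDIM, PLMS)
    use slightly different rounding, but the shape is the same."""
    ts = np.linspace(0, num_train_timesteps - 1, num_steps).round().astype(int)
    return ts[::-1]

for n in (50, 51):
    ts = make_schedule(n)
    # Both schedules start at timestep 999, then diverge.
    print(n, "first three timesteps:", ts[:3])
```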

There is also a parameter called eta that I have not tried out. I would be interested in learning what effect it has on the results.
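For what it's worth, eta in DDIM controls how much fresh noise is injected at each step: eta=0 is fully deterministic sampling, eta=1 recovers DDPM-like stochasticity. A sketch of the per-step noise level from the DDIM formulation (the alpha values below are made up for illustration):

```python
import math

def ddim_sigma(eta, alpha_t, alpha_prev):
    """Standard deviation of the noise DDIM adds when stepping from
    timestep t to the previous one (alphas are cumulative products).
    eta=0 -> no noise added (deterministic); eta=1 -> DDPM-like."""
    return eta * math.sqrt((1 - alpha_prev) / (1 - alpha_t)) \
               * math.sqrt(1 - alpha_t / alpha_prev)

# Illustrative cumulative-alpha values for two adjacent timesteps.
alpha_t, alpha_prev = 0.5, 0.6
for eta in (0.0, 0.5, 1.0):
    print(eta, ddim_sigma(eta, alpha_t, alpha_prev))
```

So at the default eta=0 the same seed and settings should reproduce the same image exactly; raising eta adds randomness back into each step.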

Thanks for the google doc. It's really interesting - at high scale, there seems to be a sharp phase transition at higher step counts where the image suddenly gains clarity.

This site may also interest you:

https://botbox.dev/stable-diffusion-settings-guide/
The site above is about the Discord beta test, though, so its comparison of different samplers is partially irrelevant: only DDIM (without the --plms flag) and PLMS (with --plms) are available with this repo's model.

> It's really interesting - at high scale, there seems to be a sharp phase transition at higher step counts where the image suddenly gains clarity.

I can confirm this effect as well. If guidance scale is high enough ( --scale 15 ~ 20), difference between 100 steps and 150 steps starts to matter.
