How to improve the quality of outputs?

#6 by aday777

Hi, thanks for making this available. I'm coming from DALL·E 2, and after playing around with this a bit, I'm finding that Stable Diffusion doesn't follow my prompts as closely. For example, "a hamster going super saiyan, digital art": DALL·E 2 actually does this quite well, making a hamster that is kind of on fire. When I use Stable Diffusion with the same prompt, it seems to ignore the hamster part entirely, just giving me a stock image from Dragon Ball or something. Are there parameters we can change to modify this behavior? Is there something I'm doing wrong? Is there additional training we can do to improve this? Thanks!

You can adjust the classifier-free guidance value. If I remember correctly, the parameter is -C, so it would be "prompt" -C 7.

guidance_scale can indeed help quite a bit here! Here you can see how to use it with diffusers: https://huggingface.co/CompVis/stable-diffusion-v1-4#examples
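For concreteness, here is a minimal sketch of doing that with diffusers (assuming a recent diffusers install, a CUDA GPU, and the model id from the linked card):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the v1-4 weights; fp16 keeps memory usage manageable on consumer GPUs.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a hamster going super saiyan, digital art"

# guidance_scale is the classifier-free guidance weight: higher values
# (roughly 7-12) follow the prompt more literally at some cost in diversity.
image = pipe(prompt, guidance_scale=7.5).images[0]
image.save("hamster.png")
```

Sweeping guidance_scale over a few values with the same seed is a quick way to see its effect directly.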

Also, prompt engineering helps: prepending something like "highly photo-realistic <your prompt>" can push the output toward the style you want.
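If you want to compare prefixes systematically, here is a small hedged sketch (the prefix strings and filenames are just illustrative, and it again assumes diffusers with a CUDA GPU):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Illustrative style prefixes -- swap in whatever modifiers you want to test.
prefixes = [
    "highly detailed digital art of",
    "a photorealistic painting of",
    "concept art of",
]

for i, prefix in enumerate(prefixes):
    prompt = f"{prefix} a hamster going super saiyan"
    # Same guidance_scale for every variant so only the prefix changes.
    image = pipe(prompt, guidance_scale=7.5).images[0]
    image.save(f"variant_{i}.png")
```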

Thanks! I'll play around with these. I ran it like ... 1000 times, and a few of them gave me the proper image, so it's in there somewhere. Just a matter of teasing it out I guess, haha.
