hakurei/waifu-diffusion · The model tends to generate "cropped" images

Oct 7, 2022

When generating smaller images (something around 512*768), the results usually do not show the full body, even with the "full_body" tag in the prompt. They look like badly cropped illustrations missing part of the head or feet.
If I change the resolution to a larger value, the problem is partially resolved, but higher-resolution results are more likely to contain corrupted body parts.
I would like to know whether this is a problem with the model, or did I configure something wrong? What can I do to get better results?

Idou

Oct 8, 2022

Even the "very wide shot" tag does not produce the expected result. It seems the model is insensitive to these framing tags?

Idou

Oct 9, 2022

Try adding a "green_background" tag? In my environment it makes cropped heads less likely.

CeeGee

Oct 11, 2022

When generating smaller images (something around 512*768), the results usually do not show the full body, even with the "full_body" tag in the prompt. They look like badly cropped illustrations missing part of the head or feet.
If I change the resolution to a larger value, the problem is partially resolved, but higher-resolution results are more likely to contain corrupted body parts.
I would like to know whether this is a problem with the model, or did I configure something wrong? What can I do to get better results?

According to the NovelAI blog post at the following link, this effect comes from training on images that were cropped to a fixed square shape, and "Aspect Ratio Bucketing" is the fix they describe for it: https://blog.novelai.net/novelai-improvements-on-stable-diffusion-e10d38db82ac
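To give a feel for what aspect ratio bucketing means, here is a rough sketch of the idea: instead of cropping every training image to a square, each image is assigned to the resolution "bucket" whose shape is closest to its own. This is not NovelAI's code. The function names, the 64-pixel step, the side limits, and the 512*768 area cap are all assumptions I picked for illustration.

```python
def make_buckets(max_area=512 * 768, step=64, min_side=256, max_side=1024):
    """Enumerate (width, height) buckets with sides that are multiples of `step`.

    All limits here are illustrative assumptions, not values from any model.
    """
    buckets = set()
    for width in range(min_side, max_side + 1, step):
        # Largest height (a multiple of `step`) that keeps the area within max_area.
        height = min(max_side, (max_area // width) // step * step)
        if height >= min_side:
            buckets.add((width, height))
            buckets.add((height, width))  # also keep the rotated shape
    return sorted(buckets)


def nearest_bucket(image_size, buckets):
    """Pick the bucket whose aspect ratio is closest to the image's."""
    width, height = image_size
    aspect = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))


buckets = make_buckets()
portrait = nearest_bucket((1000, 1500), buckets)   # a tall illustration
landscape = nearest_bucket((1920, 1080), buckets)  # a wide wallpaper
```

With these assumed limits, a tall 1000*1500 illustration lands in the 512*768 bucket, so it would only need resizing rather than having the head or feet cut off by a square crop.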

Basically, it's probably not you but an artifact of how the training images were cropped. There are two things I do to try to make results better. The first is to try different seeds and/or prompts, but mainly different seeds. When I get a seed with the pose I want, in the sampler of my choice at around 20 steps, I stick with that. The number of steps is obviously up to you. The other thing is to use a 'guide' image in IMG2IMG to sort of 'force' SD to put the (object) in the place you were thinking of. Remember to keep the Denoising Strength around 0.7 or below, or SD won't pay enough attention to the 'guide' image to make a difference. Hope that helps!
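To make the Denoising Strength advice a bit more concrete, here is a small sketch of how strength relates to the number of denoising steps that actually run on the guide image. As far as I understand it, the Hugging Face diffusers img2img pipeline computes this the same way, but treat that as an assumption; other front ends may differ, and the function name `img2img_steps` is made up for this example.

```python
def img2img_steps(num_inference_steps, strength):
    """Return how many denoising steps are applied to the guide image.

    A strength of 0.0 leaves the guide image untouched, while 1.0 reruns
    the full schedule, so the guide image has almost no influence.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return min(int(num_inference_steps * strength), num_inference_steps)


half = img2img_steps(20, 0.5)           # 10 of 20 steps are rerun
three_quarters = img2img_steps(20, 0.75)  # 15 of 20 steps are rerun
full = img2img_steps(20, 1.0)           # all 20 steps, guide is mostly ignored
```

The more steps that are rerun, the more noise the guide image starts with and the less of its layout survives, which is why a lower strength keeps the figure closer to where you placed it.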