Low-res image synthesis is flawed

#23
by Pupochek

Hello. I'm attaching three grids (I resized the images in the grids to 256x256). The prompt for all images: "A photo of Emilia Clarke"

  1. grid_sdxl_1024.jpg
  2. grid_sdxl_512.jpg
  3. grid_sdxl_512_v2.jpg

The first grid is synthesized at 1024x1024, the second at 512x512 with the same seed as the first, and the third at 512x512 with a different seed.
First of all, the lower-resolution images look bad. Secondly, and more importantly, the third grid contains two images that are a complete mess.

I wonder if there is any technique to solve this issue.
I am interested in the low-res case because it would be much easier to train personalization methods at 512x512 than at 1024x1024.

According to the official report, it was never trained at 512x512. Training was mostly done on full-resolution images and then fine-tuned on other aspect ratios with a similar pixel count. If a training image is too small, it's upscaled and its original size is added during the prompt encoding (size conditioning). So I guess what you are trying to do is not intended atm.
See: https://arxiv.org/pdf/2307.01952.pdf#appendix.I
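
In diffusers, that size conditioning is exposed directly on the SDXL pipeline as `original_size`/`target_size` arguments (they default to the requested height/width). So one thing you could try is sampling at 512x512 while conditioning as if the image were 1024x1024. A rough sketch, untested, assuming a recent diffusers version:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

generator = torch.Generator("cuda").manual_seed(0)
image = pipe(
    "A photo of Emilia Clarke",
    height=512,
    width=512,
    # Override the size conditioning: claim a 1024x1024 "original" so the
    # model doesn't fall into the regime it learned from upscaled
    # low-res training images (see the size conditioning in the paper).
    original_size=(1024, 1024),
    target_size=(1024, 1024),
    generator=generator,
).images[0]
image.save("sdxl_512_conditioned_1024.png")
```

The paper shows that conditioning on small original sizes produces blurrier samples, so overriding it might help a bit, but it won't change the fact that the fine-tuning resolution was ~1 megapixel.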

Are you trying to reduce memory usage by generating small images? AI image generators don't like generating images much below their native resolution. Classic SD was the same (images smaller than about 320px would break). I wouldn't go below 768 for SDXL, I think.
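
If you want to see where it breaks on your setup, a fixed-seed sweep over resolutions makes the cutoff obvious. A minimal sketch (the step sizes are just my picks):

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "A photo of Emilia Clarke"
for side in (512, 640, 768, 896, 1024):
    # Re-seed each run so only the resolution changes between samples.
    g = torch.Generator("cuda").manual_seed(42)
    img = pipe(prompt, height=side, width=side, generator=g).images[0]
    img.save(f"sweep_{side}x{side}.png")
```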

i don't know what peter is talking about. this model was trained with a large chunk (39%) of its data under 512x512.

it was fine-tuned exclusively at ~1 megapixel, which means that at 512x512 it tries to cram the detail it learned at that resolution into a smaller canvas. it can't do that.
