Low-res image synthesis is flawed

#23
by Pupochek

Hello. I'm attaching three grids (I resized the images in the grids to 256x256). The prompt for all images: "A photo of Emilia Clarke"

  1. grid_sdxl_1024.jpg
  2. grid_sdxl_512.jpg
  3. grid_sdxl_512_v2.jpg

The first grid is synthesized at 1024x1024, the second at 512x512 with the same seed as the first, and the third at 512x512 with a different seed.
First of all, the lower-resolution images look bad. Secondly, and more importantly, the third grid contains two images that are a complete mess.

I wonder if there is any technique to solve this issue.
I am interested in the low-res case because it would be much easier to train personalization methods at 512x512 than at 1024x1024.

According to the official report, it was never trained at 512x512. Training was mostly done on full-resolution images and then fine-tuned on other aspect ratios with a similar pixel count. If a training image is too small, it's upscaled and its original size is added during the prompt encoding (size conditioning). So I guess what you are trying to do is not intended atm.
See: https://arxiv.org/pdf/2307.01952.pdf#appendix.I
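
In diffusers, that size conditioning is exposed directly on the SDXL pipeline as `original_size`/`target_size` arguments (they default to the requested height/width). So one thing you could try is sampling at 512x512 while conditioning as if the image were 1024x1024. A rough sketch, untested, assuming a recent diffusers version:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

generator = torch.Generator("cuda").manual_seed(0)
image = pipe(
    "A photo of Emilia Clarke",
    height=512,
    width=512,
    # Override the size conditioning: claim a 1024x1024 "original" so the
    # model doesn't fall into the regime it learned from upscaled
    # low-res training images (see the size conditioning in the paper).
    original_size=(1024, 1024),
    target_size=(1024, 1024),
    generator=generator,
).images[0]
image.save("sdxl_512_conditioned_1024.png")
```

The paper shows that conditioning on small original sizes produces blurrier samples, so overriding it might help a bit, but it won't change the fact that the fine-tuning resolution was ~1 megapixel.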

Are you trying to reduce memory usage by generating small images? AI image generators don't like generating images much below their native resolution. Classic SD was the same (images smaller than about 320px would break). I wouldn't go below 768 for SDXL, I think.
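
If you want to see where it breaks on your setup, a fixed-seed sweep over resolutions makes the cutoff obvious. A minimal sketch (the step sizes are just my picks):

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "A photo of Emilia Clarke"
for side in (512, 640, 768, 896, 1024):
    # Re-seed each run so only the resolution changes between samples.
    g = torch.Generator("cuda").manual_seed(42)
    img = pipe(prompt, height=side, width=side, generator=g).images[0]
    img.save(f"sweep_{side}x{side}.png")
```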

i don't know what peter is talking about. this model was trained with a large chunk (39%) of its data under 512x512.

it was fine-tuned exclusively at ~1 megapixel, which means that at 512x512 it tries to cram the detail it learned at that resolution into a smaller canvas. it can't do that.
