Discrepancy between number of transformer layers in config and paper

#33 · opened by Sahiljain314

I noticed that the config.json for the SDXL UNet contains the following entry: https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9/blob/main/unet/config.json#L59, which indicates there is 1 transformer block at the highest-resolution level.
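For reference, here is a minimal sketch of how one might pull the relevant fields out of the hosted config with diffusers' `load_config` (this assumes the repo is accessible to your account, since the 0.9 weights may require accepting the license / logging in first):

```python
# Sketch: inspect the per-level transformer settings declared in the SDXL UNet config.
# Assumes diffusers is installed and the repo is accessible (the 0.9 repo is gated,
# so `huggingface-cli login` may be needed).
from diffusers import UNet2DConditionModel

config = UNet2DConditionModel.load_config(
    "stabilityai/stable-diffusion-xl-base-0.9", subfolder="unet"
)

# Transformer depth declared per resolution level in config.json
print(config["transformer_layers_per_block"])

# Block type used at each resolution level (from the same config)
print(config["down_block_types"])
```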

However, the SDXL paper makes a point of stating that the transformer block counts are [0, 2, 10], i.e., they omit transformer blocks entirely at the highest-resolution level.

Am I missing something? If not, which one is correct?
