Wrong num_hidden_layers of text encoders?
#27, opened by p1atdev
Judging by `num_hidden_layers`, the text encoder configs in this repo appear to use the last layer:
https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9/blob/main/text_encoder/config.json#L19
https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9/blob/main/text_encoder_2/config.json#L19
But Stability AI's inference config seems to use the penultimate layer:
https://github.com/Stability-AI/generative-models/blob/5c10deee76adad0032b412294130090932317a87/configs/inference/sd_xl_base.yaml#L49
https://github.com/Stability-AI/generative-models/blob/5c10deee76adad0032b412294130090932317a87/configs/inference/sd_xl_base.yaml#L58
Is this a mistake, or is it intended?
In diffusers, the layer to use is selected at generation time, not at load time, so the configs can keep the full layer count.
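
To illustrate the difference between picking the layer at generation time and truncating at load time, here is a toy sketch in plain Python. It is not diffusers or transformers code: `make_layers` and `encode` are hypothetical stand-ins, and each "layer" just adds a number. With a real encoder the same idea would roughly correspond to requesting all hidden states (for example `output_hidden_states=True` in transformers) and indexing the second-to-last one, but treat that mapping as an assumption here.

```python
from typing import Callable, List

Layer = Callable[[float], float]


def make_layers(n: int) -> List[Layer]:
    # Hypothetical stand-ins for transformer blocks: layer k adds k (1..n).
    return [lambda x, k=i + 1: x + k for i in range(n)]


def encode(layers: List[Layer], x: float) -> List[float]:
    """Run every layer and return all hidden states, input first."""
    hidden_states = [x]
    for layer in layers:
        hidden_states.append(layer(hidden_states[-1]))
    return hidden_states


# Option A: keep every layer loaded, pick the penultimate state at call time.
full = make_layers(12)
penultimate_at_call = encode(full, 0.0)[-2]

# Option B: drop the last layer when loading, then use the final state.
truncated = full[:-1]
last_of_truncated = encode(truncated, 0.0)[-1]

# Both routes give the same hidden state in this toy model.
assert penultimate_at_call == last_of_truncated
```

In this toy setup both options yield the same value, which is why a config that lists all layers and a config that asks for the penultimate layer do not have to contradict each other.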