Same architecture as on github?

#3
by eeyore1 - opened

This is great work. I was trying to run this model in a language other than Python, so I reimplemented the architecture according to what's on GitHub:
https://github.com/madebyollin/taesd/blob/main/taesd.py

I then loaded the weights from the safetensors here. However, when I run decoder(encoder(crepe.jpg)), the output is shifted 8px down and to the right. I'm wondering whether the architecture is identical, or whether the diffusers encoder here does not use stride=2, padding=1 to downsample. If the architectures differ, could you perhaps describe the differences here?
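For reference, the downsampling convention being asked about can be sketched in a few lines. This is a hedged illustration of a plain stride=2, padding=1 3x3 conv (the pattern used in taesd.py), not the full TAESD encoder; the channel count and input size here are arbitrary:

```python
import torch
import torch.nn as nn

# A 3x3 conv with stride=2, padding=1 halves each even spatial dim:
# out = floor((H + 2*1 - 3) / 2) + 1 = H / 2 for even H.
# A port that uses a different padding convention (e.g. asymmetric
# or "same" padding computed differently) can shift the output.
down = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)

x = torch.randn(1, 16, 64, 64)
y = down(x)
print(y.shape)  # torch.Size([1, 16, 32, 32])
```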

Thank you so much!

test_out.jpg

The architecture is identical (you can read the diffusers implementation here); the mismatch is probably between both PyTorch implementations and yours. I recommend generating a PNG from encoder(crepe.jpg), like the example here, and comparing the two to narrow down whether the mismatch is in the encoder (e.g. in the strided convs) or the decoder (e.g. in the nearest-neighbor upsample).
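A small helper along the lines of the suggestion above can make the comparison mechanical. This is a hypothetical sketch (the function name and tolerance are mine, not from the thread): it reports the raw difference between two latent tensors and checks whether a small spatial shift explains it, which is exactly the symptom described:

```python
import torch

def compare_latents(ref, other, max_shift=2, atol=1e-4):
    """Compare a reference latent tensor against a port's output.

    Prints the max absolute difference, then tests whether rolling
    `other` by a few pixels in H/W makes it match `ref` - a match
    after a shift points at a padding/stride convention mismatch.
    """
    print("max abs diff:", (ref - other).abs().max().item())
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = torch.roll(other, shifts=(dy, dx), dims=(-2, -1))
            if torch.allclose(ref, shifted, atol=atol):
                print(f"match after shifting by (dy={dy}, dx={dx})")
                return (dy, dx)
    return None

# Toy usage: a latent shifted up-left by 1px is detected as (1, 1).
ref = torch.randn(1, 4, 8, 8)
other = torch.roll(ref, shifts=(-1, -1), dims=(-2, -1))
shift = compare_latents(ref, other)
```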

Thanks for the fast reply! There was a difference in the latents, so I looked just at the encoder. It turned out that the weights are interpreted differently (cross-correlation vs. convolution) between PyTorch and the framework I'm using, so I reversed each kernel's W and H dimensions, and it now performs identically.
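For anyone hitting the same issue: PyTorch's conv layers actually compute cross-correlation (no kernel flip), while frameworks that implement true convolution flip the kernel spatially. A minimal sketch of the fix described above, using an impulse input to make the difference visible:

```python
import torch
import torch.nn.functional as F

# Toy 3x3 kernel with distinguishable entries.
w = torch.arange(9.0).reshape(1, 1, 3, 3)
# The fix: reverse both spatial (H and W) dimensions of each kernel.
w_flipped = w.flip(-1, -2)

# Cross-correlating a unit impulse "stamps" the kernel flipped, so
# feeding the pre-flipped weights recovers the original kernel layout
# (i.e. the response a true-convolution framework would produce).
x = torch.zeros(1, 1, 5, 5)
x[0, 0, 2, 2] = 1.0
out = F.conv2d(x, w_flipped, padding=1)
assert torch.equal(out[0, 0, 1:4, 1:4], w[0, 0])
```

So loading PyTorch weights into a true-convolution framework (or vice versa) just needs this per-kernel spatial flip; no other transposition is required for the conv layers.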

eeyore1 changed discussion status to closed
