How is this achieved?

opened by gkorepanov

Hi, can you share the details of how you have fixed the VAE in fp16?

Thank you, this is a nice work!

gkorepanov changed discussion status to closed

@madebyollin would you be willing to share the code (however nice or not it might be) of how you did it? I'm interested in learning how something like this is done.

@Kubuxu No code, sorry, too messy (+too much of it changed during training).

Some notes on fine-tuning process:

  • I mostly trained in bfloat16 to avoid OOM

  • I watched activation-map magnitudes + output deltas on a test image and occasionally rebalanced the match-original-output and make-activation-maps-smaller losses by hand (a rough sketch of such a two-term objective is below). [screenshots: activation-map magnitudes and output deltas]

  • I froze the weight matrices and only fine-tuned biases / normalization layers / a single scalar for each weight matrix (see the parameter-freezing sketch after this list; screenshot for decoder).
    [screenshot: decoder's trainable parameters]
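
Not their actual code, but a minimal sketch of what such a two-term objective could look like, assuming a diffusers-style AutoencoderKL; the names `vae_ft`, `vae_orig`, `act_weight`, and the hook helper are all hypothetical:

```python
import torch
import torch.nn.functional as F

# Hypothetical names throughout: vae_ft is the model being fine-tuned,
# vae_orig a frozen copy of the original VAE, act_weight the hand-tuned knob.

def track_peak_activations(vae, store):
    """Attach forward hooks that record the peak |activation| of each conv."""
    hooks = []
    for name, module in vae.named_modules():
        if isinstance(module, torch.nn.Conv2d):
            hooks.append(module.register_forward_hook(
                lambda mod, inp, out, n=name: store.update({n: out.abs().max()})))
    return hooks

def loss_fn(vae_ft, vae_orig, x, act_weight=1e-4):
    acts = {}
    hooks = track_peak_activations(vae_ft, acts)
    out_ft = vae_ft(x).sample                 # diffusers AutoencoderKL-style call
    with torch.no_grad():
        out_orig = vae_orig(x).sample
    for h in hooks:
        h.remove()
    match_loss = F.mse_loss(out_ft, out_orig)            # stay close to the original
    act_loss = torch.stack(list(acts.values())).mean()   # push peak activations down
    return match_loss + act_weight * act_loss
```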
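
And a guess at how the frozen-weights / per-matrix-scalar setup could be expressed, here via PyTorch's parametrization utility (the class and function names are invented; the real approach may have differed):

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

# Hypothetical sketch: freeze every weight matrix and train only biases,
# normalization parameters, and one scalar multiplier per frozen matrix.

class ScaleOnly(nn.Module):
    """Reparameterize a frozen weight as weight * (single learned scalar)."""
    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(()))

    def forward(self, weight):
        return weight * self.scale

def freeze_all_but_biases_norms_scales(vae):
    for p in vae.parameters():                      # 1) freeze everything
        p.requires_grad_(False)
    for module in vae.modules():                    # 2) unfreeze biases + norms
        if isinstance(module, nn.GroupNorm):
            for p in module.parameters():
                p.requires_grad_(True)
        bias = getattr(module, "bias", None)
        if isinstance(bias, nn.Parameter):
            bias.requires_grad_(True)
    for module in vae.modules():                    # 3) one scalar per weight matrix
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            parametrize.register_parametrization(module, "weight", ScaleOnly())
    return [p for p in vae.parameters() if p.requires_grad]
```

Only the parameters returned here would go to the optimizer; everything else stays at its original value (up to the learned scale).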

Some speculation on what might have happened to the original VAE:

  • I think Stability zero-initialized final convs in UNet resblocks, but not VAE resblocks, which I think leads to variance increasing with VAE depth (per FixUp / ReZero papers)
  • I think Stability initialized up / down convs with too-big weights (default PyTorch initialization assumes a ReLU-like nonlinearity afterwards, but these convs have no nonlinearity), which I think will increase variance after each of these convs (per the He initialization paper; see the toy check after this list)
  • If you compare the original / fixed weights, the resblock final convs and up / down convs are mostly what shrank, which seems like weak evidence in favor of these weights being too large initially. [screenshot: original vs. fixed weight comparison]
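
A quick toy check of the variance intuition behind the first two bullets (purely illustrative; the real VAE has norms and SiLU activations, and its actual initialization is exactly what's being speculated about):

```python
import torch
import torch.nn as nn

# Toy check, not the SD VAE: stacked residual blocks y = x + f(x),
# with and without a zero-initialized final conv.

def resblock(ch, zero_init_last):
    conv1 = nn.Conv2d(ch, ch, 3, padding=1)
    conv2 = nn.Conv2d(ch, ch, 3, padding=1)
    nn.init.kaiming_normal_(conv1.weight, nonlinearity="relu")
    nn.init.kaiming_normal_(conv2.weight, nonlinearity="relu")
    if zero_init_last:
        nn.init.zeros_(conv2.weight)      # FixUp / ReZero: block starts as identity
    return nn.Sequential(conv1, nn.ReLU(), conv2)

@torch.no_grad()
def std_vs_depth(zero_init_last, depth=16, ch=64):
    x = torch.randn(1, ch, 32, 32)
    for _ in range(depth):
        x = x + resblock(ch, zero_init_last)(x)   # residual connection
    return x.std().item()

print("no zero-init :", std_vs_depth(False))   # std blows up with depth
print("zero-init    :", std_vs_depth(True))    # std stays near 1

# Same flavor for the up / down convs: ReLU-gain (He) init with no nonlinearity
# afterwards inflates the variance by roughly 2x per conv.
with torch.no_grad():
    conv = nn.Conv2d(64, 64, 3, padding=1, bias=False)
    nn.init.kaiming_normal_(conv.weight, nonlinearity="relu")
    x = torch.randn(1, 64, 32, 32)
    print("std after one no-nonlinearity conv:", conv(x).std().item())  # ~1.4, not ~1.0
```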

That is all speculation though - I don't thoroughly understand the issue yet. I just threw some code together and happened to get it working :)
