Why is the VAE in .ckpt instead of .bin?

#2
by Nilaier - opened

Just asking. I can't use .ckpt and don't know how to convert it. Big dummy me.

You should just rename the .ckpt file of the VAE to the name of the model you're using and change the extension to ".vae.pt". I think the config and the pruner should also be there, but I'm not sure.

So, if you're using the "wd-v1-3-float16.ckpt" model file, you should rename the VAE to "wd-v1-3-float16.vae.pt". Both should be present in the "/models/Stable-diffusion" folder.
When you select the WD model, the console should say: "Loading VAE weights from: ...\stable-diffusion-webui\models\Stable-diffusion\wd-v1-3-float16.vae.pt"
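
If you want to script that step, here's a quick sketch (the paths and the VAE filename are placeholders - point them at your actual files):

```python
import shutil
from pathlib import Path

# Placeholder locations - adjust to your own setup.
models_dir = Path("stable-diffusion-webui/models/Stable-diffusion")
vae_ckpt = Path("downloads/some-vae.ckpt")  # the standalone VAE you downloaded
model_name = "wd-v1-3-float16"              # the model you want to pair it with

# Copy the VAE next to the model as "<model>.vae.pt" so the webui
# loads it automatically whenever that model is selected.
shutil.copy(vae_ckpt, models_dir / f"{model_name}.vae.pt")
```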

Cool, but can I merge the VAE with the model so they're one solid file? I don't think you can use them separately in DreamBooth.

I think so, yes. I have merged both successfully in the past, but now that I think of it, I can't be sure it actually worked, lol.
Try merging them as they are and update here with the results for others.

Not with Automatic1111's merge option, as far as I know.
But you can do this with diffusers (on GitHub): they have a script to convert a normal SD checkpoint to the diffusers format and another that does the reverse.
Since the first script splits the checkpoint into parts, you can set it to load the custom VAE instead of the one built into the .ckpt, then use the second script to convert everything back to .ckpt (which bakes the custom VAE into the checkpoint itself). To do this, though, you need to manually edit the first conversion script to load the external VAE.
First, load it like this (torch is already imported at the top of the script):

```python
external_vae_path = "Your-Vae-Path"
vae_ckpt = torch.load(external_vae_path, map_location="cpu")["state_dict"]
```

Then replace:

```python
converted_vae_checkpoint = convert_ldm_vae_checkpoint(checkpoint, vae_config)
```

with:

```python
converted_vae_checkpoint = convert_ldm_vae_checkpoint(vae_ckpt, vae_config)
```

EDIT:
Just to be even more clear: you can do this with any model, including a DreamBooth one. You could also do a conversion using a dummy model just to get the diffusers-format custom VAE, then use the first script (unedited) to convert any model you want to diffusers, copy-paste-replace its VAE part with the custom one, and convert it back to .ckpt.
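
For the copy-paste-replace route, something like this should do it (folder names are placeholders, and I'm assuming the usual diffusers layout where the VAE lives in its own vae/ subfolder):

```python
import shutil
from pathlib import Path

# Placeholder folders, both produced by the SD-to-diffusers conversion script:
dummy_dir = Path("dummy-model-diffusers")     # converted with the edited script,
                                              # so its vae/ holds the custom VAE
target_dir = Path("my-dreambooth-diffusers")  # the model you actually want to patch

# In the diffusers layout the VAE is a self-contained "vae/" subfolder,
# so swapping it is just a folder replace.
shutil.rmtree(target_dir / "vae")
shutil.copytree(dummy_dir / "vae", target_dir / "vae")

# Afterwards, run the reverse (diffusers-to-SD) script on target_dir
# to get a .ckpt with the custom VAE baked in.
```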

No, I don't think that works. It throws an error exactly on `converted_vae_checkpoint = convert_ldm_vae_checkpoint(vae_ckpt, vae_config)`.

If I also provide the config from the VAE's folder:

```
omegaconf.errors.ConfigAttributeError: Missing key timesteps
    full_key: model.params.timesteps
    object_type=dict
```

Without it:

```
new_checkpoint["encoder.conv_in.weight"] = vae_state_dict["encoder.conv_in.weight"]
KeyError: 'encoder.conv_in.weight'
```

@Nilaier
My bad - I didn't actually test it out. Turns out the keys are a bit different, which is why you got errors even without trying to load the included config.yaml.
You can get it working by not using that config file and replacing the following line:

```python
if key.startswith(vae_key):
```

with:

```python
if key[0:4] != "loss":
```

However, this will not add the 'loss' keys, which have some configuration within the given VAE config.yaml - those keys and those config settings do not exist in the original v1-inference.yaml - and since I'm no dev, this is as far as I can go.
The config settings - aside from the 'loss' key settings - are exactly the same as the original SD, so you could just try ignoring those, but I don't think the end results will be 100% the same.
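
For context, that line sits inside the key-extraction loop at the top of convert_ldm_vae_checkpoint; with the change applied, the loop would look roughly like this (variable names assumed from the diffusers script of that time):

```python
# Inside convert_ldm_vae_checkpoint: build the VAE state dict.
vae_state_dict = {}
vae_key = "first_stage_model."  # prefix used when the VAE is embedded in a full SD .ckpt

for key in list(checkpoint.keys()):
    # A standalone VAE checkpoint carries bare keys ("encoder.conv_in.weight", ...)
    # plus training-only "loss.*" entries, so filter on the "loss" prefix instead
    # of the "first_stage_model." prefix (the .replace() below becomes a no-op).
    if key[0:4] != "loss":
        vae_state_dict[key.replace(vae_key, "")] = checkpoint.get(key)
```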

Okay, if that's what we can do, then I'm going to try. Let's cross our fingers and dive right into it!

Oh, OK, I guess it did actually work! The results aren't quite as amazing as what was shown on Haru's Twitter, but, admittedly, I used the float16 checkpoint, so I think it would be even better with something like float32 or full EMA.
New VAE: [image: 00009-4260133673-1girl, solo,____.png]
Original: [image: 00008-4260133673-1girl, solo,____.png]

"Supposedly improves reconstruction of eyes and fingers." - From the rentry.org site.
It would be difficult to improve on that art style and yet there's definitely some improvement in the eyes there.
This will probably fare much better with more 'realistic'/modern anime style. Also when it comes to conversions and merges its always a good idea to experiment with the different schedulers/methods a bit.
Will def give this a try myself when v1.4 gets out.

admittedly, I used the float16 checkpoint, so I think it would be even better with something like float32 or full EMA.

I thought this shouldn't have any effect on the final result and only matters when you start training?

@HexagonSun
It will obviously affect the final result, but not by much. The way I understand it, fp32 stores weights with roughly twice the decimal precision of fp16 (about 7 significant digits versus about 3). So when converting to fp16 you are basically 'rounding' the weights - the end result is VERY close to the original but never identical. You can see this by directly comparing two images generated with an fp16 model and its fp32 equivalent.
When it comes to this VAE - since it's meant to affect minor details - using fp16 might have a noticeable negative impact.
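
A quick way to see the rounding for yourself (plain PyTorch, nothing model-specific):

```python
import torch

# A float32 weight and its float16 "rounded" counterpart.
w32 = torch.tensor(0.123456789, dtype=torch.float32)
w16 = w32.to(torch.float16)

print(w32.item())  # ~0.12345679  (float32 keeps ~7 significant digits)
print(w16.item())  # ~0.1234741   (float16 keeps ~3 significant digits)
print((w32 - w16.to(torch.float32)).abs().item())  # the per-weight rounding error
```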

When it comes to this VAE - since it's meant to affect minor details - using fp16 might have a noticeable negative impact.

This is new to me. Do you perhaps have a link or anything to see a comparison of the 16/32/full models with a VAE? I wasn't aware of this at all.
