half-precision (fp16) version?

#5
by huggingmaw - opened

hi, I noticed that the diffusers version of the model has an fp16 branch available for those who have lower VRAM. is there a plan to release an fp16 version of the CompVis weights as well?
I haven't read the code yet, but I have the impression that it's a bit difficult to adapt the diffusers version for img2img purposes.

p.s. thank you for releasing the model, we've been using it to spice up our table-top rpg game and we're loving it so far.

Hi, indeed we don't have pre-converted fp16 weights for the CompVis checkpoint yet - I think this is something we'd love a community contribution on.

I haven't read the code yet, but I have the impression that it's a bit difficult to adapt the diffusers version for img2img purposes.

diffusers fully supports img2img, check out: https://github.com/huggingface/diffusers/tree/main/examples/inference - there's also a Colab notebook for it: https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/image_2_image_using_diffusers.ipynb
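
For reference, a minimal img2img sketch with diffusers - the model id, file names and parameters here are illustrative, and older diffusers versions name the image argument init_image instead of image:

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# load the img2img pipeline in half precision (model id is illustrative)
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to('cuda')

# start from an existing image and nudge it toward the prompt
init_image = Image.open("sketch.png").convert("RGB").resize((512, 512))
result = pipe(
    prompt="a fantasy tavern interior, oil painting",
    image=init_image,
    strength=0.75,       # how far to move away from the input image
    guidance_scale=7.5,
).images[0]
result.save("tavern.png")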

hi, thank you for your answer! seems like I've completely missed the diffusers version of img2img. I'll check it out.

huggingmaw changed discussion status to closed

Hi @multimodalart , did you train a diffusion model from scratch for fp16, or can we simply set an fp32 model to fp16 and expect it to work? Thanks a lot for open-sourcing Stable Diffusion!!

I tried it out. It works!!

@suchatur I'm not an expert when it comes to Python, how exactly can one "set" a model to fp16? Would be awesome if you could reference this for others here :)

import torch

# load model
# ...

# cast the weights to half precision, then move the model to the GPU
model = model.to(torch.float16)
model = model.to('cuda')

# run it
# ...
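
On the diffusers side, the fp16 branch mentioned above can be loaded in half precision directly - a minimal sketch, assuming the model id and branch name from this repo (adjust to whatever you are actually using):

import torch
from diffusers import StableDiffusionPipeline

# load weights from the fp16 branch directly in half precision
# (model id and branch name are assumptions based on the discussion above)
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
).to('cuda')

image = pipe("a cozy tavern interior for a table-top rpg").images[0]
image.save("tavern.png")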

@suchatur does that reduce the model's size and inference time?

It halves the model size, which lets it fit in GPU memory. I can't comment on inference time because I couldn't run the full-precision (float32) model on my GPU.

Is this only possible with the diffusers library? I can run fp32, and I'd be interested to see the difference in inference time.

No, you can do this with the GitHub repository as well. I don't think there will be much difference in inference time (in fact, fp16 may even be slower on some GPUs because of the overhead of casting between precisions).

Confirmed that this works with the GitHub repo as well. I didn't see any noticeable change in inference time.
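
In case anyone wants to measure this themselves, a rough timing sketch - it assumes two already-loaded pipelines, pipe_fp32 and pipe_fp16 (placeholder names), and the same pattern works around the CompVis sampling call:

import time
import torch

def time_inference(pipe, prompt, n_runs=3):
    pipe(prompt)  # warm-up run so one-off setup cost doesn't skew the numbers
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_runs):
        pipe(prompt)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_runs

# print("fp32:", time_inference(pipe_fp32, "a castle at dusk"))
# print("fp16:", time_inference(pipe_fp16, "a castle at dusk"))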
