half-precision (fp16) version?

#5
by huggingmaw - opened

hi, I noticed that the diffusers version of the model has an fp16 branch available for those who have lower VRAM. is there a plan to release an fp16 version of the CompVis weights as well?
I haven't read the code yet, but I have the impression that it's a bit difficult to adapt the diffusers version for img2img purposes.

p.s. thank you for releasing the model, we've been using it to spice up our table-top rpg game and we're loving it so far.

Hi, indeed we don't have pre-converted fp16 weights for the CompVis checkpoint yet - I think this is something we'd love a community contribution on.

I haven't read the code yet, but I have the impression that it's a bit difficult to adapt the diffusers version for img2img purposes.

diffusers fully supports img2img, check out: https://github.com/huggingface/diffusers/tree/main/examples/inference - there's also a Colab notebook for it: https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/image_2_image_using_diffusers.ipynb
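
For reference, a minimal img2img sketch with diffusers - the model id, file names and parameters here are illustrative, and older diffusers versions name the image argument init_image instead of image:

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# load the img2img pipeline in half precision (model id is illustrative)
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to('cuda')

# start from an existing image and nudge it toward the prompt
init_image = Image.open("sketch.png").convert("RGB").resize((512, 512))
result = pipe(
    prompt="a fantasy tavern interior, oil painting",
    image=init_image,
    strength=0.75,       # how far to move away from the input image
    guidance_scale=7.5,
).images[0]
result.save("tavern.png")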

hi, thank you for your answer! seems like I've completely missed the diffusers version of img2img. I'll check it out.

huggingmaw changed discussion status to closed

Hi @multimodalart , did you train a diffusion model from scratch for fp16, or can we simply set an fp32 model to fp16 and expect it to work? Thanks a lot for open-sourcing Stable Diffusion!!

I tried it out. It works!!

@suchatur I'm not an expert when it comes to Python, how exactly can one "set" a model to fp16? Would be awesome if you could reference this for others here :)

import torch

# load model
# ...

# cast the weights to half precision, then move the model to the GPU
model = model.to(torch.float16)
model = model.to('cuda')

# run it
# ...
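
On the diffusers side, the fp16 branch mentioned above can be loaded in half precision directly - a minimal sketch, assuming the model id and branch name from this repo (adjust to whatever you are actually using):

import torch
from diffusers import StableDiffusionPipeline

# load weights from the fp16 branch directly in half precision
# (model id and branch name are assumptions based on the discussion above)
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
).to('cuda')

image = pipe("a cozy tavern interior for a table-top rpg").images[0]
image.save("tavern.png")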

@suchatur does that reduce the model's size and inference time?

It halves the model size, which lets it fit in GPU memory. I can't comment on inference time because I couldn't run the full-precision (float32) model on my GPU.

Is this only possible with the diffusers library? I can run fp32, and I'd be interested to see the difference in inference time.

No, you can do this with the GitHub repository as well. I don't think there will be much difference in inference time (in fact, fp16 may even be slower on some GPUs because of the overhead of casting between precisions).

Confirmed that this works with the GitHub repo as well. I didn't see any noticeable change in inference time.
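
In case anyone wants to measure this themselves, a rough timing sketch - it assumes two already-loaded pipelines, pipe_fp32 and pipe_fp16 (placeholder names), and the same pattern works around the CompVis sampling call:

import time
import torch

def time_inference(pipe, prompt, n_runs=3):
    pipe(prompt)  # warm-up run so one-off setup cost doesn't skew the numbers
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_runs):
        pipe(prompt)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_runs

# print("fp32:", time_inference(pipe_fp32, "a castle at dusk"))
# print("fp16:", time_inference(pipe_fp16, "a castle at dusk"))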
