half-precision (fp16) version?
hi, I noticed that the diffusers version of the model has an fp16 branch available for those of us with lower VRAM. Is there a plan to release an fp16 version of the CompVis weights as well?
I haven't read the code yet, but I have the impression that it's a bit difficult to adapt the diffusers version for img2img.
p.s. thank you for releasing the model; we've been using it to spice up our table-top RPG game and we're loving it so far.
Hi, indeed we don't have pre-converted fp16 weights for the CompVis codebase yet. I think this is something we'd love a community contribution on.
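In case anyone wants to try the conversion themselves, here's a minimal sketch of what it could look like, assuming the checkpoint is the usual CompVis .ckpt dict with a "state_dict" key (the file names are just placeholders):

import torch

# load the original fp32 checkpoint on the CPU
ckpt = torch.load("sd-v1-4.ckpt", map_location="cpu")
state_dict = ckpt["state_dict"]

# cast every float32 tensor to float16, leave everything else untouched
ckpt["state_dict"] = {
    k: v.half() if v.dtype == torch.float32 else v
    for k, v in state_dict.items()
}

# save the halved checkpoint next to the original
torch.save(ckpt, "sd-v1-4-fp16.ckpt")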
> I haven't read the code yet, but I have the impression that it's a bit difficult to adapt the diffusers version for img2img.
diffusers fully supports img2img, check out: https://github.com/huggingface/diffusers/tree/main/examples/inference - there's also a Colab made for it: https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/image_2_image_using_diffusers.ipynb
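For reference, here's a minimal sketch of the img2img flow with diffusers. The model id, file names, and exact argument names (e.g. image vs. init_image in older releases) are assumptions and may differ with your diffusers version, so check the linked example for the current API:

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# load the pipeline in half precision and move it to the GPU
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

# any starting image works; 512x512 matches the model's training resolution
init_image = Image.open("sketch.png").convert("RGB").resize((512, 512))

# strength controls how far the result may drift from the init image
result = pipe(
    prompt="a fantasy tavern interior, oil painting",
    image=init_image,
    strength=0.75,
    guidance_scale=7.5,
).images[0]
result.save("tavern.png")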
hi, thank you for your answer! Seems like I completely missed the diffusers version of img2img. I'll check it out.
Hi @multimodalart, did you train a diffusion model from scratch for fp16, or can we simply set an fp32 model to fp16 and expect it to work? Thanks a lot for open-sourcing Stable Diffusion!!
I tried it out. It works!!
@suchatur I'm not an expert when it comes to Python. How exactly can one "set" a model to fp16? Would be awesome if you could show this for others here :)
import torch

# load the model as usual
# ...

# cast the weights to half precision, then move the model to the GPU
model = model.to(torch.float16)
model = model.to('cuda')

# run inference as usual
# ...
It halves the model size, which makes it fit in GPU memory. Can't comment on inference time because I couldn't run the full-precision (float32) model on my GPU.
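If you want to see the effect yourself, here's a rough sketch that just sums the parameter sizes before and after the cast (it ignores activations and other runtime buffers):

import torch

def param_gigabytes(m):
    # total size of the model's parameters in GB
    return sum(p.numel() * p.element_size() for p in m.parameters()) / 1e9

print(f"fp32: {param_gigabytes(model):.2f} GB")
model = model.to(torch.float16)
print(f"fp16: {param_gigabytes(model):.2f} GB")  # roughly half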
Is this only possible with the diffusers library? I can run fp32, and I'd be interested to see the difference in inference time.
No, you can do this with the CompVis GitHub repository as well. I don't think there will be much difference in inference time (in fact fp16 may even be slower due to casting overhead).
Confirmed that this works with the GitHub repo as well. I didn't see any noticeable change in inference time.
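In case someone wants to measure it properly, a simple sketch like this should do (run_inference is a placeholder for whatever sampling call you use); the torch.cuda.synchronize() calls make sure the GPU work is actually included in the timing:

import time
import torch

def average_seconds(fn, n=3):
    # warm-up run so one-time setup doesn't skew the numbers
    fn()
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n):
        fn()
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / n

# run_inference is a placeholder for your own sampling call
# print(f"{average_seconds(run_inference):.2f} s per image")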