Running on lower VRAM

by Donnyed

I heard this can be run on a card with under 10 GB of VRAM. I have a 3070 8 GB, and when I tried to generate a picture I ran out of VRAM. Are there any settings I can change to get it running?

It's really strange. I've run the stable-diffusion-v-1-3-diffusers model on a GeForce RTX 2060 SUPER (8 GB VRAM). It generates one image in ~45 seconds with 100 steps perfectly fine.
Are you generating several samples or one sample? It may run out of VRAM if you try to generate several samples in one go.

I tried running txt2img.py with the default settings. What do you use to get it to work on your 2060?

Again, I am using the diffusers model, so the results may differ.
My settings are: n_samples=1, num_inference_steps=50, guidance_scale=7
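
For reference, a rough diffusers sketch with those settings might look like the following. The model id, half-precision cast, and output handling are my assumptions (they depend on your diffusers version), not something stated in the post above:

import torch
from diffusers import StableDiffusionPipeline

# Assumed model id and fp16 weights; the post only specifies the generation settings.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
    use_auth_token=True,
)
pipe = pipe.to("cuda")

# One sample per call (n_samples=1), 50 inference steps, guidance scale 7.
result = pipe("a photograph of an astronaut riding a horse",
              num_inference_steps=50, guidance_scale=7)
image = result.images[0]  # older diffusers versions return result["sample"][0] instead
image.save("sample.png")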

Wallpaper Engine or any other GPU-intensive app, if running, can really eat into available VRAM, so consider disabling them while generating.

https://twitter.com/EMostaque/status/1557862289394515973 This tweet says the current model can run in 5.1 GB of VRAM, but my 6 GB GPU is giving me an out-of-memory error. Any suggestions? :(

By default the code generates 3 images at a time. To reduce the memory footprint, reduce --n_samples to 1. You can also use --W and --H to reduce the size of the generated image (512x512 by default) and thereby reduce the memory footprint.

On a 3060 12GB I can generate images up to 708x512 using n_samples=1

With this version of the weights, on an 8 GB card I can generate 1 image if I go down to 448x448, and the quality seems fine. If I try much smaller, the images look broken.
512x512 does not work with this setup as-is for me, but I suspect memory use can be squeezed further somewhere in the implementation.

With the diffusers code I can generate the full 512x512 on the same card.
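
If you are on the diffusers pipeline and still hitting out-of-memory at 512x512, the call also accepts a smaller resolution directly (height and width just need to be divisible by 8). A minimal sketch, assuming `pipe` is a StableDiffusionPipeline already on the GPU; the prompt is only a placeholder:

# 448x448, matching the resolution that worked on an 8 GB card above.
result = pipe("a photograph of an astronaut riding a horse",
              height=448, width=448,
              num_inference_steps=50, guidance_scale=7)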

I have created a modified version of the repo that can run on lower VRAM at the cost of slightly longer inference time. It can generate a 512x512 image on a 6 GB GPU (RTX 2060 in my case) in 75 seconds. Please feel free to check it out and give suggestions: https://github.com/basujindal/stable-diffusion

Edit: I have added support for batched inference. It can now generate images in batches, which reduces inference time to 40 seconds per image on a 6 GB RTX 2060 when using a batch size of 6 :)
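
As a side note (this is not how the fork above works, just a related option): recent diffusers releases ship a built-in memory saver that trades a bit of speed for lower peak VRAM. A minimal sketch, assuming the v1-4 diffusers weights:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
    use_auth_token=True,
)
pipe = pipe.to("cuda")

# Compute attention in slices instead of one big batch; slower, but peak VRAM drops.
pipe.enable_attention_slicing()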

Hi basujindal, is there any setting I can change in the optimizedSD files that would lower the bar to a 4 GB GPU? Using your files I can run a 128x128 image but not higher. I don't mind a long inference time; I just want to run a higher resolution on my poor GTX 970.

Hi, I am trying to modify the code to make it run on lower VRAM, but it seems complicated. Will let you know if it works.

Hi, I have updated the repo. Now you can generate 512x512 images in under 4 GB. Cheers!

Try also tinkering with downsampling. I've integrated the option into my Colab if you want to play with it (via TXT2IMG). If you're running it locally, just use the arg:
https://colab.research.google.com/drive/1jUwJ0owjigpG-9m6AI_wEStwimisUE17#scrollTo=9QnhfmAM0t-X

I haven't played with the option myself, but I thought it could be a good way to reduce VRAM usage.

You can also remove the VAE encoder part to save some more memory before moving the models to the GPU, as it's not needed for the text-to-image pipeline. E.g.

import torch
from diffusers import StableDiffusionPipeline

# Load the weights in half precision, then delete the VAE encoder before moving anything to the GPU.
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True, torch_dtype=torch.float16)
del pipe.vae.encoder
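
This is safe for text-to-image because the pipeline only ever calls vae.decode to turn latents back into images; after the deletion you can move it to the GPU and generate as usual, e.g.

pipe = pipe.to("cuda")  # the encoder is gone, but pipe(prompt) still works for txt2img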
