torch.cuda.OutOfMemoryError: CUDA out of memory. How to run on a graphics card with limited memory?

#94
by zhengjx - opened

It doesn't work:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 14.85 GiB total capacity; 14.62 GiB already allocated; 12.75 MiB free; 14.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
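The hint at the end of the error only applies when PyTorch has reserved much more memory than it has actually allocated (fragmentation). As a rough check, here is a stdlib-only sketch that pulls the figures out of a message in the format quoted above and compares them. The regex targets only this wording (other PyTorch versions phrase the message differently), the helper names are hypothetical, and treating ">>" as "at least 2x" is my own assumption.

```python
import re

# Hypothetical helper: extract the figures from an OOM message in the format quoted above.
_PATTERN = re.compile(
    r"([\d.]+) GiB total capacity; ([\d.]+) GiB already allocated; "
    r"([\d.]+) MiB free; ([\d.]+) GiB reserved"
)


def parse_oom(message: str) -> dict:
    """Return total/allocated/reserved in GiB and free memory in MiB."""
    m = _PATTERN.search(message)
    if m is None:
        raise ValueError("message does not match the expected OOM format")
    total, allocated, free_mib, reserved = (float(g) for g in m.groups())
    return {"total_gib": total, "allocated_gib": allocated,
            "free_mib": free_mib, "reserved_gib": reserved}


def fragmentation_likely(stats: dict, ratio: float = 2.0) -> bool:
    # Assumption: read ">>" as "reserved is at least `ratio` times allocated".
    return stats["reserved_gib"] >= ratio * stats["allocated_gib"]


msg = ("Tried to allocate 18.00 MiB (GPU 0; 14.85 GiB total capacity; "
       "14.62 GiB already allocated; 12.75 MiB free; "
       "14.74 GiB reserved in total by PyTorch)")
stats = parse_oom(msg)
gap = stats["reserved_gib"] - stats["allocated_gib"]
print(f"reserved but unallocated: {gap:.2f} GiB")
print("max_split_size_mb likely to help:", fragmentation_likely(stats))
```

For the numbers in this thread only about 0.12 GiB is reserved but unallocated, so the card is genuinely full and tuning max_split_size_mb is unlikely to help; reducing what is loaded onto the GPU is the real fix.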

@zhengjx I don't believe you can run the full fp16 model on this GPU. You need to offload part of it to the CPU or use the variant without the T5 encoder.
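Why the full model does not fit can be seen with a back-of-envelope estimate: in fp16 each parameter takes 2 bytes, so weight memory is roughly parameters × 2 bytes. The sketch below uses approximate parameter counts that are assumptions (about 2B for the SD3 Medium transformer and about 4.7B for the T5-XXL encoder) and ignores activations, the VAE, and CUDA overhead.

```python
# Rough weight-only VRAM estimate. Parameter counts are approximate assumptions.
GIB = 1024 ** 3

# Assumed approximate parameter counts for the SD3 Medium components.
COMPONENTS = {
    "transformer (MMDiT)": 2.0e9,
    "text_encoder (CLIP-L)": 0.12e9,
    "text_encoder_2 (CLIP-G)": 0.7e9,
    "text_encoder_3 (T5-XXL encoder)": 4.7e9,
}


def weight_gib(params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone; fp16 uses 2 bytes per parameter."""
    return params * bytes_per_param / GIB


total = 0.0
for name, params in COMPONENTS.items():
    gib = weight_gib(params)
    total += gib
    print(f"{name:32s} {gib:5.2f} GiB")
print(f"{'total':32s} {total:5.2f} GiB")

# Dropping T5 removes its share of the weights.
without_t5 = total - weight_gib(COMPONENTS["text_encoder_3 (T5-XXL encoder)"])
print(f"{'total without T5':32s} {without_t5:5.2f} GiB")
```

With these assumed counts the weights alone come to roughly 14 GiB, or about 5 GiB without T5, which is consistent with a 14.85 GiB card running out of memory once activations are added.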

How do I use the one without the T5 encoder?

import os

import numpy as np
import torch
from diffusers import StableDiffusion3Img2ImgPipeline, StableDiffusion3Pipeline
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id="stabilityai/stable-diffusion-3-medium",
    revision="refs/pr/26",
    repo_type="model",
    ignore_patterns=["*.md", ".gitattributes"],
    local_dir="stable-diffusion-3-medium",
    token=os.getenv("HF_TOKEN"),  # use your own access token; never paste it into a post
)

DESCRIPTION = """# Stable Diffusion 3"""
if not torch.cuda.is_available():
    DESCRIPTION += "\nRunning on CPU 🥶 This demo may not work on CPU."

MAX_SEED = np.iinfo(np.int32).max
CACHE_EXAMPLES = False
MAX_IMAGE_SIZE = int(os.getenv("MAX_IMAGE_SIZE", "1536"))
USE_TORCH_COMPILE = False
ENABLE_CPU_OFFLOAD = os.getenv("ENABLE_CPU_OFFLOAD", "0") == "1"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

def load_pipeline(pipeline_type):
    if pipeline_type == "text2img":
        return StableDiffusion3Pipeline.from_pretrained(model_path, torch_dtype=torch.float16)
    elif pipeline_type == "img2img":
        return StableDiffusion3Img2ImgPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
    raise ValueError(f"unknown pipeline_type: {pipeline_type}")

Use this code to skip the T5 encoder (quality will decrease slightly); you should save a LOT of VRAM:

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained("stable-diffusion-3-medium", torch_dtype=torch.float16, text_encoder_3=None, revision="refs/pr/26").to('cuda')
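The same option can be folded into the load_pipeline helper from the question so that T5 is dropped for both text2img and img2img. This is a sketch, not a tested recipe: DROP_T5 is a hypothetical environment variable modeled on ENABLE_CPU_OFFLOAD, passing tokenizer_3=None alongside text_encoder_3=None follows the diffusers SD3 documentation, and I am assuming the img2img pipeline accepts the same arguments.

```python
import os

# Hypothetical switch, mirroring ENABLE_CPU_OFFLOAD from the question.
DROP_T5 = os.getenv("DROP_T5", "1") == "1"


def pipeline_kwargs(drop_t5: bool) -> dict:
    """Extra from_pretrained arguments; torch_dtype is added at load time."""
    kwargs = {}
    if drop_t5:
        # Do not load the T5-XXL text encoder or its tokenizer at all.
        kwargs["text_encoder_3"] = None
        kwargs["tokenizer_3"] = None
    return kwargs


def load_pipeline(model_path: str, pipeline_type: str = "text2img", drop_t5: bool = DROP_T5):
    # Imported lazily so pipeline_kwargs can be used without torch/diffusers installed.
    import torch
    from diffusers import StableDiffusion3Img2ImgPipeline, StableDiffusion3Pipeline

    classes = {"text2img": StableDiffusion3Pipeline,
               "img2img": StableDiffusion3Img2ImgPipeline}
    if pipeline_type not in classes:
        raise ValueError(f"unknown pipeline_type: {pipeline_type}")
    return classes[pipeline_type].from_pretrained(
        model_path, torch_dtype=torch.float16, **pipeline_kwargs(drop_t5)
    )
```

With DROP_T5=0 in the environment, the helper passes no extra arguments and behaves like the original load_pipeline.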

If you plan on using the T5 encoder, try this instead. It keeps each component on the CPU until it is needed, which might save a fair amount of VRAM:

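import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained("stable-diffusion-3-medium", torch_dtype=torch.float16, revision="refs/pr/26")
pipe.enable_model_cpu_offload()  # don't also call .to('cuda'); offloading manages device placement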
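If you are unsure which of these fits your card, one approach is to decide from the free VRAM at startup. This sketch uses torch.cuda.mem_get_info() and the diffusers methods enable_model_cpu_offload() (one whole component on the GPU at a time) and enable_sequential_cpu_offload() (submodule-level, much slower but lighter). The GiB thresholds are guesses, not measured values, and the helper names are hypothetical.

```python
def choose_offload(free_gib: float, full_gib: float = 18.0, largest_component_gib: float = 10.0) -> str:
    """Pick an offloading mode from free VRAM. Thresholds are assumptions; tune for your card."""
    if free_gib >= full_gib:
        return "none"        # everything should fit: pipe.to("cuda")
    if free_gib >= largest_component_gib:
        return "model"       # pipe.enable_model_cpu_offload()
    return "sequential"      # pipe.enable_sequential_cpu_offload(): slow, minimal VRAM


def apply_offload(pipe, mode: str):
    """Apply the chosen mode to a diffusers pipeline."""
    if mode == "none":
        pipe.to("cuda")
    elif mode == "model":
        pipe.enable_model_cpu_offload()
    else:
        pipe.enable_sequential_cpu_offload()
    return pipe


def free_vram_gib() -> float:
    # Imported lazily; mem_get_info returns (free_bytes, total_bytes) for the current device.
    import torch
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return free_bytes / 1024 ** 3
```

Typical use would be apply_offload(pipe, choose_offload(free_vram_gib())) right after from_pretrained, instead of calling .to('cuda') directly.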

Use the fp8 T5 safetensors. I can run it on a 4070 with 12 GB.

It doesn't work:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 14.85 GiB total capacity; 14.62 GiB already allocated; 12.75 MiB free; 14.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Specifying the GPU with "--cuda-device 0" may work:
.\python_embeded\python.exe -s ComfyUI\main.py --cuda-device 0 --force-fp16
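Outside ComfyUI, a common way to pin a process to one GPU is the standard CUDA_VISIBLE_DEVICES environment variable, which must be set before CUDA is initialized. A minimal sketch for launching a script with only one GPU visible; the helper name env_for_gpu is hypothetical, and whether ComfyUI's --cuda-device flag works this way internally is not stated in this thread.

```python
import os


def env_for_gpu(index: int, base=None) -> dict:
    """Copy of the environment that exposes only GPU `index` to a child process."""
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = str(index)
    return env


# Example launch (the script name and working directory are assumptions):
# import subprocess
# subprocess.run(["python", "main.py", "--force-fp16"], cwd="ComfyUI",
#                env=env_for_gpu(0), check=True)
```

Inside the child process the selected card then appears as cuda:0.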

This works on the CPU, but slowly:

from diffusers import StableDiffusion3Pipeline

model = "stabilityai/stable-diffusion-3-medium-diffusers"

base = StableDiffusion3Pipeline.from_pretrained(
    model,
    use_safetensors=True,
)

image = base(
    "A cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]

image.save("cat_image.jpg")
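The CPU example above leaves torch_dtype unset, so the weights load in the default float32. Half precision mainly pays off on the GPU, and many CPU kernels are slow or unavailable in fp16. A hedged sketch that picks device and dtype together; pick_device_and_dtype and load_sd3 are hypothetical helper names.

```python
def pick_device_and_dtype(cuda_available: bool) -> tuple:
    """fp16 on the GPU to halve weight memory; fp32 on the CPU."""
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "float32"


def load_sd3(model: str = "stabilityai/stable-diffusion-3-medium-diffusers"):
    # Imported lazily so pick_device_and_dtype works without torch/diffusers installed.
    import torch
    from diffusers import StableDiffusion3Pipeline

    device, dtype_name = pick_device_and_dtype(torch.cuda.is_available())
    pipe = StableDiffusion3Pipeline.from_pretrained(
        model, torch_dtype=getattr(torch, dtype_name), use_safetensors=True
    )
    return pipe.to(device)
```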

You can run this in SwarmUI even with 6 GB of VRAM.

This is the tutorial you need

Zero to Hero Stable Diffusion 3 Tutorial with Amazing SwarmUI SD Web UI that Utilizes ComfyUI

