torch.cuda.OutOfMemoryError: CUDA out of memory. How to run on a graphics card with small memory

#94

by zhengjx - opened Jun 13, 2024

Jun 13, 2024

It doesn't work.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 14.85 GiB total capacity; 14.62 GiB already allocated; 12.75 MiB free; 14.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
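
The figures in the message can be read directly: the gap between reserved and allocated memory is roughly what tuning `max_split_size_mb` could win back. A minimal sketch, assuming the message format shown above; `parse_oom` is a hypothetical helper, not part of PyTorch:

```python
import re

# Hypothetical helper (not a PyTorch API): pull the GiB figures out of an
# OOM message shaped like the one above.
def parse_oom(message: str) -> dict:
    patterns = {
        "total": r"([\d.]+) GiB total capacity",
        "allocated": r"([\d.]+) GiB already allocated",
        "reserved": r"([\d.]+) GiB reserved",
    }
    return {k: float(re.search(p, message).group(1)) for k, p in patterns.items()}

msg = ("Tried to allocate 18.00 MiB (GPU 0; 14.85 GiB total capacity; "
       "14.62 GiB already allocated; 12.75 MiB free; "
       "14.74 GiB reserved in total by PyTorch)")
stats = parse_oom(msg)

# Memory held by the caching allocator but not used by live tensors.
slack_gib = stats["reserved"] - stats["allocated"]
print(f"reserved - allocated = {slack_gib:.2f} GiB")
```

Here the gap is only about 0.12 GiB, so reserved memory is not much larger than allocated memory. Fragmentation is probably not the problem; the model has to get smaller or be partly offloaded.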

YaTharThShaRma999

Jun 13, 2024

@zhengjx I do not believe you can run the full fp16 model on that GPU. You need to offload part of it to the CPU or use the variant without the T5 encoder.

zhengjx

Jun 13, 2024

How do I use the one without the T5 encoder? This is my current code:

import os

import numpy as np
import torch
from diffusers import StableDiffusion3Img2ImgPipeline, StableDiffusion3Pipeline
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id="stabilityai/stable-diffusion-3-medium",
    revision="refs/pr/26",
    repo_type="model",
    ignore_patterns=["*.md", "*.gitattributes"],
    local_dir="stable-diffusion-3-medium",
    token=os.getenv("HF_TOKEN"),  # read the access token from the environment; never paste it into a post
)

DESCRIPTION = """# Stable Diffusion 3"""
if not torch.cuda.is_available():
    DESCRIPTION += "\nRunning on CPU 🥶 This demo may not work on CPU."

MAX_SEED = np.iinfo(np.int32).max
CACHE_EXAMPLES = False
MAX_IMAGE_SIZE = int(os.getenv("MAX_IMAGE_SIZE", "1536"))
USE_TORCH_COMPILE = False
ENABLE_CPU_OFFLOAD = os.getenv("ENABLE_CPU_OFFLOAD", "0") == "1"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

def load_pipeline(pipeline_type):
    # Load the requested pipeline in half precision from the local snapshot.
    if pipeline_type == "text2img":
        return StableDiffusion3Pipeline.from_pretrained(model_path, torch_dtype=torch.float16)
    elif pipeline_type == "img2img":
        return StableDiffusion3Img2ImgPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
    raise ValueError(f"Unknown pipeline type: {pipeline_type}")

YaTharThShaRma999

Jun 13, 2024

edited Jun 13, 2024

Use this code to skip the T5 encoder. Quality will decrease slightly, but you should save a LOT of VRAM.

pipe = StableDiffusion3Pipeline.from_pretrained("stable-diffusion-3-medium", torch_dtype=torch.float16, text_encoder_3=None, revision="refs/pr/26").to('cuda')

If you plan on using the T5 encoder, try this instead. It might save a bit of VRAM by keeping sub-models on the CPU until they are needed.

pipe = StableDiffusion3Pipeline.from_pretrained("stable-diffusion-3-medium", torch_dtype=torch.float16, revision="refs/pr/26")
pipe.enable_model_cpu_offload()
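
The two suggestions above can also be combined behind a switch. A rough sketch, assuming the model sits in the local `stable-diffusion-3-medium` folder as in the earlier post; `build_load_kwargs` and `load_text2img` are hypothetical names, and passing `tokenizer_3=None` alongside `text_encoder_3=None` follows the diffusers SD3 documentation rather than anything stated in this thread:

```python
def build_load_kwargs(use_t5: bool) -> dict:
    """Extra keyword arguments for from_pretrained (hypothetical helper)."""
    kwargs = {}
    if not use_t5:
        # Skip loading the T5 text encoder and its tokenizer entirely.
        kwargs["text_encoder_3"] = None
        kwargs["tokenizer_3"] = None
    return kwargs


def load_text2img(model_dir: str = "stable-diffusion-3-medium",
                  use_t5: bool = False,
                  cpu_offload: bool = True):
    # Imported lazily so the helper above can be used without a GPU stack.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        model_dir, torch_dtype=torch.float16, **build_load_kwargs(use_t5)
    )
    if cpu_offload:
        # Each sub-model is moved to the GPU only while it runs,
        # so the pipeline is not moved to "cuda" as a whole.
        pipe.enable_model_cpu_offload()
    else:
        pipe.to("cuda")
    return pipe
```

Calling `load_text2img(use_t5=False, cpu_offload=True)` would be the most memory-frugal combination here. Note that `enable_model_cpu_offload()` needs the `accelerate` package installed, and generation will be somewhat slower than keeping everything on the GPU.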

CHNtentes

Jun 14, 2024

Use the T5 fp8 safetensors. I can run it on a 4070 with 12 GB.
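
For a sense of why fp8 helps: weight memory scales with bytes per parameter. A back-of-the-envelope sketch; the parameter count of roughly 4.7 billion for the T5-XXL encoder is an approximate assumption for illustration, not a figure from this thread, and activations and the other sub-models are ignored:

```python
def weight_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


T5_XXL_ENCODER_PARAMS = 4.7e9  # assumed, approximate

fp16_gib = weight_gib(T5_XXL_ENCODER_PARAMS, 2)  # 2 bytes per fp16 weight
fp8_gib = weight_gib(T5_XXL_ENCODER_PARAMS, 1)   # 1 byte per fp8 weight
print(f"fp16: {fp16_gib:.1f} GiB, fp8: {fp8_gib:.1f} GiB")
```

Whatever the exact count, storing the encoder in fp8 halves its weight footprint compared with fp16, which is what frees up room on a 12 GB card.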

alexfang888

Jun 15, 2024

It doesn't work.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 14.85 GiB total capacity; 14.62 GiB already allocated; 12.75 MiB free; 14.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Specifying the GPU with "--cuda-device 0" may work:
.\python_embeded\python.exe -s ComfyUI\main.py --cuda-device 0 --force-fp16