Using the Apple M1 chip causes an error (kernel death)

#13
by tomwjhtom - opened

I don't have an Nvidia GPU, so I tried to use the M1 on my MacBook Air. However, executing the code below leads to kernel death.

import torch
from torch import autocast
from diffusers import StableDiffusionPipeline

model_id = "CompVis/stable-diffusion-v1-4"
device = "mps"


pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16, revision="fp16", use_auth_token=True)
pipe = pipe.to(device)

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt, guidance_scale=7.5)["sample"][0]  

Note that pipe.to(device) executes successfully.
Has anyone made M1 work yet? My PyTorch version is '1.13.0.dev20220823'.

We're working on exactly this! Pinging @pcuenq and @apolinario here as well

Please also check announcements on Twitter - we'll publish something about that soon!

In my case, on an Apple M1, with the code

# make sure you're logged in with `huggingface-cli login`
import os
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

# To swap out the noise scheduler, pass it to from_pretrained:
lms = LMSDiscreteScheduler(
    beta_start=0.00085, 
    beta_end=0.012, 
    beta_schedule="scaled_linear"
)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'running on {device}')
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-3", 
    scheduler=lms,
    torch_dtype=torch.float16,
    revision="fp16",
    use_auth_token=True,
    cache_dir=os.getenv("cache_dir", "./models")
).to(device)

prompt = "a photo of an astronaut riding a horse on mars"
with autocast(device):
    image = pipe(prompt)["sample"][0]  
    
image.save("astronaut_rides_horse.png")

I get the following error

Traceback (most recent call last):
  File "diffuser.py", line 27, in <module>
    image = pipe(prompt)["sample"][0]  
  ...
  File "/Documents/Projects/bloom/.venv/lib/python3.7/site-packages/torch/nn/functional.py", line 2503, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

Indeed, it doesn't work out of the box yet for device mps; however, with device = cpu it should work, @loretoparisi. Can you try removing the with autocast(device) block? Autocast doesn't work for CPU as of now.

Thanks, I slightly modified the code like

prompt = "a photo of an astronaut riding a horse on mars"
samples = 2
steps = 45
scale = 7.5
if device=='cuda':
    with autocast(device):
        image = pipe(
            [prompt]*samples,
            num_inference_steps=steps,
            guidance_scale=scale,
            )["sample"][0]
else:
    image = pipe(prompt)["sample"][0]

but I'm still getting the same error:

Traceback (most recent call last):
  File "diffuser.py", line 39, in <module>
    image = pipe(prompt)["sample"][0]
  File "/Projects/bloom/.venv/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
...
  File "/Projects/bloom/.venv/lib/python3.7/site-packages/torch/nn/functional.py", line 2503, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

@loretoparisi, oh, this is probably because you are trying to load the fp16 version of the model, which also doesn't work on CPU 😅
Try this for pipe

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", #better model btw 
    scheduler=lms,
    use_auth_token=True,
    cache_dir=os.getenv("cache_dir", "./models")
).to(device)

Thank you, it works on Apple M1 after removing autocast and fp16!

Here is the code for other people's convenience

# make sure you're logged in with `huggingface-cli login`
import os
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

# To swap out the noise scheduler, pass it to from_pretrained:
lms = LMSDiscreteScheduler(
    beta_start=0.00085, 
    beta_end=0.012, 
    beta_schedule="scaled_linear"
)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'running on {device}')

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", #better model btw 
    scheduler=lms,
    use_auth_token=True,
    cache_dir=os.getenv("cache_dir", "./models")
).to(device)


prompt = "a photo of an astronaut riding a horse on mars"
samples = 2
steps = 45
scale = 7.5
image = pipe(prompt, num_inference_steps=steps, guidance_scale=scale)["sample"][0]
image.save("astronaut_rides_horse.png")

It definitely works, very slowly though.
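
If CPU generation is too slow for experimenting, one knob worth trying (a hedged suggestion, not something from this thread) is lowering the number of denoising steps; num_inference_steps is a standard pipeline argument and fewer steps trade image quality for time:

# Fewer denoising steps finish sooner, at some cost in image quality.
# The default is 50; 20-30 is often enough for a quick test.
image = pipe(prompt, num_inference_steps=25, guidance_scale=7.5)["sample"][0]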

On an M1 (not M1 Max) I get TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead. if I don't specify revision and torch_dtype. The script python scripts/txt2img.py works to create images though, so it's an issue with diffusers and not stable-diffusion I think.

Can you say how you specify revision and torch_dtype?

I think txt2img.py is using CPU if CUDA is not available.

To sgt101, I think you're running in CPU mode, because of the line that says device = 'cuda' if torch.cuda.is_available() else 'cpu'
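
As an aside, a small sketch of a device pick that also considers MPS (assuming a PyTorch build recent enough to ship torch.backends.mps) would be:

import torch

# Prefer CUDA, then the Apple Silicon MPS backend, then fall back to CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
print(f"running on {device}")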

I'm pretty sure my txt2img is using MPS (magnusviri's fork) because it takes about a minute to run instead of upwards of 30 minutes, among other things. I have

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", 
    torch_dtype=torch.float16, revision="fp16",
    use_auth_token=True,
).to("mps")

But that errors out with

0it [00:00, ?it/s]loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)): error: input types 'tensor<2x1280xf32>' and 'tensor<*xf16>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
Abort trap: 6
/Users/fragmede/miniforge3/envs/ldm/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Figured it out! I opened a PR so Hugging Face can merge my fix into diffusers to get MPS working.

It works after upgrading diffusers.

pip install -U diffusers
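
If you're unsure whether the upgrade took effect, a quick sanity check (assuming a PyTorch build that includes the MPS backend) is:

import torch
import diffusers

print(diffusers.__version__)              # should show the freshly installed release
print(torch.backends.mps.is_available())  # True on Apple Silicon with a recent PyTorch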
pcuenq changed discussion status to closed

Remove torch_dtype=torch.float16 and it works for me.
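
Putting the pieces of this thread together, a minimal full-precision sketch for MPS might look like this (assuming an up-to-date diffusers with the MPS fix and that you're logged in with huggingface-cli login; not an official recipe):

from diffusers import StableDiffusionPipeline

# Load in the default float32 precision: no revision="fp16", no torch_dtype=torch.float16.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    use_auth_token=True,
)
pipe = pipe.to("mps")

prompt = "a photo of an astronaut riding a horse on mars"
# No autocast: run the pipeline directly on the MPS device.
image = pipe(prompt, guidance_scale=7.5)["sample"][0]
image.save("astronaut_rides_horse.png")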

Hey!

I have the same problem:

loc("varianceEps"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/a0876c02-1788-11ed-b9c4-96898e02b808/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)): error: input types 'tensor<1x77x1xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).

But I don't understand how to fix it... I'm an architect and I don't have coding skills... Can you please explain the method a bit more (if there is one)?

Thanks a lot in advance!
