RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

#64
by MoscaZzz - opened

How do I fix this?

This looks like an issue related to some card not implementing operations on half floats. Try running the script with those two arguments, see if this helps: ' --precision full --no-half'

@rey9009 You're correct, but I think it's CPU, not Cards?

Adding the param to webui-user.bat at the ARGs section made mine actually render, BUT it's using 100% CPU, 0% GPU
AMD Ryzen 5800, AMD XFX 6800

I.e., it's falling back to the CPU because something like torch.device('cuda') (pseudocode) fails somewhere. I don't know enough yet to tell whether I'm missing a driver, an install step, or something else.
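
The fallback described here can be sketched roughly as follows. This is only an illustration of the idea, not the web UI's actual code: `pick_device` and `detect_device` are hypothetical helper names, and PyTorch is queried only if it is installed.

```python
def pick_device(cuda_ok: bool, mps_ok: bool = False) -> str:
    """Return the device name a script would typically fall back to."""
    if cuda_ok:
        return "cuda"  # a GPU that PyTorch exposes through torch.cuda
    if mps_ok:
        return "mps"   # Apple Silicon GPU backend
    return "cpu"       # nothing usable was found, so everything runs on the CPU


def detect_device() -> str:
    """Ask PyTorch what it can see; report 'cpu' if torch is not installed."""
    try:
        import torch
    except ImportError:
        return "cpu"
    mps = getattr(torch.backends, "mps", None)  # missing in older torch builds
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print("selected device:", detect_device())
```

If this prints `cpu` on a machine with a dedicated GPU, the installed PyTorch build or the driver stack is the likely place to look, which matches the 100% CPU / 0% GPU symptom.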

This looks like an issue related to some card not implementing operations on half floats. Try running the script with those two arguments, see if this helps: ' --precision full --no-half'

Thanks! will try.

@rey9009 You're correct, but I think it's CPU, not Cards?

Adding the param to webui-user.bat at the ARGs section made mine actually render, BUT it's using 100% CPU, 0% GPU
AMD Ryzen 5800, AMD XFX 6800

ie. it's defaulting to CPU because a (pseudocode) torch.device('cuda') fails somewhere, I don't know enough yet to know if I'm missing a driver, an install or something.

Do you think this might be related to the fact that AMD is not supported?
"Only Nvidia cards are officially supported." (This is from Reddit.) However, there seems to be a guy who has found a way to run it on AMD.
https://www.reddit.com/r/StableDiffusion/comments/wv3zam/comment/ild7yv3/?utm_source=share&utm_medium=web2x&context=3

This looks like an issue related to some card not implementing operations on half floats. Try running the script with those two arguments, see if this helps: ' --precision full --no-half'

This worked for me. Thanks

@upizs
Adding the param to webui-user.bat
where should i add it man....

@soumya12
Adding the param to webui-user.bat
where should i add it man....
In your Stable Diffusion folder there is a file called webui-user.bat. Open it in a text editor and look for the line
set COMMANDLINE_ARGS= ...... <- this is where you add the args mentioned above. Every arg starts with --
Mine looks like this:
set COMMANDLINE_ARGS=--skip-torch-cuda-test --precision full --no-half
If you can't find that line, just add it somewhere near the beginning, e.g. around line 6.
Good luck
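
For anyone who prefers to script the edit rather than do it by hand, here is a minimal sketch. `set_commandline_args` is a hypothetical helper, and the sample file contents in the usage part are made up for illustration; check them against your own webui-user.bat before relying on this.

```python
def set_commandline_args(bat_text: str, args: str) -> str:
    """Return bat_text with its `set COMMANDLINE_ARGS=` line replaced.

    If the file has no such line, one is inserted right after the first line.
    """
    new_line = f"set COMMANDLINE_ARGS={args}"
    lines = bat_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().lower().startswith("set commandline_args="):
            lines[i] = new_line  # overwrite the existing setting
            break
    else:
        lines.insert(1, new_line)  # no existing line: add one near the top
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up sample contents, used only to demonstrate the helper.
    sample = "@echo off\n\nset PYTHON=\nset COMMANDLINE_ARGS=\n\ncall webui.bat\n"
    print(set_commandline_args(sample, "--skip-torch-cuda-test --precision full --no-half"))
```

The helper works on a string so you can inspect the result before writing it back to disk.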

Yeah it works fine on Linux after a bit of troubleshooting to get the right drivers etc.

I use:
--skip-torch-cuda-test --precision full --no-half --no-progressbar-hiding --opt-channelslast

I just read about the last parameter, not sure whether it would help or not yet on my card - doesn't break when it's on or off.

The most important detail I've found is this:

AMD GPUs are detected and run as CUDA devices
https://discuss.pytorch.org/t/how-to-run-torch-with-amd-gpu/157069/4

So if you really want to get into it there's probably loads of optimizations to be made, I don't understand enough Python & Torch to do it yet.

Worked with ROCm 5.1.1 on Linux Mint 21.
My card is not officially supported on that driver, but that's okay.
The driver doesn't officially support Ubuntu 22, but I got it working.

Ubuntu 20 is recommended if you want to avoid that.

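Remove torch_dtype=torch.float16:

from diffusers import StableDiffusionPipeline

# model_path and device are defined elsewhere in the script
pipe = StableDiffusionPipeline.from_pretrained(
    model_path,
    # revision="fp16",
    # torch_dtype=torch.float16,
    use_auth_token=True,
).to(device)

That works.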
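
Instead of commenting the dtype out for good, one option is to ask for half precision only when a CUDA device is actually in use. The sketch below assumes exactly that policy; `pipeline_dtype_name` is a hypothetical helper, and the `from_pretrained` call is left as a comment because `model_path` is not defined here.

```python
def pipeline_dtype_name(device: str) -> str:
    """Half precision only on CUDA devices; full precision everywhere else."""
    return "float16" if device.startswith("cuda") else "float32"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; nothing to check.")
    else:
        device = "cuda" if torch.cuda.is_available() else "cpu"
        dtype = getattr(torch, pipeline_dtype_name(device))  # e.g. torch.float32
        print(f"device={device}, torch_dtype={dtype}")
        # The dtype would then be passed on, for example:
        # pipe = StableDiffusionPipeline.from_pretrained(
        #     model_path, torch_dtype=dtype, use_auth_token=True
        # ).to(device)
```

Treating every non-CUDA device as float32 is a deliberately conservative assumption; it trades speed and memory for avoiding "not implemented for 'Half'" errors.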

This looks like an issue related to some card not implementing operations on half floats. Try running the script with those two arguments, see if this helps: ' --precision full --no-half'

Worked for me :)

I did what y'all did, like this:
'--precision full --no-half'
But it still doesn't work. Does anyone know why this happens?

Thanks for all the comments. I had the same problem and have already solved it. The only issue is that it fully uses my CPU and RAM and doesn't activate the GPU. It may sound silly, but I have Intel Iris Xe Graphics; is there any way I can use my GPU in place of an Nvidia card?

This looks like an issue related to some card not implementing operations on half floats. Try running the script with those two arguments, see if this helps: ' --precision full --no-half'

Where do I put it? (I'm new) Do y'all run it locally or smth?

Where should we put those arguments? Feels like there's an elephant in this room :p

You put those arguments in the webui-user.bat file.

Where should we put those arguments? Feels like there's an elephant in this room :p

Right-click webui-user.bat (it may show up as just webui-user) and click Edit. Then add --skip-torch-cuda-test --precision full --no-half next to COMMANDLINE_ARGS=
It would look something like this:
set COMMANDLINE_ARGS= --skip-torch-cuda-test --precision full --no-half
Then save it and run webui-user.bat (or webui-user) again. It will take some time to load; then copy the link at the end that starts with http:// and paste it into your browser's address bar :)
Hope it works!

This looks like an issue related to some card not implementing operations on half floats. Try running the script with those two arguments, see if this helps: ' --precision full --no-half'

I have modified in webui-user.bat file, but it still hasn't been solved. Is it because of my MAC?

make ur arg look like this: set COMMANDLINE_ARGS= --lowvram --precision full --no-half --skip-torch-cuda-test

Lmao I got it to run on this CPU [Radeon RX 580 Series] with the following arg setting
[set COMMANDLINE_ARGS=--skip-torch-cuda-test --precision full --no-half --lowvram]
Even though it ran on that CPU, it did take like 4 minutes to load a single txt2img...
So even though these settings make it run on CPU's other than NVIDIA... it unfortunately costs longer wait times.
Quoting here:

@valk "You're correct, but I think it's CPU, not Cards?"
@JulyFX "make ur arg look like this: set COMMANDLINE_ARGS= --lowvram --precision full --no-half --skip-torch-cuda-test"

@averyrune you might be a bit confused.

  1. Did it take 4 minutes for a simple prompt (under 75 tokens) 512x512 no embeddings/loras/controlnet etc?
  2. So even though these settings make it run on CPU's other than NVIDIA... it unfortunately costs longer wait times.
    False

--skip-torch-cuda-test only SKIPS the test for CUDA;
in other words, it is needed to make the frontend run on AMD.

If it STILL runs on CPU, then your drivers / something else is incorrectly setup and the backend automatically swaps to using CPU.

The error itself (the title of this thread) is INDEPENDENT of whether you are running on CPU or GPU: if the operation isn't implemented for half precision on the device in use, this error happens.
E.g. some models, such as SD 2.1, need --precision full plus --no-half to force FULL mode (full precision, meaning 32-bit floats),
because half precision doesn't work properly with that model and results in black output images...

Similarly, certain GPUs don't handle either 16-bit or 32-bit floats in the way that something in these Diffusers pipelines needs.
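
To make "half" versus "full" precision concrete, here is a small sketch using only Python's standard library. It shows how much narrower a 16-bit float is than a 32-bit one. It does not reproduce what happens inside any particular model; overflow of this kind is only one commonly cited reason for broken output in half precision. The helper names are made up for the example.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest 16-bit float and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x: float) -> float:
    """Round x to the nearest 32-bit float and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fits_in_half(x: float) -> bool:
    """Return True if x is within the finite range of a 16-bit float."""
    try:
        struct.pack("<e", x)
    except OverflowError:
        return False
    return True


if __name__ == "__main__":
    print("0.1 as half precision:  ", to_half(0.1))
    print("0.1 as single precision:", to_single(0.1))
    # 65504 is the largest finite half-precision value.
    print("65504 fits in half:", fits_in_half(65504.0))
    print("70000 fits in half:", fits_in_half(70000.0))
```

--precision full and --no-half keep values in the wider 32-bit format, at the cost of more memory and slower generation.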

I hope this information helps, those were good questions.

when you said:
Even though it ran on that CPU, it did take like 4 minutes to load a single txt2img...
did you mean:
Even though it ran on that GPU, it did take like 4 minutes to load a single txt2img...
?

I don't expect an 8gb GPU to generate pictures very quickly at all.

Instead of using --lowvram, try the new optimization parameter --opt-sdp-attention and don't go over 512 when testing.

Lowvram makes generation slower, to make it use less VRAM.
Using the new optimization will also make it use less VRAM.

try --medvram too.

There's a way to check if torch(cuda) picks up your GPU outside of the UI stuff (on the commandline/terminal) but I forgot how.
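
One way to do that check, as far as I know, is to ask PyTorch directly from the same Python environment the UI uses. The sketch below is a small diagnostic script; `torch_gpu_report` is a hypothetical helper name, while the torch attributes it reads are standard ones.

```python
def torch_gpu_report() -> dict:
    """Collect what PyTorch reports about GPU support, if torch is installed."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,                  # None on CPU-only or ROCm builds
        "hip_build": getattr(torch.version, "hip", None),  # set on ROCm (AMD) builds
        "gpu_available": torch.cuda.is_available(),
    }
    if report["gpu_available"]:
        report["gpu_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in torch_gpu_report().items():
        print(f"{key}: {value}")
```

If `gpu_available` is False, the backend has nothing to run on except the CPU, regardless of which command-line arguments are set.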

One more note: --precision full and --no-half both slow down generation too. With the new optimization plus a slightly newer ROCm version that got patched into the auto install, I don't need those parameters anymore unless I want to use SD 2.1 or models based on it.

This looks like an issue related to some card not implementing operations on half floats. Try running the script with those two arguments, see if this helps: ' --precision full --no-half'

I have modified in webui-user.bat file, but it still hasn't been solved. Is it because of my MAC?

If you are on a Mac, edit the webui-macos-env.sh file and change the relevant line to: export COMMANDLINE_ARGS="--precision full --no-half --skip-torch-cuda-test"

Then start the UI with this command instead of plain ./webui.sh:

./webui.sh --precision full --no-half

This looks like an issue related to some card not implementing operations on half floats. Try running the script with those two arguments, see if this helps: ' --precision full --no-half'

I have modified in webui-user.bat file, but it still hasn't been solved. Is it because of my MAC?

I have a Mac with an M1 chip; it works fine but is very slow. Open the webui folder in the command line and type

./webui.sh --precision full --no-half

This starts a local server and prints a link in the CLI (it is clickable, locally hosted, and looks like http://127.0.0.1:7860/). Open it in your browser and, if all your models are installed correctly, you should have Stable Diffusion running. Tips: do a quick search to check whether your system can run Stable Diffusion. The https://github.com/AUTOMATIC1111/stable-diffusion-webui GitHub repo has specific instructions for Apple Silicon. Also, https://www.youtube.com/watch?v=DUqsYm_rYcA is a short tutorial on getting AUTOMATIC1111 running on your Mac (if it is supported).

Hi guys, I had a problem with the error "upsample_nearest2d_channels_last" not implemented for 'Half', and I could fix it with export COMMANDLINE_ARGS="--precision full --no-half --skip-torch-cuda-test". I also changed the command, and it finally worked. But when it generated the image, I couldn't even make it out because it was too pixelated. I changed the width and height to 768 and got this error:
RuntimeError: MPS backend out of memory (MPS allocated: 4.54 GB, other allocations: 2.32 GB, max allowed: 6.77 GB). Tried to allocate 256 bytes on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
Time taken: 3m 51.57s
Is there anyone who can help me with this? I'm on a Mac.
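
The numbers in that error message can be checked by hand, and a short sketch makes the arithmetic explicit. The helper names are made up for the example. Comparing against 512x512 assumes that was the previous size, and memory use will not scale exactly with pixel count, so treat the ratio as a rough indication only.

```python
def pixel_ratio(width: int, height: int, base: int = 512) -> float:
    """How many times more pixels a width x height image has than base x base."""
    return (width * height) / (base * base)


def over_limit_gb(allocated_gb: float, other_gb: float, limit_gb: float) -> float:
    """GB by which the total exceeds the limit; a negative value means headroom."""
    return round(allocated_gb + other_gb - limit_gb, 2)


if __name__ == "__main__":
    # Figures taken from the error message above.
    print("768x768 vs 512x512 pixels:", pixel_ratio(768, 768))
    print("GB over the MPS limit:", over_limit_gb(4.54, 2.32, 6.77))
```

The existing allocations already add up to slightly more than the allowed maximum, so dropping back to a smaller width and height is the least risky thing to try before touching PYTORCH_MPS_HIGH_WATERMARK_RATIO.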

This looks like an issue related to some card not implementing operations on half floats. Try running the script with those two arguments, see if this helps: ' --precision full --no-half'

I have modified in webui-user.bat file, but it still hasn't been solved. Is it because of my MAC?

If you are on a Mac, edit the webui-macos-env.sh file and change the relevant line to: export COMMANDLINE_ARGS="--precision full --no-half --skip-torch-cuda-test"

This worked for me, although when I cloned the repo the file already had several other arguments in it, including "--skip-torch-cuda-test"; it was only missing "--precision full --no-half".

To whom it may concern: I used the args "--skip-torch-cuda-test --precision full --no-half --no-progressbar-hiding --opt-channelslast", as indicated previously, on Windows 11 (a standard professional PC without a GPU), and it solved the issue.
