Converting a ControlNet model fails with a "Missing key(s) in state_dict" error

#4
by TimYao - opened

I ran the command from "3. Converting A ControlNet Model.txt", line 5:

python -m python_coreml_stable_diffusion.torch2coreml --convert-controlnet lllyasviel/control_v11p_sd15_softedge --model-version "runwayml/stable-diffusion-v1-5" --bundle-resources-for-swift-cli --attention-implementation SPLIT_EINSUM -o "./SoftEdge" 

But I get the error below:

INFO:__main__:Converting controlnet
INFO:__main__:Sample ControlNet inputs spec: {'sample': (torch.Size([2, 4, 64, 64]), torch.float32), 'timestep': (torch.Size([2]), torch.float32), 'encoder_hidden_states': (torch.Size([2, 768, 1, 77]), torch.float32), 'controlnet_cond': (torch.Size([2, 3, 512, 512]), torch.float32)}
Traceback (most recent call last):
  File "/Users/kyd6/miniconda3/envs/coreml_stable_diffusion/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/kyd6/miniconda3/envs/coreml_stable_diffusion/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/kyd6/Downloads/ml-stable-diffusion/python_coreml_stable_diffusion/torch2coreml.py", line 1427, in <module>
    main(args)
  File "/Users/kyd6/Downloads/ml-stable-diffusion/python_coreml_stable_diffusion/torch2coreml.py", line 1262, in main
    convert_controlnet(pipe, args)
  File "/Users/kyd6/Downloads/ml-stable-diffusion/python_coreml_stable_diffusion/torch2coreml.py", line 1144, in convert_controlnet
    load_state_dict_summary = reference_controlnet.load_state_dict(
  File "/Users/kyd6/miniconda3/envs/coreml_stable_diffusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ControlNetModel:
    Missing key(s) in state_dict: "down_blocks.3.downsamplers.0.conv.weight", "down_blocks.3.downsamplers.0.conv.bias". 

What should I do to make the conversion succeed?
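
For context on what the error means: PyTorch's `load_state_dict` performs a strict key check by default, and raises when the model defines parameters the loaded checkpoint lacks. Here is a minimal stdlib sketch of that mechanism (my own illustration with a hypothetical `check_state_dict` helper, not PyTorch's actual code), showing why a version mismatch between the reference model architecture and the checkpoint produces exactly this message:

```python
# Illustrative sketch (not PyTorch's implementation) of the strict key
# check behind "Missing key(s) in state_dict": if the model defines a
# parameter the checkpoint does not provide, a strict load raises.
def check_state_dict(model_keys, checkpoint_keys, strict=True):
    missing = sorted(set(model_keys) - set(checkpoint_keys))
    unexpected = sorted(set(checkpoint_keys) - set(model_keys))
    if strict and (missing or unexpected):
        raise RuntimeError(f"Missing key(s) in state_dict: {missing}")
    return missing, unexpected

# The reference ControlNetModel expects downsampler weights that the
# loaded state dict does not contain (names taken from the traceback):
model_keys = ["conv_in.weight", "down_blocks.3.downsamplers.0.conv.weight"]
checkpoint_keys = ["conv_in.weight"]

try:
    check_state_dict(model_keys, checkpoint_keys)
except RuntimeError as e:
    print(e)  # Missing key(s) in state_dict: ['down_blocks.3.downsamplers.0.conv.weight']
```

Passing `strict=False` in real PyTorch suppresses the exception and returns the mismatch summary instead, but that only hides the symptom; the missing keys indicate the model architecture and the checkpoint genuinely disagree.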

Core ML Models org

I can duplicate the error message that you are getting. Something has changed in one of the packages involved in these conversions (maybe torch) that breaks compatibility with one of the other packages (maybe python_coreml_stable_diffusion). This happens from time to time as packages get updated. I think I remember this particular break happening once before, and that I was able to fix it by commenting out a line in one of the files listed in the error message. I will look into this in the next day or two.

In the meantime, almost any converted CN model you might need is already available in this repo, including all 14 of the official CN models converted for Split-Einsum. You should only need to do a conversion yourself if you need a specific Original model size that has not already been converted, or you are trying to convert an "unofficial" CN model (which often does not work because they deviate from the specifications). You are probably experimenting just to learn, though, so I will try to find a fix for you.

Actually, what I'm trying to do is switch to the usable Split_Einsum_V2 version and incorporate the palettized 6-bit feature. Up to now, whenever I've attempted to test Stable Diffusion + ControlNet on the iPhone 14 Pro, it has failed due to insufficient memory. I'm wondering if using this format would allow it to run successfully on the iPhone.

Core ML Models org

Full use of the optimizations in s-e-v2 may require iOS 17. This is the case in macOS 13 vs 14. So unless you are running the iOS 17 beta, you may need to wait a bit.

Beyond that, there are a lot of reports of s-e-v2 not really working at all, particularly on iOS. I don't use any of this on iOS, so I can't say more than to suggest that you check through the open and closed issues at ml-stable-diffusion.

Back to trying to do the conversion yourself . . . If I go back to using coremltools 6.3.0 and ml-stable-diffusion 0.4.0, the ControlNet conversion works again. I will try newer versions of these packages, one at a time, to see if I can pin the issue down to a single package.

Palettizing and s-e-v2 were both added in ml-stable-diffusion 1.0.0, so I will try with that package upgraded first. It "requires" coremltools 7.0b1, but I'll first try without that, since I want to change just one thing at a time. But I have one other issue to look into first. I should have some additional info to report here in a few hours.

Core ML Models org

The issue is the upgrade from ml-stable-diffusion 0.4.0 to 1.0.0. The current coremltools package 7.0b1 and torch 2.0.0 (or 2.0.1) work fine with ml-stable-diffusion 0.4.0, but updating ml-stable-diffusion to 1.0.0 causes the ControlNet conversion to error out.

The torch2coreml.py file listed in the error message is part of the ml-stable-diffusion package. It lives in the python_coreml_stable_diffusion subfolder that gets linked into the miniconda environment's python site-packages, and it is the primary script that runs all conversions.

If I substitute just the older torch2coreml.py file from the 0.4.0 ml-stable-diffusion package into the newer ml-stable-diffusion 1.0.0 package (and thus into the python_coreml_stable_diffusion package), the conversion still errors out. And that older torch2coreml.py from 0.4.0 doesn't support split-einsum-v2 or palettizing anyway, which is our goal.

But if I substitute just the newer torch2coreml.py file from the 1.0.0 ml-stable-diffusion package into the older ml-stable-diffusion 0.4.0 package (and thus into the python_coreml_stable_diffusion package), the conversion completes.

This may be a quick fix for getting a split-einsum-v2 palettized conversion done. I will try it shortly. I need to try it with the --quantize-nbits argument and test the result, and then with the argument for split-einsum-v2 (I assume there is one) and test that result. Be back in a while.

Core ML Models org
β€’
edited Aug 15, 2023

The above fix works for palettizing, but not for split-einsum-v2. I did a regular split-einsum 6-bit conversion of SoftEdge and it works. Save yourself some time dealing with the conversion and download it from here:

https://huggingface.co/jrrjrr/Playground/blob/main/SoftEdge_S-E_6b.zip

If it solves the memory problem and you want to make other conversions and the post above is not detailed enough, leave a message here and I'll post better instructions.

It is possible that this fix for converting ControlNets breaks converting regular models, so one might need to keep two miniconda environments or switch out some files/folders from a single one depending on what is being converted. It is too late today for me to try a regular conversion with this altered pipeline, but I will tomorrow even if it is just to know for myself.

And I'll file a bug report with Apple in a day or two so they can fix this at some point so that all conversions work with the most current packages, as they should.

Thank you for your help.

After trying the SD1.5 split_einsum_v2 6-bit palettized model with the ControlNet SoftEdge_S-E_6b model, it works with 13 denoising steps.
The app crashed after completing the image-generation process the second time. It might have been because my phone got too hot.

I think the iPhone should be able to run ControlNet; it's a bit disappointing that it crashes.

Test device: iPhone 14 Pro
Test OS: iOS 17 beta 5

TimYao changed discussion status to closed
Core ML Models org

You should try it with an SD-1.5 split_einsum 6-bit (not a split_einsum_v2). As I mentioned way up top, there seem to be a lot of problems with v2 on iOS that Apple is ignoring so far. The v1 might work, though, to reduce memory pressure, but not to be faster. I think it is worth a try, and if you do, post any results back here, good or bad.

Core ML Models org

I happen to be converting an SD-1.5 split_einsum 6-bit right now. I will post a link to it in case you don't have access to one yourself. It may take a while to upload it successfully. Uploading to HF has been problematic for me the past week or two.

Core ML Models org

I tried, but it still crashed after generating an image with ControlNet. Maybe we need an iPhone 15 with more RAM to run ControlNet.

I'd like to ask if there are any limitations with Core ML's ControlNet.
For example, if the base model is quantized to 6 bits, does the ControlNet also need to be quantized to 6 bits, or can it be 4-bit or unquantized? And if the base model uses the Split_Einsum_V2 attention implementation, is it impossible for the ControlNet to use Original or Split_Einsum?

Core ML Models org

That's a professional-level question. Would you like to join the Mochi Diffusion Discord? There are more people available there to answer your questions.

Actually, I think this issue is more related to Apple's ml-stable-diffusion library. But I wanted to ask since some people are able to run it without any problems.

Core ML Models org

I've mixed quantized and non-quantized without problems so far, but I have only tried a few combinations. The quantized models get "converted" to 16-bit on the fly when they are used, so the pipeline is all 16-bit, which makes me think any combination would work.
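
A rough sketch of that "converted on the fly" idea (my own illustration with assumed layout, not Core ML's actual on-disk format): 6-bit palettization stores each weight as an index into a 64-entry lookup table, and the indices are expanded back to full-precision values before compute, so the executing pipeline sees ordinary float weights no matter how each model was stored:

```python
# Illustration only (assumed layout, not Core ML's actual file format):
# 6-bit palettization keeps a palette of at most 2**6 = 64 float values
# and stores each weight as a palette index; dequantizing restores floats.
def palettize(weights, n_bits=6):
    palette = sorted(set(weights))
    assert len(palette) <= 2 ** n_bits, "too many distinct values for palette"
    indices = [palette.index(w) for w in weights]
    return palette, indices

def depalettize(palette, indices):
    # The "conversion to 16-bit on the fly": every stored index becomes a
    # float again, so downstream compute never sees the compressed form.
    return [palette[i] for i in indices]

palette, indices = palettize([0.5, -1.0, 0.5, 0.25])
print(depalettize(palette, indices))  # [0.5, -1.0, 0.5, 0.25]
```

(Real palettization first clusters the weights so at most 64 distinct values remain, which is where the lossy compression happens; this sketch skips the clustering step.)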

It is probably trickier to mix split-einsum, split-einsum-v2, and original, because they all use somewhat different unet architectures (dimensions, layers, etc.) and the ControlNet unet needs to be able to "merge" with the base model unet. It is something that someone should try, though, in various combinations. I have not tried it at all. I think one of the Discord members said he was able to use a 512x512 original ControlNet model with a split-einsum base model at one point.
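
To illustrate why the architectures need to line up (a simplified sketch with a hypothetical `merge_residuals` helper, not the actual ml-stable-diffusion pipeline code): a ControlNet emits one residual per UNet block, and each residual is added element-wise to the corresponding block output, so both the block count and every per-block shape must agree between the two models:

```python
# Simplified sketch (not the real merge code) of how ControlNet residuals
# join the base UNet: one additive residual per down/mid block, so block
# counts and per-block shapes must match between the two architectures.
def merge_residuals(unet_outputs, controlnet_residuals):
    if len(unet_outputs) != len(controlnet_residuals):
        raise ValueError("UNet and ControlNet block counts differ")
    merged = []
    for h, r in zip(unet_outputs, controlnet_residuals):
        if len(h) != len(r):  # stand-in for a full tensor-shape check
            raise ValueError("block output shapes differ")
        merged.append([a + b for a, b in zip(h, r)])
    return merged

print(merge_residuals([[1.0, 2.0]], [[0.5, -0.5]]))  # [[1.5, 1.5]]
```

A mismatched attention implementation can change those block dimensions, which is exactly where a cross-variant ControlNet/base-model pairing would fail.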

Core ML Models org

The people at ml-stable-diffusion that could answer these kinds of things generally don't answer questions. As zhuguanyu suggested, the Mochi Discord will almost always get you answers or help faster.
