Model conversion differences vs. Mochi Diffusion and future improvements?

#15
by jettan - opened

First of all, I'd like to give huge compliments on the great job done!
The UI is very snappy, and feature-wise it is ahead of the mainstream thanks to its support for ControlNet and textual inversions (on macOS at least... any chance they will also arrive on iOS?)

I would like to ask whether there is any difference between models converted with the Guernika Model Converter/script provided in this repository and models converted with the procedure described in the Mochi Diffusion repository: https://github.com/godly-devotion/MochiDiffusion/wiki/How-to-convert-Stable-Diffusion-models-to-Core-ML

I tried using models converted by one tool in the other app and vice versa, but was not able to get them working.
Additionally, it seems that the Guernika Model Converter adds some metadata (guernika.json). I thought that creating this file and dropping in the Core ML converted models would be enough, but it crashes Guernika, so clearly something else is done as well. It would be nice if models were cross-compatible between the apps.

Also, I'd like to ask whether there are plans to support HD upscaling (e.g. with Real-ESRGAN) like Mochi Diffusion has (they even include it in their repository).

Lastly, do you have any data on how much memory an iOS device needs to run Stable Diffusion with the U-Net chunked vs. non-chunked?

Thanks in advance.

Let me answer!
Guernika models are a little different because the author implemented features before Apple's official ones (yeah, big brain), so the format diverged a bit.
Mochi follows Apple's official implementation (stuck at txt2img/img2img as of today), so it lacks the newer features like inpainting, textual inversion, ControlNet, prompt weighting, etc.
Transferring a model into Guernika may work for text2image, but not the other way around; Apple/Mochi has to catch up xD

You also have a graphical user interface for converting models simply, and you'll see amazing updates pop up often.

  • In my honest opinion, HD upscaling is possible (and wanted). I looked into it myself, but I found little sample code online and I wouldn't leech code from a competing app...
    If that's all you needed from cross-compatibility, I bet upscaling will come any time soon.

  • For iOS devices, you MUST have the U-Net chunked, no matter what. I chunk every time for my iPad/Mac usage; the chunking is supposed to swap the chunks in one after another so the whole model never fills the RAM.

Welcome to the gang.

Thanks for the prompt and clear response!

Interesting! Looking forward to updates here then.
I'll try a chunked model on the iPad then (on my MacBook Air M2 I've had no problems without chunks, but I had no luck on the iPad).

And thanks for the warm welcome. Glad to be here and experience the next wave of AI first hand.

Here's how it works when the model is chunked:

  • on iOS devices it does the swap thing
  • on Mac, both chunks are loaded into memory
    So you can use the chunked model on both iPad and Mac; every time I need a 512x512 model I always chunk it with All Compute Units and keep it synced to play with on either device.
    Edit: uncheck "Chunk U-Net" if you want to make it Mac-only; checked works for both.
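
For reference, here's a minimal sketch of that behaviour using Apple's ml-stable-diffusion Swift package (Guernika's own loading code isn't public, so the resource path and settings here are just assumptions); `reduceMemory: true` is what enables the load/unload swapping that low-RAM iOS devices rely on:

```swift
import CoreML
import Foundation
import StableDiffusion

do {
    // Folder with the converted resources, including the chunked
    // UnetChunk1.mlmodelc / UnetChunk2.mlmodelc files (path is hypothetical).
    let resourcesURL = URL(fileURLWithPath: "/path/to/ConvertedModel")

    let config = MLModelConfiguration()
    config.computeUnits = .all // "All Compute Units"

    // reduceMemory: true tells the pipeline to load and unload resources as it
    // goes (the "swap thing") instead of keeping everything resident in memory.
    let pipeline = try StableDiffusionPipeline(
        resourcesAt: resourcesURL,
        configuration: config,
        reduceMemory: true
    )
    try pipeline.loadResources()

    var generation = StableDiffusionPipeline.Configuration(prompt: "a 512x512 test render")
    generation.stepCount = 20
    generation.seed = 42

    // Returns one CGImage? per requested image.
    let images = try pipeline.generateImages(configuration: generation) { _ in true }
    print("generated \(images.count) image(s)")
} catch {
    print("Generation failed: \(error)")
}
```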

@GuiyeC must confirm everything tho, I'm just part of the fanbase :D

Have fun :)

I got it working as you described. Thanks!
Sadly I was a bit too ambitious about memory constraints, trying to get it working on an iPad mini 6 (4 GB RAM), and even chunking could not help in this case...
It's really just for testing/fun purposes though, as I do have other compute resources available. Appreciate the help.

It looks like this CoreML-Models repository already has all kinds of upscaler models converted: https://github.com/john-rocky/CoreML-Models
I might try something myself to see how it works. Coming from Python, Swift is very new to me.
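
If you do want to experiment, one of those converted upscalers can be driven with a few lines of Vision code. This is only a sketch under the assumption that the model is a compiled Core ML image-to-image model (the file name and whether the output comes back as a pixel buffer depend on the specific conversion):

```swift
import CoreML
import CoreImage
import Vision

// Runs a converted upscaler (e.g. a Real-ESRGAN .mlmodelc; name is hypothetical)
// on a CGImage and returns the upscaled result.
func upscale(_ image: CGImage, modelURL: URL) throws -> CGImage? {
    let mlModel = try MLModel(contentsOf: modelURL)
    let vnModel = try VNCoreMLModel(for: mlModel)

    let request = VNCoreMLRequest(model: vnModel)
    request.imageCropAndScaleOption = .scaleFill

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    // Image-output models surface their result as a pixel buffer observation.
    guard let observation = request.results?.first as? VNPixelBufferObservation else {
        return nil
    }
    let ciImage = CIImage(cvPixelBuffer: observation.pixelBuffer)
    return CIContext().createCGImage(ciImage, from: ciImage.extent)
}
```

Keep in mind these upscalers are usually converted with a fixed input size (often 512x512), so bigger images typically have to be split into tiles, upscaled, and stitched back together.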

Guernika org

@jettan hey! thank you for your comments and thank you too @ZProphete for helping here.

I have good news on model compatibility: I created a pull request in Apple's main repository, and this should bring compatibility with newer models converted using Apple's official repository, which is the one used in the instructions you shared. But yes, the reason was pretty much what ZProphete explained: the img2img implementation was a bit different, so the encoder module would be different.
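
A quick way to see that kind of mismatch for yourself is to load a converted encoder with Core ML and print the inputs and outputs it declares; if two apps expect different names or shapes there, the model won't load in one of them. A rough sketch (the file name depends on what your converted folder actually contains):

```swift
import CoreML
import Foundation

do {
    // Compiled encoder from a converted model folder (name/path are hypothetical).
    let encoderURL = URL(fileURLWithPath: "/path/to/ConvertedModel/VAEEncoder.mlmodelc")
    let encoder = try MLModel(contentsOf: encoderURL)

    // Print the declared interface; a mismatch here is usually why a model
    // converted for one app fails to load in another.
    for (name, description) in encoder.modelDescription.inputDescriptionsByName {
        print("input:", name, description)
    }
    for (name, description) in encoder.modelDescription.outputDescriptionsByName {
        print("output:", name, description)
    }
} catch {
    print("Could not load encoder: \(error)")
}
```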

I have upscaling on my TODO list. It should be easy to implement, but a few apps like Upscalo already do this, which is why I didn't hurry.

About the chunks: I understand that unchunked models should be faster on macOS, since each step is calculated in the whole U-Net at once instead of being calculated in two chunks and then joined; I'm not sure how noticeable the difference is. On iOS there is some loading/unloading of resources, as ZProphete explained, and I don't think an unchunked model would work. I haven't tested this thoroughly as I simply don't have one of the higher-end iPad Pros, so I would recommend testing what works best on your device.

Thanks for the response!

Wow, what a timely PR. Looking forward to testing it out across the different apps and to all the updates you may add.
For now I have given up on running models on my iOS devices, as they are a bit outdated. I'll revisit this someday when I decide to upgrade my iPhone, but for now I'll stick to the macOS variants. Thanks for the information.

If there is anything that needs testing after a release, please let me know and I'd be more than happy to test it out as a user.
