Guernika/CoreMLStableDiffusion · How to load models?

Jan 2, 2023

Do you always have to convert an existing model locally? Or are there files (like the .CKPT files normally) that you can download and load?

I've been trying to figure it out for the past few days but I guess I need some help haha.

GuiyeC

Guernika org Jan 2, 2023

@maavangent there are some converted models zipped in this repository. In the app, go to the "Models" tab, tap the download model button (a box with an arrow pointing down), and enter the zip URL there.
You can get the URLs from the models.json file; use the "model" URL of the model you want to download.
In any case, I'm working on a Model Converter app that will simplify the conversion process, and on an update that will allow downloading these preconverted models from within the app.

Thank you for the comment :)
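For anyone following along, picking the zip URL out of models.json can be sketched roughly as below. Only the "model" key is mentioned above; the list layout, the "name" key, and the example.com URLs are assumptions made for illustration, so check the real file before relying on this.

```python
import json

# Hypothetical excerpt; the real models.json may be laid out differently.
SAMPLE_MODELS_JSON = """
[
  {"name": "example-model-a", "model": "https://example.com/example-model-a.zip"},
  {"name": "example-model-b", "model": "https://example.com/example-model-b.zip"}
]
"""

def model_zip_url(models_json: str, name: str) -> str:
    """Return the "model" URL of the entry whose (assumed) "name" matches."""
    for entry in json.loads(models_json):
        if entry.get("name") == name:
            return entry["model"]
    raise KeyError(f"no model named {name!r}")

# This is the URL you would paste into the download dialog of the Models tab.
print(model_zip_url(SAMPLE_MODELS_JSON, "example-model-b"))
```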

Michaelangelo

Jan 3, 2023

•

edited Jan 3, 2023

I'm running a 2 TB 16" 2021 MacBook Pro with 64GB RAM and 32 GPU cores … but it's no 4090. 😊 Much of the time I'm simultaneously rendering with several implementations open at once.

I've installed Diffusion Bee, Auto1111, InvokeAI, as well as the CoreML variant Mochi Diffusion, and I'm running the beta version of CoreML PromptToImage through TestFlight. I'm happy to support the Mac community and any CoreML efforts coming about, so I'm about to purchase Guernika from the App Store.

Surprisingly, I hadn't heard of it before randomly coming across it here just now when searching for CoreML models. If you haven't already, you should post to Reddit r/stablediffusion to get some eyes on it.

What are your plans for the future? Do you have a Discord to discuss ideas for tentative features and implementations?

— Thanks!

GuiyeC

Guernika org Jan 3, 2023

That's a good idea, I will post something on that Reddit!

As for plans for the future, so far I'm using the app and when I think something would be useful I implement it. I have plans to improve collection viewing, prompt history...

You could share ideas here if you want or if you think a Discord would be useful I could try to set one up, or maybe a Reddit?

Thank you for the support!

Michaelangelo

Jan 4, 2023

Reddit would be great. You could also link to your GitHub or Hugging Face profile, or to this page, as an option in the Help tab of the app.

What rendering algorithm is being employed by Guernika? Is it PNDM, DPM-Solver, or (…)? Thanks again!

GuiyeC

Guernika org Jan 4, 2023

At the moment Guernika uses PNDM, which is the default scheduler, but it has support for DPMSolver too. I still have to add an option to change it; I was not able to find a lot of information on what the difference really is and didn't want to confuse people. If you are asking for it, it definitely seems people would find that useful.

Michaelangelo

Jan 4, 2023

Thanks, I know from reading the Mochi Diffusion GitHub page that DPM++ gives great results after only 10-25 steps. I'm not familiar with either but was wondering if PNDM does as well; the option would be good to have.
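As rough intuition for why a higher-order solver can get away with fewer steps, here is a toy sketch. It is not PNDM or DPM-Solver and says nothing about image quality; it only compares a first-order Euler step with a second-order Heun step on the ODE dy/dt = -y, which is an assumption picked purely for illustration.

```python
import math

def euler(f, y0, t_end, steps):
    """First-order explicit Euler integration from t=0 to t_end."""
    h, y, t = t_end / steps, y0, 0.0
    for _ in range(steps):
        y += h * f(t, y)
        t += h
    return y

def heun(f, y0, t_end, steps):
    """Second-order Heun (explicit trapezoidal) integration from t=0 to t_end."""
    h, y, t = t_end / steps, y0, 0.0
    for _ in range(steps):
        k1 = f(t, y)                 # slope at the start of the step
        k2 = f(t + h, y + h * k1)    # slope at the Euler-predicted end point
        y += 0.5 * h * (k1 + k2)     # average the two slopes
        t += h
    return y

def decay(t, y):
    """Toy ODE dy/dt = -y, whose exact solution is exp(-t)."""
    return -y

exact = math.exp(-1.0)
for steps in (10, 20, 40):
    e1 = abs(euler(decay, 1.0, 1.0, steps) - exact)
    e2 = abs(heun(decay, 1.0, 1.0, steps) - exact)
    print(f"{steps:>3} steps  euler error={e1:.2e}  heun error={e2:.2e}")
```

On this toy problem the second-order method at 10 steps is already closer to the exact answer than the first-order method at 40 steps, which is the general flavor of the "fewer steps" claim.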

GuiyeC

Guernika org Jan 4, 2023

@Michaelangelo I will add that option on the next update then 👌. Do you have any other requests?

Michaelangelo

Jan 4, 2023

•

edited Jan 4, 2023

Short-term — A lot depends on what we can do given Apple's implementation.

I noticed there's the option for single image or continuous inference. It would be nice to be able to set a fixed image count between 1 and 100, e.g. presets of 10, 20, 40, 50, and 100, if not any value in increments of 1.

Also, a way to bulk delete images, rather than having to select each image manually with a right click to delete them one-by-one.

How are tokens handled by the app, and what's the maximum? After crossing the threshold for max tokens, are any further tokens silently dropped, or are they merged together, as in the novel solution employed by the Automatic1111 repo, which consequently doesn't have a length limitation? If it can be worked in, the approach Auto1111 took would be the ideal solution.

What's the syntax for tokens, and how is prompt weighting handled per token? Are different weights allowed, as in Auto1111 with parentheses and brackets or explicit values such as 1.1, 1.2, etc.?

Long-term — Inpainting, outpainting, Dreambooth and LORA training, different output sizes.

GuiyeC

Guernika org Jan 5, 2023

@Michaelangelo I will think of a nice way of adding the image limit 👌

Yes, I also want a nicer way of dealing with lots of images, but it comes with a lot of things to take into account. Maybe I could add a "Show in Finder" option for now, which would allow selecting multiple images.
At the moment images are stored here: /Users/{YOUR_USER}/Library/Containers/com.guiyec.Guernika/Data/Documents/Images
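Until there is something in the app, the folder above can be inspected from a script. A minimal sketch, assuming the images are saved with common extensions (the suffix list is a guess, not something confirmed here):

```python
from pathlib import Path

def guernika_images_dir(home=None):
    """Build the image folder path quoted above for the given home directory."""
    home = Path(home) if home is not None else Path.home()
    return home / "Library/Containers/com.guiyec.Guernika/Data/Documents/Images"

def list_images(folder, suffixes=(".png", ".jpg", ".jpeg")):
    """Return image files in `folder`, sorted by name; empty if it does not exist."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.suffix.lower() in suffixes)

for path in list_images(guernika_images_dir()):
    print(path)
```

The script only lists files. Deleting them in bulk is left to Finder or to a deliberate extra step, so nothing is removed by accident.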

At the moment prompts are truncated at the TextEncoder's input length. I will take a look at merging, but that seems tricky to test, so I'm not promising anything here 😅
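To make the difference between truncating and the Auto1111-style approach concrete, here is a small sketch on plain token lists. The limit of 75 usable tokens assumes the usual 77-token CLIP context minus the start and end tokens; the actual tokenizer, encoder call, and the way Guernika handles this are not shown.

```python
MAX_PROMPT_TOKENS = 75  # assumed: 77-token context minus start/end tokens

def truncate(tokens, limit=MAX_PROMPT_TOKENS):
    """Keep the first `limit` tokens; anything after that is silently dropped."""
    return tokens[:limit]

def chunk(tokens, limit=MAX_PROMPT_TOKENS):
    """Split tokens into consecutive groups of at most `limit` tokens.

    In the chunked approach each group would be encoded separately and the
    resulting embeddings concatenated, so no tokens are lost.
    """
    return [tokens[i:i + limit] for i in range(0, len(tokens), limit)]

tokens = list(range(180))  # stand-in token ids for a long prompt
print(len(truncate(tokens)))            # 75
print([len(c) for c in chunk(tokens)])  # [75, 75, 30]
```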

Same for prompt weighting. I have to take a deeper look at how this is handled; at the moment the prompt is just being fed into the TextEncoder and I'm not sure if it's actually taking the weights into account.
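For reference, the Auto1111-style syntax asked about above can be parsed along these lines. This is a simplified sketch, not Auto1111's parser and not anything Guernika does: it assumes `(text)` means weight 1.1, `[text]` means weight 1/1.1, and `(text:1.3)` sets an explicit weight, and it ignores nesting and escapes.

```python
import re

# Matches "(text:1.3)", "(text)" or "[text]"; nested brackets are not handled.
_PATTERN = re.compile(
    r"\(([^()\[\]:]+):([0-9.]+)\)|\(([^()\[\]]+)\)|\[([^()\[\]]+)\]"
)

def parse_weights(prompt):
    """Split a prompt into (text, weight) pairs using the simplified syntax."""
    result, pos = [], 0
    for m in _PATTERN.finditer(prompt):
        if m.start() > pos:
            result.append((prompt[pos:m.start()], 1.0))  # unweighted text
        if m.group(1) is not None:
            result.append((m.group(1), float(m.group(2))))  # explicit weight
        elif m.group(3) is not None:
            result.append((m.group(3), 1.1))                # emphasized
        else:
            result.append((m.group(4), round(1 / 1.1, 3)))  # de-emphasized
        pos = m.end()
    if pos < len(prompt):
        result.append((prompt[pos:], 1.0))
    return result

print(parse_weights("a (red:1.3) car with [rust] and (chrome)"))
```

The weights would then scale the corresponding token embeddings after the text encoder runs; feeding the raw brackets into the encoder, as described above, does not apply them.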

Inpainting should already be working, not an ideal solution but you should be able to load an inpainting model and draw a mask to generate new images.

Outpainting will be cool. I have to improve the inpainting implementation first, and this will hopefully facilitate outpainting.

Any kind of training will probably be out of scope or very far into the future.

Finally, different output sizes. Apple mentions how this could work and recommends what to do when converting models, but I have tried a lot and have not been able to convert any models with variable output sizes, or even different fixed output sizes, that actually work. I really want this to work but we may have to wait for Apple to fix something in CoreML tools before we get it 😕

Michaelangelo

Jan 5, 2023

•

edited Jan 5, 2023

I noted that the solution employed by the developers of the CoreML SD GUI PromptToImage is to have a different model for each image output size. This is obviously not as convenient as a drop-down box, as with Python model implementations, but it's a temporary stopgap measure.

@GuiyeC Also, for selecting multiple images, see the solution employed by PromptToImage for navigating the gallery, both for viewing images using the arrow keys and for selecting multiple images for deletion. I believe their project is open-source and on GitHub, so you should be able to reuse the code that handles this, with proper attribution, of course.

GuiyeC

Guernika org Jan 6, 2023

•

edited Jan 6, 2023

Where did you see the different models for different output size?

~~Can you link to this PromptToImage project?~~ Found it

Michaelangelo

Jan 6, 2023

•

edited Jan 6, 2023

It would be nice to have a default model loaded at startup; I noticed every time I start I've got to select and load one …

I also noticed there's no upper limit to the numbers for steps or guidance, when most implementations cap steps at 75-100 and most guidance scales cap at 24. What, realistically, is the result of setting a guidance scale to some crazy high number like 523? Is it clipped behind the scenes to the SD limit of 24?

GuiyeC

Guernika org Jan 7, 2023

@Michaelangelo I did not see the limits you mention in the Python implementation, where did you see them? I have not tinkered a lot with guidance, but I have tried using more than 100 steps with no problems. I have not tested whether the gains are visible after a certain number of steps though.
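On the question of what a guidance scale like 523 does: in the standard classifier-free guidance formula the scale is just a multiplier, with no clipping built in. The sketch below applies that formula to plain lists standing in for the noise predictions; whether Guernika or any given UI clamps the value before this step is not stated in this thread.

```python
def guided_noise(uncond, cond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element-wise."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

# Stand-in values; real predictions are tensors produced by the UNet.
uncond = [0.10, -0.20]
cond = [0.15, -0.10]
for scale in (1.0, 7.5, 523.0):
    print(scale, guided_noise(uncond, cond, scale))
```

With a scale of 1 the result is simply the conditional prediction; very large scales push the result far outside the range the model was trained on, which matches the noisy, overbaked look people report.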

Maybe I can add an option to autoload the last model, or autoload it with an option to cancel loading? I agree that having it loaded on startup would be nicer. I didn't do it at first as it can take a while to actually load and people might want to switch models.

Thank you again for all of these comments, I really appreciate them 🙏

Also, I followed your advice and created a Reddit community and posted on r/StableDiffusion.

Michaelangelo

Jan 8, 2023

•

edited Jan 8, 2023

Excellent!

There are many different implementations, but by far the most popular is the Automatic1111 repo (Wiki, Features, Scripts, Extensions).

UI Values
Sampling steps: (max) 150 (default) 20
Size: (max) 2048×2048 (default) 512×512
Batch count: (max) 100 (default) 1
Batch size: (max) 8 (default) 1
Guidance Scale: (max) 30 (default) 6-8. Depending on the model, anything over 12-15 can result in noise from overbaked, overtrained results.
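If an app wanted to mirror these caps, the values could be captured in a small table and applied with a clamp. A sketch under assumptions: the maximums and defaults come from the list above, while the minimums of 1 and the single guidance default of 7 (the middle of the 6-8 range) are placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Limit:
    minimum: float
    maximum: float
    default: float

    def clamp(self, value):
        """Return `value` restricted to the [minimum, maximum] range."""
        return max(self.minimum, min(self.maximum, value))

# Maximums and defaults from the list above; minimums are assumptions.
LIMITS = {
    "steps": Limit(1, 150, 20),
    "guidance_scale": Limit(1, 30, 7),
    "batch_count": Limit(1, 100, 1),
    "batch_size": Limit(1, 8, 1),
}

print(LIMITS["guidance_scale"].clamp(523))  # 30
print(LIMITS["steps"].clamp(75))            # 75
```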

Whether or not there are gains to be had from higher step counts depends in part on the sampler selected. See the attached grid plot showing the effect of different CFG levels, and the link to Reddit with a run-down comparison of different samplers at different step counts.

Reddit: Sampler and Step Count Comparison

CFG Strength Comparison

kosmar

Nov 18, 2024

I cannot download any model on macOS 15.1 (M4); it always hits timeouts.