Better example code?

#13
by Softology - opened

Using the example code in the GitHub text_to_image.ipynb I was able to get Text-to-Image working, but there is no example code for "image to image" or "image variations".
I was also able to get basic Text-to-Image using the diffusers code shown on the model card, but that does not show how to set a seed image or create image variations.
Can you provide some more complete examples (using diffusers) for Text-to-Image, Image-to-Image and Image Variations? Or documentation for all the parameters of StableCascadeDecoderPipeline and StableCascadePriorPipeline?
Thanks.

Example Colab here: https://colab.research.google.com/drive/1qV14_OzZDNx6G-Lx2NE2Imk_7dfDbwkm?usp=sharing
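
For reference, a minimal two-stage Text-to-Image sketch along the lines of the model card (the model IDs follow the card; parameter values are illustrative, so check the Colab for the authoritative version):

import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

device = "cuda"
prompt = "prompt goes here"

# Stage C (prior) turns the prompt into image embeddings;
# Stage B (decoder) turns those embeddings into pixels.
prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
).to(device)
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", torch_dtype=torch.float16
).to(device)

prior_output = prior(
    prompt=prompt,
    negative_prompt="",
    width=1024,
    height=1024,
    guidance_scale=4.0,
    num_inference_steps=20,
)
image = decoder(
    image_embeddings=prior_output.image_embeddings.to(torch.float16),
    prompt=prompt,
    negative_prompt="",
    guidance_scale=0.0,
    num_inference_steps=10,
).images[0]
image.save("output.png")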

OK, thanks, I did manage to get the diffusers Text-to-Image working (I should have updated this post). However, I don't see any options/parameters for a seed/init image in your notebook.

What I am really after is the syntax for passing an init_image to the prior pipeline. This code (similar to your notebook) runs without errors for Text-to-Image, but the init image I specify does not affect the output.

import torch
from diffusers.utils import load_image

# "prior" is the StableCascadePriorPipeline created as in the notebook above
init_image = load_image("D:\\seedimage.jpg")
generator = torch.Generator(device="cuda").manual_seed(0)

prior_output = prior(
    prompt="prompt goes here",
    negative_prompt="negative prompt here",
    generator=generator,
    image=init_image,  # does nothing
    strength=0.5,      # also seems to do nothing
    width=1024,
    height=1024,
    guidance_scale=4.0,
    num_inference_steps=20,
    num_images_per_prompt=1,
)

Is there any documentation covering all the accepted parameters for StableCascadeDecoderPipeline and StableCascadePriorPipeline?

The source code for the prior pipeline
https://github.com/kashif/diffusers/blob/wuerstchen-v3/src/diffusers/pipelines/stable_cascade/pipeline_stable_cascade_prior.py
shows an image parameter that accepts a list of images, but if I create a list, add my seed image and pass it in, it still seems to be ignored.

The Stability blog post shows image-to-image results, so it must be possible?
https://stability.ai/news/introducing-stable-cascade

Thanks for any tips or snippets of code to get this working.

Currently only text-to-image is properly supported in diffusers. For everything else we recommend using the GitHub inference code; there are notebooks covering everything.

Can you give me a link to the notebook(s) that show how to use image-to-image?

Will image-to-image be implemented in diffusers? The diffusers code is so much faster (over 2x here compared to the non-diffusers code).
For now, though, a working but slow image-to-image is better than nothing.

Thanks.
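
For later readers, here is a hedged sketch of image variations through the prior pipeline. It assumes a diffusers build whose prior accepts a list of reference images (the linked fork's signature suggests this); the argument name (image vs images) and behavior may differ in your install.

import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline
from diffusers.utils import load_image

device = "cuda"
prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
).to(device)
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", torch_dtype=torch.float16
).to(device)

init_image = load_image("seedimage.jpg")

# Empty prompt plus a reference image: the prior embeds the image with its
# CLIP image encoder and generates variations of it.
# Assumption: the argument is named "images" here; older branches used "image".
prior_output = prior(
    prompt="",
    images=[init_image],
    width=1024,
    height=1024,
    guidance_scale=4.0,
    num_inference_steps=20,
)
variation = decoder(
    image_embeddings=prior_output.image_embeddings.to(torch.float16),
    prompt="",
    guidance_scale=0.0,
    num_inference_steps=10,
).images[0]
variation.save("variation.png")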

The example code is giving an error:

ValueError: Cannot load /home/fenrir/.cache/huggingface/hub/models--stabilityai--stable-cascade/snapshots/f2a84281d6f8db3c757195dd0c9a38dbdea90bb4/decoder because embedding.1.weight expected shape tensor(..., device='meta', size=(320, 64, 1, 1)), but got torch.Size([320, 16, 1, 1]). If you want to instead overwrite randomly initialized weights, please make sure to pass both low_cpu_mem_usage=False and ignore_mismatched_sizes=True. For more information, see also: https://github.com/huggingface/diffusers/issues/1619#issuecomment-1345604389 as an example.

If I pass low_cpu_mem_usage=False and ignore_mismatched_sizes=True, I get the following error:

RuntimeError: Error(s) in loading state_dict for StableCascadeUnet:
size mismatch for embedding.1.weight: copying a param with shape torch.Size([320, 16, 1, 1]) from checkpoint, the shape in current model is torch.Size([320, 64, 1, 1]).
You may consider adding ignore_mismatched_sizes=True in the model from_pretrained method.

The code giving this error is as follows:

decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
).to(device)

How can I solve this? Thanks in advance.

To fix the size mismatch error, replace
"c_in": 4
with
"in_channels": 4
in the decoder's config.json. (The reason is this commit: https://github.com/kashif/diffusers/commit/cbd07758adc1848f3a9eb115f9467e43f8560726)
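
If you would rather not hunt for the file by hand, here is a small sketch that resolves the cached decoder config and renames the key. It assumes the model is already in your local Hugging Face cache; note that cached files can be read-only symlinks, in which case editing a local clone of the repo is the safer route.

import json
from huggingface_hub import hf_hub_download

# Resolves (or downloads) the decoder config inside the HF cache, e.g.
# ~/.cache/huggingface/hub/models--stabilityai--stable-cascade/snapshots/<hash>/decoder/config.json
cfg_path = hf_hub_download("stabilityai/stable-cascade", "decoder/config.json")

with open(cfg_path) as f:
    cfg = json.load(f)

if "c_in" in cfg:
    cfg["in_channels"] = cfg.pop("c_in")  # the rename suggested above
    with open(cfg_path, "w") as f:
        json.dump(cfg, f, indent=2)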

Currently only text-to-image is properly supported in diffusers. For everything else we recommend using the GitHub inference code; there are notebooks covering everything.

OK, thanks, I can see the (non-diffusers) Image-to-Image in the text_to_image.ipynb.

Any idea when or if this functionality will be added to the diffusers pipelines? They run over twice as fast and seem to use less VRAM.

The functionality Softology mentions above would be very useful for faster rendering; I am currently waiting for this as well.

Where is the decoder config.json file, please?
