Usage with diffusers

#2
by ayan4m1 - opened

Hello, I am trying to get this model to run using the diffusers StableDiffusionPipeline/StableDiffusionImg2ImgPipeline. As best I can tell, there are two model formats: a single CKPT file and a directory containing a model_index.json. diffusers only supports loading the latter via from_pretrained. Is there anything I'm missing, or a straightforward script that could be added to the documentation showing how to use this model with the diffusers package?

EDIT: Thank you so much for this work!

I haven't tried it myself, but it looks like there is a PR for checkpoint conversion:
https://github.com/huggingface/diffusers/pull/154

That's great! Is there any chance you could provide the "YAML config file corresponding to the original architecture"? It's a required argument for that script.

Thanks! Unfortunately, it seems like that YAML is missing configs for {'safety_checker', 'text_encoder', 'vae', 'feature_extractor'} - I get the following when trying to initialize a pipeline with the model that the script generates:

TypeError: __init__() missing 4 required positional arguments: 'vae', 'text_encoder', 'safety_checker', and 'feature_extractor'

Any thoughts? Thanks so much for your help!

Can you try downloading the original (diffusers) SD1.4 repo and merging the missing/required files?

This repo is intended to be used with the latent-diffusion/stable-diffusion repos rather than diffusers, but if anyone is willing to upload a converted checkpoint, I'd welcome it.

Hi, @ayan4m1 @naclbit
I converted the model a while ago, and if I remember correctly, I think the following is what I did after running the script mentioned above:

  • Rename vqvae to vae and bert to text_encoder in the config file.
  • Rename the directories vqvae to vae and bert to text_encoder.
  • Copy the directories feature_extractor and safety_checker from the directory of the diffusers version of the Stable Diffusion model.
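
Roughly, those three steps look like the following in Python. This is only a sketch: the paths are placeholders, and the assumption that the config in question is the top-level model_index.json is mine, so adjust as needed.

import json
import shutil
from pathlib import Path

# Placeholder paths: the output of the conversion script and a local copy of
# the diffusers-format Stable Diffusion v1.4 repo.
converted = Path("./trinart-converted")
sd14 = Path("./stable-diffusion-v1-4")

# 1. Rename vqvae -> vae and bert -> text_encoder in the pipeline config
#    (assumed here to be the top-level model_index.json).
config_path = converted / "model_index.json"
config = json.loads(config_path.read_text())
config["vae"] = config.pop("vqvae")
config["text_encoder"] = config.pop("bert")
config_path.write_text(json.dumps(config, indent=2))

# 2. Rename the corresponding directories.
(converted / "vqvae").rename(converted / "vae")
(converted / "bert").rename(converted / "text_encoder")

# 3. Copy feature_extractor and safety_checker from the diffusers SD1.4 repo.
for name in ("feature_extractor", "safety_checker"):
    shutil.copytree(sd14 / name, converted / name)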

The converted model seems to be working fine, but I haven't checked whether the original and converted models generate the same results.

Hope this helps.

@naclbit Awesome work, btw.

@hysts Thank you so much, that worked perfectly!

@naclbit I have the converted checkpoint ready, would you prefer me to make a pull request against this repository or make a new one?

@ayan4m1
FYI, I realized that you don't have to manually rename or copy files if you apply the following patch to the ldm-txt2im-conv-script branch of diffusers.

--- a/scripts/convert_ldm_txt2img_original_checkpoint_to_diffusers.py
+++ b/scripts/convert_ldm_txt2img_original_checkpoint_to_diffusers.py
@@ -22,9 +22,10 @@ try:
 except ImportError:
     raise ImportError("OmegaConf is required to convert the LDM checkpoints. Please install it with `pip install OmegaConf`.")

-from transformers import  BertTokenizerFast, CLIPTokenizer, CLIPTextModel
-from diffusers import LDMTextToImagePipeline, AutoencoderKL, UNet2DConditionModel, DDIMScheduler
+from transformers import  BertTokenizerFast, CLIPFeatureExtractor, CLIPTokenizer, CLIPTextModel
+from diffusers import StableDiffusionPipeline, AutoencoderKL, UNet2DConditionModel, DDIMScheduler
 from diffusers.pipelines.latent_diffusion.pipeline_latent_diffusion import LDMBertModel, LDMBertConfig
+from diffusers.pipelines.stable_diffusion import StableDiffusionSafetyChecker


 def shave_segments(path, n_shave_prefix_segments=1):
@@ -595,6 +596,8 @@ if __name__ == "__main__":
         tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

     scheduler = create_diffusers_schedular(original_config)
-    pipe = LDMTextToImagePipeline(vqvae=vae, bert=text_model, tokenizer=tokenizer, unet=unet, scheduler=scheduler)
+    safety_checker = StableDiffusionSafetyChecker.from_pretrained('CompVis/stable-diffusion-safety-checker')
+    feature_extractor = CLIPFeatureExtractor()
+    pipe = StableDiffusionPipeline(vae=vae, text_encoder=text_model, tokenizer=tokenizer, unet=unet, scheduler=scheduler, safety_checker=safety_checker, feature_extractor=feature_extractor)
     pipe.save_pretrained(args.dump_path)

Related issue: https://github.com/huggingface/diffusers/issues/491
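
With that patch applied, the script saves a Stable Diffusion pipeline to --dump_path, and the result can be loaded directly. A rough sketch, where the output directory name and prompt are just placeholders:

from diffusers import StableDiffusionPipeline

# "./trinart-diffusers" stands in for whatever --dump_path was passed to the
# patched conversion script.
pipe = StableDiffusionPipeline.from_pretrained("./trinart-diffusers")
pipe = pipe.to("cuda")
image = pipe("a magical dragon in manga style").images[0]
image.save("out.png")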

Hey,

Happy to help with the conversion to diffusers if you want!

@hysts - should we maybe add a clean Stable Diffusion conversion script that makes conversion a bit easier? Happy to help with a PR that adds it.

I think it would make sense to put everything in the same repo or else we could also have two branches "compvis" and "diffusers" - up to you!

@ayan4m1
Please do upload it as a different repo! I think it's handy to have two versions.

@patrickvonplaten
Yes, I agree that two branches would be more convenient, given that the diffusers port is technically a conversion and some people may want to continue training from the checkpoints.

Cool - should we go for two branches then?
@naclbit you have two options I guess :sweat:

  1. The "main" branch stays this model and we add a "diffusers" branch (and potentially add a copy of "main" to a "diffusers" branch)
  2. The "main" branch becomes "diffusers" and we add a "compvis" branch (and potentially add a copy of "main" to a "compvis" branch)

Maybe it's easiest to go for 1), which would essentially mean just adding a diffusers branch.

@patrickvonplaten
Option 1 sounds like a stress-free way to handle this.

Super - @ayan4m1 would you like to open a PR with the converted weights? Then I could somewhat manually move the weights into a branch (sadly we cannot do this in an easy way yet).

Pushed the diffusers model to https://huggingface.co/ayan4m1/trinart_diffusers_v2 - feel free to pull it into your diffusers branch; I couldn't figure out how to open a PR against any branch other than main.

Working on trimming it down to be a bit smaller, but at least it works for now.

Thanks a lot @ayan4m1 - I've used your checkpoints and added them, along with two newly converted checkpoints, to this repo as discussed. It should now be trivially easy to use them with diffusers:

from torch import autocast
from diffusers import StableDiffusionPipeline

model_id = "naclbit/trinart_stable_diffusion_v2"
device = "cuda"

pipe = StableDiffusionPipeline.from_pretrained(model_id)
pipe = pipe.to(device)

prompt = "A magical dragon flying in front of the Himalaya in manga style"
with autocast("cuda"):
    image = pipe(prompt).images[0]

By default, I've added the K-LMS sampler as proposed in the README.
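
For anyone who wants to set the scheduler explicitly, something along these lines should work. Note that the beta values below are the commonly used Stable Diffusion defaults and are an assumption here, not read from this repo's config:

from diffusers import LMSDiscreteScheduler, StableDiffusionPipeline

# K-LMS scheduler; beta values are the usual Stable Diffusion settings
# (assumed, not taken from this repository's scheduler config).
lms = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
)

pipe = StableDiffusionPipeline.from_pretrained(
    "naclbit/trinart_stable_diffusion_v2",
    scheduler=lms,
)
pipe = pipe.to("cuda")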

Also opened a PR to add some examples to the README: https://huggingface.co/naclbit/trinart_stable_diffusion_v2/discussions/4

Very cool work @naclbit and @ayan4m1

This is a minor thing, but I think the branch needs to be specified as follows in the example code above:

pipe = StableDiffusionPipeline.from_pretrained(model_id, revision='diffusers-115k')

Ah, but the example code in https://huggingface.co/naclbit/trinart_stable_diffusion_v2/discussions/4 is correct, so it doesn't matter.

Thanks all for the help here, closing this as we've solved the problem I came in with.

ayan4m1 changed discussion status to closed
