Birchlabs committed
Commit 95328ee · 1 parent: 33eca3b

Update README.md

Files changed (1): README.md (+30 −5)
README.md CHANGED
@@ -27,8 +27,11 @@ Diffusers' StableDiffusionXLPipeline convention handles text encoders + UNet + V
 
 ```python
 from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
+from diffusers.pipelines.stable_diffusion_xl import StableDiffusionXLPipelineOutput
 import torch
 from torch import Generator
+from PIL import Image
+from typing import List
 
 # scheduler args documented here:
 # https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_dpmsolver_multistep.py#L98
@@ -43,6 +46,8 @@ scheduler: DPMSolverMultistepScheduler = DPMSolverMultistepScheduler.from_pretra
   use_karras_sigmas=True,
 )
 
+# pipeline args documented here:
+# https://github.com/huggingface/diffusers/blob/95b7de88fd0dffef2533f1cbaf9ffd9d3c6d04c8/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py#L548
 pipe: StableDiffusionXLPipeline = StableDiffusionXLPipeline.from_pretrained(
   'Birchlabs/waifu-diffusion-xl-unofficial',
   scheduler=scheduler,
@@ -50,12 +55,15 @@ pipe: StableDiffusionXLPipeline = StableDiffusionXLPipeline.from_pretrained(
   use_safetensors=True,
   variant='fp16'
 )
-pipe.to("cuda")
+pipe.to('cuda')
+
+# StableDiffusionXLPipeline is hardcoded to cast the VAE to float32, but Ollin's VAE works fine in float16
+pipe.vae.to(torch.float16)
 
 prompt = 'masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck'
 negative_prompt = 'lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name'
 
-images = pipe(
+out: StableDiffusionXLPipelineOutput = pipe(
   prompt=prompt,
   negative_prompt=negative_prompt,
   num_inference_steps=25,
@@ -64,9 +72,13 @@ images = pipe(
   target_size=(1024, 1024),
   height=1024,
   width=1024,
-  generator=Generator().manual_seed(45),
-).images[0]
+  generator=Generator().manual_seed(48),
+)
 
+images: List[Image.Image] = out.images
+img, *_ = images
+
+img.save('waifu.png')
 ```
 
 ### UNet2DConditionModel
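
The hunk above adds `pipe.vae.to(torch.float16)`, relying on an fp16-safe VAE. Assuming "Ollin's VAE" refers to madebyollin's [`sdxl-vae-fp16-fix`](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) (an assumption on my part; the diff doesn't name the repository), a sketch of the equivalent explicit approach is to load that VAE and pass it to the pipeline:

```python
# Sketch: load an fp16-safe SDXL VAE explicitly instead of casting in-place.
# Assumes "Ollin's VAE" = madebyollin/sdxl-vae-fp16-fix (not confirmed by the diff).
import torch
from diffusers import AutoencoderKL, StableDiffusionXLPipeline

vae = AutoencoderKL.from_pretrained(
  'madebyollin/sdxl-vae-fp16-fix',
  torch_dtype=torch.float16,
)
pipe = StableDiffusionXLPipeline.from_pretrained(
  'Birchlabs/waifu-diffusion-xl-unofficial',
  vae=vae,  # overrides the repository's bundled VAE
  torch_dtype=torch.float16,
  use_safetensors=True,
  variant='fp16',
)
pipe.to('cuda')
```

Either way the VAE stays in float16; the fp16-fix VAE exists precisely because the stock SDXL VAE tends to overflow in half precision.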
@@ -88,7 +100,20 @@ base_unet: UNet2DConditionModel = UNet2DConditionModel.from_pretrained(
 
 ## How it was converted
 
-I used Kohya's converter script. See [this commit](https://github.com/Birch-san/diffusers-play/commit/3f16355dd0064932d0bf356ed78676089b9e46ca), and my [previous explanation](https://huggingface.co/Birchlabs/wd-1-5-beta3-unofficial#how-wd15b3-compvis-checkpoint-was-converted) for a bit more detail on how I invoke such scripts.
+I used Kohya's converter script to convert the official (`hakurei/waifu-diffusion-xl`) [`wdxl-aesthetic-0.9.safetensors`](https://huggingface.co/hakurei/waifu-diffusion-xl/blob/main/wdxl-aesthetic-0.9.safetensors). See [this commit](https://github.com/Birch-san/diffusers-play/commit/3f16355dd0064932d0bf356ed78676089b9e46ca).
+
+I forked [kohya's converter script](https://github.com/bmaltais/kohya_ss/blob/master/tools/convert_diffusers20_original_sd.py), making one [for SDXL](https://github.com/Birch-san/diffusers-play/blob/3f16355dd0064932d0bf356ed78676089b9e46ca/scripts/convert_diffusers20_original_sdxl.py).
+
+I invoked it like so:
+
+```bash
+python scripts/convert_diffusers20_original_sdxl.py \
+  --fp16 \
+  --use_safetensors \
+  --reference_model stabilityai/stable-diffusion-xl-base-0.9 \
+  in/wdxl-aesthetic-0.9.safetensors \
+  out/wdxl-diffusers
+```
 
 ### NOTE: The work here is a Work in Progress! Nothing in this repository is final.
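
A quick way to check a conversion like the one in the invocation above is to load the resulting folder straight from disk. A minimal sketch, assuming the converter wrote a standard diffusers directory to `out/wdxl-diffusers` (the output path from the command above; not verified here):

```python
# Sketch: smoke-test the converter's output by loading it locally.
# Assumes out/wdxl-diffusers is a standard diffusers folder layout.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
  'out/wdxl-diffusers',  # local folder produced by convert_diffusers20_original_sdxl.py
  torch_dtype=torch.float16,
  use_safetensors=True,
)
pipe.to('cuda')

# a single short generation is enough to confirm the weights load and run
image = pipe('1girl, looking at viewer', num_inference_steps=25).images[0]
image.save('smoke-test.png')
```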