---
license: other
---

# WD 1.5 Beta 3 (Diffusers-compatible)

<img width="582px" height="256px" src="https://birchlabs.co.uk/share/radiance0triptych.jpg" title="Triptych of Reimu, Sanae and Flandre in 'radiance' aesthetic">

This unofficial repository hosts diffusers-compatible float16 checkpoints of WD 1.5 beta 3.  
Float16 is [all you need](https://twitter.com/Birchlabs/status/1599903883278663681) for inference.

## Usage (via diffusers)

```python
# make sure you're logged in with `huggingface-cli login`
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler, AutoencoderKL
from diffusers.pipelines.stable_diffusion import StableDiffusionPipelineOutput
import torch
from torch import Generator, compile
from PIL import Image
from typing import List
import os

vae: AutoencoderKL = AutoencoderKL.from_pretrained('hakurei/waifu-diffusion', subfolder='vae', torch_dtype=torch.float16)

# scheduler args documented here:
# https://github.com/huggingface/diffusers/blob/0392eceba8d42b24fcecc56b2cc1f4582dbefcc4/src/diffusers/schedulers/scheduling_dpmsolver_multistep.py#L83
scheduler: DPMSolverMultistepScheduler = DPMSolverMultistepScheduler.from_pretrained(
  'Birchlabs/wd-1-5-beta3-unofficial',
  subfolder='scheduler',
  # sde-dpmsolver++ is very new. if your diffusers version doesn't have it: use 'dpmsolver++' instead.
  algorithm_type='sde-dpmsolver++',
  solver_order=2,
  # solver_type='heun' may give a sharper image. Cheng Lu reckons midpoint is better.
  solver_type='midpoint',
  use_karras_sigmas=True,
)

# variant=None
# variant='ink'
# variant='mofu'
variant='radiance'
# variant='illusion'
pipe: StableDiffusionPipeline = StableDiffusionPipeline.from_pretrained(
  'Birchlabs/wd-1-5-beta3-unofficial',
  torch_dtype=torch.float16,
  vae=vae,
  scheduler=scheduler,
  variant=variant,
)
pipe.to('cuda')
# torch.compile returns the optimized module; assign it back so the pipeline actually uses it
pipe.unet = compile(pipe.unet, mode='reduce-overhead')

# WD1.5 was trained on area=896**2 and no side longer than 1152
sqrt_area=896
# note: pipeline requires width and height to be multiples of 8
height = 1024
width = sqrt_area**2//height

prompt = 'artoria pendragon (fate), reddizen, 1girl, best aesthetic, best quality, blue dress, full body, white shirt, blonde hair, looking at viewer, hair between eyes, floating hair, green eyes, blue ribbon, long sleeves, juliet sleeves, light smile, hair ribbon, outdoors, painting (medium), traditional media'
negative_prompt = 'lowres, bad anatomy, bad hands, missing fingers, extra fingers, blurry, mutation, deformed face, ugly, bad proportions, monster, cropped, worst quality, jpeg, bad posture, long body, long neck, jpeg artifacts, deleted, bad aesthetic, realistic, real life, instagram'

# pipeline invocation args documented here:
# https://github.com/huggingface/diffusers/blob/0392eceba8d42b24fcecc56b2cc1f4582dbefcc4/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#LL544C18-L544C18
out: StableDiffusionPipelineOutput = pipe.__call__(
  prompt,
  negative_prompt=negative_prompt,
  height=height,
  width=width,
  num_inference_steps=22,
  generator=Generator().manual_seed(1234)
)
images: List[Image.Image] = out.images
img, *_ = images

os.makedirs('out_pipe', exist_ok=True)
img.save('out_pipe/saber.png')
```

Should output the following image:

<img height="256px" src="https://birchlabs.co.uk/share/saber-radiance.smol.jpg" title="Saber in 'radiance' aesthetic">

## How the WD1.5b3 CompVis checkpoint was converted

I converted the official [CompVis-style checkpoints](https://huggingface.co/waifu-diffusion/wd-1-5-beta3) using [kohya's converter script](https://github.com/bmaltais/kohya_ss/blob/master/tools/convert_diffusers20_original_sd.py).

To convert the five aesthetics, I added [converter support](https://github.com/Birch-san/diffusers-play/commit/b8b3cd31081e18a898d888efa7e13dc2a08908be) for [checkpoint variants](https://huggingface.co/docs/diffusers/using-diffusers/loading#checkpoint-variants).
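
For context, this is roughly how diffusers' variant mechanism behaves once that support is in place (a minimal sketch assuming a recent diffusers release; the `out/wd1-5-b3` path and the `ink` name are placeholders, not the converter's actual code):

```python
# minimal sketch of diffusers' checkpoint-variant mechanism (not the converter itself);
# 'out/wd1-5-b3' and the 'ink' variant name are placeholders
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained('out/wd1-5-b3', torch_dtype=torch.float16)
# saving with variant='ink' suffixes the weight filenames (e.g. unet/diffusion_pytorch_model.ink.safetensors),
# so several aesthetics can live side-by-side in one repository
pipe.save_pretrained('out/wd1-5-b3', variant='ink', safe_serialization=True)

# consumers then select an aesthetic at load time:
pipe = StableDiffusionPipeline.from_pretrained('out/wd1-5-b3', variant='ink', torch_dtype=torch.float16)
```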

I [commented-out](https://github.com/Birch-san/diffusers-play/blob/b8b3cd31081e18a898d888efa7e13dc2a08908be/src/kohya/library/model_util.py#L869-L874) VAE conversion, because WD 1.5 b3 does not distribute a VAE. Instead, it re-uses WD1.4's VAE (checkpoints: [CompVis](https://huggingface.co/hakurei/waifu-diffusion-v1-4), [diffusers](https://huggingface.co/hakurei/waifu-diffusion/tree/main/vae)).

I told the converter to [load WD 1.4's VAE](https://github.com/Birch-san/diffusers-play/blob/b8b3cd31081e18a898d888efa7e13dc2a08908be/src/kohya/library/model_util.py#L1065-L1066).

I invoked my modified [`scripts/convert_diffusers20_original_sd.py`](https://github.com/Birch-san/diffusers-play/blob/b8b3cd31081e18a898d888efa7e13dc2a08908be/scripts/convert_diffusers20_original_sd.py) like so:

```bash
python scripts/convert_diffusers20_original_sd.py \
--fp16 \
--v2 \
--unet_use_linear_projection \
--use_safetensors \
--reference_model stabilityai/stable-diffusion-2-1 \
--variant illusion \
in/wd-1-5-beta3/wd-beta3-base-fp16.safetensors \
out/wd1-5-b3
```

Except the "base" aesthetic was a special case, where I didn't pass any `--variant <whatever>` option.

### Why is there a `vae` folder?

The `vae` folder contains copies of WD 1.4's VAE, to make it easier to load stable-diffusion via diffusers [pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines#readme).  
I saved a duplicate of the VAE for each variant.

So you _can_ skip the `vae` arg, and load the pipeline like this:

```python
pipe: StableDiffusionPipeline = StableDiffusionPipeline.from_pretrained(
  'Birchlabs/wd-1-5-beta3-unofficial',
  torch_dtype=torch.float16,
  variant='radiance',
)
```

But I recommend supplying the WD1.4 `vae` explicitly, to save disk space (e.g. because you already have WD1.4, or because you intend to try multiple variants of WD1.5 and don't want to download a duplicate VAE for each variant):

```python
vae: AutoencoderKL = AutoencoderKL.from_pretrained('hakurei/waifu-diffusion', subfolder='vae', torch_dtype=torch.float16)

pipe: StableDiffusionPipeline = StableDiffusionPipeline.from_pretrained(
  'Birchlabs/wd-1-5-beta3-unofficial',
  torch_dtype=torch.float16,
  variant='radiance',
  vae=vae,
)
```

## Original model card

![WD 1.5 Radiance](https://i.ibb.co/hYjgvGZ/00160-2195473148.png)

For this release, we release five versions of the model:

  - WD 1.5 Beta3 Base
  - WD 1.5 Radiance
  - WD 1.5 Ink
  - WD 1.5 Mofu
  - WD 1.5 Illusion

The WD 1.5 Base model is only intended for training use. For generation, it is recommended to create your own finetunes and LoRAs on top of WD 1.5 Base, or to use one of the aesthetic models. More information and sample generations for the aesthetic models are in the release notes.

### Release Notes

https://saltacc.notion.site/WD-1-5-Beta-3-Release-Notes-1e35a0ed1bb24c5b93ec79c45c217f63

### VAE

WD 1.5 uses the same VAE as WD 1.4, which can be found here: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/vae/kl-f8-anime2.ckpt


### License
WD 1.5 is released under the Fair AI Public License 1.0-SD (https://freedevproject.org/faipl-1.0-sd/). If any derivative of this model is made, please share your changes accordingly. Special thanks to ronsor/undeleted (https://undeleted.ronsor.com/) for help with the license.