---
language:
- en
tags:
- stable-diffusion-xl
- text-to-image
license: unknown
inference: true

---



This unofficial repository hosts a diffusers-compatible float16 checkpoint of the [WDXL](https://huggingface.co/hakurei/waifu-diffusion-xl) base UNet.  

For convenience (i.e. for use in a StableDiffusionXLPipeline) we include mirrors of other models (please adhere to their terms of usage):

- [SDXL 0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9)
  - tokenizers
  - text encoders
  - scheduler config
- [madebyollin's fp16 VAE](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix)
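
These mirrored components sit next to the UNet as subfolders of the repository. The sketch below checks a local copy against the usual diffusers SDXL layout. Only `scheduler` and `unet` are confirmed by the usage code in this README; the other subfolder names are assumed from the diffusers convention, and `missing_components` is a hypothetical helper, not part of diffusers.

```python
from pathlib import Path
from typing import List

# Assumed layout: the conventional diffusers SDXL pipeline folders.
# Only 'scheduler' and 'unet' are confirmed by the usage examples in this README.
EXPECTED_SUBFOLDERS = (
  'scheduler',
  'text_encoder',
  'text_encoder_2',
  'tokenizer',
  'tokenizer_2',
  'unet',
  'vae',
)

def missing_components(repo_dir: str) -> List[str]:
  """Return the expected subfolders that are absent from a local copy."""
  root = Path(repo_dir)
  return [name for name in EXPECTED_SUBFOLDERS if not (root / name).is_dir()]
```

An empty list from `missing_components('path/to/local/copy')` suggests every component the pipeline needs is present.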

## Usage (diffusers)

### StableDiffusionXLPipeline

Diffusers' `StableDiffusionXLPipeline` wires up the text encoders, UNet and VAE for you:

```python
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
from diffusers.pipelines.stable_diffusion_xl import StableDiffusionXLPipelineOutput
import torch
from torch import Generator
from PIL import Image
from typing import List

# scheduler args documented here:
# https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_dpmsolver_multistep.py#L98
scheduler: DPMSolverMultistepScheduler = DPMSolverMultistepScheduler.from_pretrained(
  'Birchlabs/waifu-diffusion-xl-unofficial',
  subfolder='scheduler',
  algorithm_type='sde-dpmsolver++',
  solver_order=2,
  # solver_type='heun' may give a sharper image. Cheng Lu reckons midpoint is better.
  solver_type='midpoint',
  use_karras_sigmas=True,
)

# pipeline args documented here:
# https://github.com/huggingface/diffusers/blob/95b7de88fd0dffef2533f1cbaf9ffd9d3c6d04c8/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py#L548
pipe: StableDiffusionXLPipeline = StableDiffusionXLPipeline.from_pretrained(
  'Birchlabs/waifu-diffusion-xl-unofficial',
  scheduler=scheduler,
  torch_dtype=torch.float16,
  use_safetensors=True,
  variant='fp16'
)
pipe.to('cuda')

# StableDiffusionXLPipeline is hardcoded to cast the VAE to float32, but Ollin's VAE works fine in float16
pipe.vae.to(torch.float16)

prompt = 'masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck'
negative_prompt = 'lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name'

out: StableDiffusionXLPipelineOutput = pipe(
  prompt=prompt,
  negative_prompt=negative_prompt,
  num_inference_steps=25,
  guidance_scale=12.,
  original_size=(4096, 4096),
  target_size=(1024, 1024),
  height=1024,
  width=1024,
  generator=Generator().manual_seed(48),
)

images: List[Image.Image] = out.images
img, *_ = images

img.save('waifu.png')
```

You should get a picture like this:

<img width="384px" height="384px" src="https://birchlabs.co.uk/share/wdxl-unofficial/0_48_waifu.png" title="seed 48: girl with green hair and sweater at night">
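
The `original_size` and `target_size` arguments in the call above are SDXL's size conditioning: the pipeline packs them, together with a crop offset, into six numbers that are passed to the UNet as extra conditioning. The following is a minimal sketch of that packing, assuming the default crop offset of `(0, 0)`. `pack_size_conditioning` is a hypothetical name, and this is an illustration rather than the pipeline's actual code.

```python
from typing import List, Tuple

Size = Tuple[int, int]

def pack_size_conditioning(
  original_size: Size,
  target_size: Size,
  crops_coords_top_left: Size = (0, 0),  # assumption: no crop offset
) -> List[int]:
  """Concatenate the three pairs into one flat list of six integers."""
  return list(original_size + crops_coords_top_left + target_size)

# values from the example above: claim a 4096x4096 source, render at 1024x1024
time_ids = pack_size_conditioning((4096, 4096), (1024, 1024))
print(time_ids)  # [4096, 4096, 0, 0, 1024, 1024]
```

Passing an `original_size` larger than the render size signals a high-resolution source image to the model, which is presumably why the example uses 4096.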

### UNet2DConditionModel

If you just want the UNet, you can load it like so:

```python
import torch
from diffusers import UNet2DConditionModel

base_unet: UNet2DConditionModel = UNet2DConditionModel.from_pretrained(
  'Birchlabs/waifu-diffusion-xl-unofficial',
  torch_dtype=torch.float16,
  use_safetensors=True,
  variant='fp16',
  subfolder='unet',
).eval().to(torch.device('cuda'))
```
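
Loading in float16 halves the memory needed for the weights compared with float32. Here is a rough back-of-the-envelope sketch, assuming the SDXL base UNet has about 2.6 billion parameters (an approximate figure, not measured from this checkpoint). `weight_gib` is a hypothetical helper.

```python
def weight_gib(n_params: float, bytes_per_param: int) -> float:
  """Approximate size of the weights alone, in GiB (ignores activations)."""
  return n_params * bytes_per_param / 2**30

APPROX_UNET_PARAMS = 2.6e9  # assumption: approximate SDXL base UNet size

fp16_gib = weight_gib(APPROX_UNET_PARAMS, 2)  # float16: 2 bytes per parameter
fp32_gib = weight_gib(APPROX_UNET_PARAMS, 4)  # float32: 4 bytes per parameter
print(f'fp16: {fp16_gib:.1f} GiB, fp32: {fp32_gib:.1f} GiB')
```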

## How it was converted

I used Kohya's converter script to convert the official (`hakurei/waifu-diffusion-xl`) [`wdxl-aesthetic-0.9.safetensors`](https://huggingface.co/hakurei/waifu-diffusion-xl/blob/main/wdxl-aesthetic-0.9.safetensors). See [this commit](https://github.com/Birch-san/diffusers-play/commit/3f16355dd0064932d0bf356ed78676089b9e46ca).

I forked [kohya's converter script](https://github.com/bmaltais/kohya_ss/blob/master/tools/convert_diffusers20_original_sd.py), making one [for SDXL](https://github.com/Birch-san/diffusers-play/blob/3f16355dd0064932d0bf356ed78676089b9e46ca/scripts/convert_diffusers20_original_sdxl.py).

I invoked it like so:

```bash
python scripts/convert_diffusers20_original_sdxl.py \
--fp16 \
--use_safetensors \
--reference_model stabilityai/stable-diffusion-xl-base-0.9 \
in/wdxl-aesthetic-0.9.safetensors \
out/wdxl-diffusers
```

### NOTE: The work here is a Work in Progress! Nothing in this repository is final.

# waifu-diffusion-xl - Diffusion for Rich Weebs

waifu-diffusion-xl is a latent text-to-image diffusion model that has been conditioned on high-quality anime images by fine-tuning Stability AI's SDXL 0.9 model, which was provided as a research preview.

![image](https://user-images.githubusercontent.com/26317155/254350263-59eca9df-503d-4ee7-b12e-b060d8eebd60.png)

<sub>masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck</sub>

## Model Description(s)

- [wdxl-aesthetic-0.9](https://huggingface.co/hakurei/waifu-diffusion-xl/blob/main/wdxl-aesthetic-0.9.safetensors) is a checkpoint that has been finetuned against our in-house aesthetic dataset, which was created with the help of 15k aesthetic labels collected by volunteers. This model used Stability AI's [SDXL 0.9 checkpoint](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9) as the base model for finetuning.

## License

This model has been released under the [SDXL 0.9 RESEARCH LICENSE AGREEMENT](https://huggingface.co/hakurei/waifu-diffusion-xl/blob/main/LICENSE.md) due to the repository containing the SDXL 0.9 weights before an official release. We have been given permission to release this model.

## Downstream Uses

This model can be used for entertainment purposes and as a generative art assistant.

## Team Members and Acknowledgements

This project would not have been possible without the incredible work by Stability AI and Novel AI.

- [Haru](https://github.com/harubaru)
- [Salt](https://github.com/sALTaccount/)
- [closertodeath](https://huggingface.co/closertodeath)
- [Kudo](https://negotiator.itch.io/)

In order to reach us, you can join our [Discord server](https://discord.gg/touhouai).

[![Discord Server](https://discordapp.com/api/guilds/930499730843250783/widget.png?style=banner2)](https://discord.gg/touhouai)