---
language:
- en
tags:
- stable-diffusion-xl
- text-to-image
license: unknown
inference: true

---


This unofficial repository hosts a diffusers-compatible float16 checkpoint of the [WDXL](https://huggingface.co/hakurei/waifu-diffusion-xl) base UNet.  

For convenience (i.e. for use in a StableDiffusionXLPipeline) we include mirrors of other models (please adhere to their terms of usage):

- [SDXL 0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9)
  - tokenizers
  - text encoders
  - scheduler config
- [madebyollin's fp16 VAE](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix)

## Usage (diffusers)

### StableDiffusionXLPipeline

Diffusers' `StableDiffusionXLPipeline` loads and wires up the text encoders, UNet and VAE for you:

```python
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
from diffusers.pipelines.stable_diffusion_xl import StableDiffusionXLPipelineOutput
import torch
from torch import Generator
from PIL import Image
from typing import List

# scheduler args documented here:
# https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_dpmsolver_multistep.py#L98
scheduler: DPMSolverMultistepScheduler = DPMSolverMultistepScheduler.from_pretrained(
  'Birchlabs/waifu-diffusion-xl-unofficial',
  subfolder='scheduler',
  algorithm_type='sde-dpmsolver++',
  solver_order=2,
  # solver_type='heun' may give a sharper image. Cheng Lu reckons midpoint is better.
  solver_type='midpoint',
  use_karras_sigmas=True,
)

# pipeline args documented here:
# https://github.com/huggingface/diffusers/blob/95b7de88fd0dffef2533f1cbaf9ffd9d3c6d04c8/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py#L548
pipe: StableDiffusionXLPipeline = StableDiffusionXLPipeline.from_pretrained(
  'Birchlabs/waifu-diffusion-xl-unofficial',
  scheduler=scheduler,
  torch_dtype=torch.float16,
  use_safetensors=True,
  variant='fp16'
)
pipe.to('cuda')

# StableDiffusionXLPipeline is hardcoded to cast the VAE to float32, but Ollin's VAE works fine in float16
pipe.vae.to(torch.float16)

prompt = 'masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck'
negative_prompt = 'lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name'

out: StableDiffusionXLPipelineOutput = pipe(
  prompt=prompt,
  negative_prompt=negative_prompt,
  num_inference_steps=25,
  guidance_scale=12.,
  original_size=(4096, 4096),
  target_size=(1024, 1024),
  height=1024,
  width=1024,
  generator=Generator().manual_seed(48),
)

images: List[Image.Image] = out.images
img, *_ = images

img.save('waifu.png')
```

You should get a picture like this:

<img width="384px" height="384px" src="https://birchlabs.co.uk/share/wdxl-unofficial/0_48_waifu.png" title="seed 48: girl with green hair and sweater at night">
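The `original_size` and `target_size` arguments in the call above are SDXL's size conditioning. As far as I understand the diffusers pipeline, it concatenates them with a crop offset (which defaults to `(0, 0)`) into six integers that are embedded and fed to the UNet alongside the text conditioning. Here is a minimal sketch of that packing step; `build_add_time_ids` is a hypothetical helper name for illustration, not a diffusers API:

```python
from typing import List, Tuple

Size = Tuple[int, int]


def build_add_time_ids(
    original_size: Size,
    target_size: Size,
    crops_coords_top_left: Size = (0, 0),
) -> List[int]:
    """Flatten SDXL's size/crop conditioning into one list of six ints.

    Order: original (h, w), crop top-left (y, x), target (h, w).
    """
    return list(original_size + crops_coords_top_left + target_size)


# Same values as the pipeline call above.
add_time_ids = build_add_time_ids(
    original_size=(4096, 4096),
    target_size=(1024, 1024),
)
print(add_time_ids)  # [4096, 4096, 0, 0, 1024, 1024]
```

Claiming an `original_size` larger than `target_size` asks the model for something resembling a downscaled high-resolution image, which is presumably why the example uses 4096.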

### UNet2DConditionModel

If you just want the UNet, you can load it like so:

```python
import torch
from diffusers import UNet2DConditionModel

base_unet: UNet2DConditionModel = UNet2DConditionModel.from_pretrained(
  'Birchlabs/waifu-diffusion-xl-unofficial',
  torch_dtype=torch.float16,
  use_safetensors=True,
  variant='fp16',
  subfolder='unet',
).eval().to(torch.device('cuda'))
```
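If you want to know which file `variant='fp16'` and `subfolder='unet'` resolve to, my understanding is that diffusers inserts the variant just before the file extension of its default weights filename. A small sketch of that naming rule, where `add_variant` is a hypothetical stand-in rather than a public diffusers function:

```python
from typing import Optional


def add_variant(weights_name: str, variant: Optional[str] = None) -> str:
    """Insert a variant tag (e.g. 'fp16') before the file extension."""
    if variant is None:
        return weights_name
    stem, extension = weights_name.rsplit('.', 1)
    return f'{stem}.{variant}.{extension}'


# Assumed default safetensors filename for diffusers models.
default_name = 'diffusion_pytorch_model.safetensors'
unet_file = 'unet/' + add_variant(default_name, 'fp16')
print(unet_file)  # unet/diffusion_pytorch_model.fp16.safetensors
```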

## How it was converted

I forked [kohya's converter script](https://github.com/bmaltais/kohya_ss/blob/master/tools/convert_diffusers20_original_sd.py), making one [for SDXL](https://github.com/Birch-san/diffusers-play/blob/3f16355dd0064932d0bf356ed78676089b9e46ca/scripts/convert_diffusers20_original_sdxl.py), and used it to convert the official (`hakurei/waifu-diffusion-xl`) [`wdxl-aesthetic-0.9.safetensors`](https://huggingface.co/hakurei/waifu-diffusion-xl/blob/main/wdxl-aesthetic-0.9.safetensors) checkpoint. See [this commit](https://github.com/Birch-san/diffusers-play/commit/3f16355dd0064932d0bf356ed78676089b9e46ca).

I invoked it like so:

```bash
python scripts/convert_diffusers20_original_sdxl.py \
--fp16 \
--use_safetensors \
--reference_model stabilityai/stable-diffusion-xl-base-0.9 \
in/wdxl-aesthetic-0.9.safetensors \
out/wdxl-diffusers
```
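To sanity-check that `--fp16` took effect, you can inspect a converted file's header without loading any tensors. The sketch below relies on the safetensors layout (an 8-byte little-endian header length followed by a JSON header) and uses only the standard library. The function names are my own, not part of the converter script:

```python
import json
import struct
from collections import Counter
from typing import Dict


def read_safetensors_header(path: str) -> Dict[str, dict]:
    """Return the JSON header of a .safetensors file; tensor data is not read."""
    with open(path, 'rb') as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack('<Q', f.read(8))
        return json.loads(f.read(header_len))


def count_dtypes(path: str) -> Counter:
    """Count tensors per dtype string (e.g. 'F16', 'F32')."""
    header = read_safetensors_header(path)
    return Counter(
        entry['dtype']
        for name, entry in header.items()
        if name != '__metadata__'  # optional free-form entry, not a tensor
    )
```

For example, `count_dtypes('out/wdxl-diffusers/unet/diffusion_pytorch_model.safetensors')` should report only `F16` entries if the conversion went as intended. The filename inside `out/wdxl-diffusers` is an assumption here, so check what the script actually wrote.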

### NOTE: The work here is a Work in Progress! Nothing in this repository is final.

# waifu-diffusion-xl - Diffusion for Rich Weebs

waifu-diffusion-xl is a latent text-to-image diffusion model that has been conditioned on high-quality anime images by fine-tuning Stability AI's SDXL 0.9 model, which was provided as a research preview.

![image](https://user-images.githubusercontent.com/26317155/254350263-59eca9df-503d-4ee7-b12e-b060d8eebd60.png)

<sub>masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck</sub>

## Model Description(s)

- [wdxl-aesthetic-0.9](https://huggingface.co/hakurei/waifu-diffusion-xl/blob/main/wdxl-aesthetic-0.9.safetensors) is a checkpoint that has been finetuned against our in-house aesthetic dataset, which was created with the help of 15k aesthetic labels collected by volunteers. This model used Stability AI's [SDXL 0.9 checkpoint](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9) as the base model for finetuning.

## License

This model has been released under the [SDXL 0.9 RESEARCH LICENSE AGREEMENT](https://huggingface.co/hakurei/waifu-diffusion-xl/blob/main/LICENSE.md) due to the repository containing the SDXL 0.9 weights before an official release. We have been given permission to release this model.

## Downstream Uses

This model can be used for entertainment purposes and as a generative art assistant.

## Team Members and Acknowledgements

This project would not have been possible without the incredible work by Stability AI and NovelAI.

- [Haru](https://github.com/harubaru)
- [Salt](https://github.com/sALTaccount/)
- [closertodeath](https://huggingface.co/closertodeath)
- [Kudo](https://negotiator.itch.io/)

In order to reach us, you can join our [Discord server](https://discord.gg/touhouai).

[![Discord Server](https://discordapp.com/api/guilds/930499730843250783/widget.png?style=banner2)](https://discord.gg/touhouai)