---
language:
- en
tags:
- stable-diffusion-xl
- text-to-image
license: unknown
inference: true

---



This unofficial repository hosts a diffusers-compatible float16 checkpoint of the [WDXL](https://huggingface.co/hakurei/waifu-diffusion-xl) base UNet.  

For convenience (so that the repository loads directly into a `StableDiffusionXLPipeline`), we also mirror components from other models. Please adhere to their terms of use:

- [SDXL 0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9)
  - tokenizers
  - text encoders
  - scheduler config
- [madebyollin's fp16 VAE](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix)
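
Because each mirrored component carries its own terms, it can help to know which subfolder came from where. The sketch below assumes the repository follows the standard diffusers SDXL layout, with one subfolder per pipeline component. `COMPONENT_SOURCES` and `subfolders_from` are illustrative names, not part of diffusers, and the table is written by hand from the list above rather than read from the repository.

```python
from typing import Dict, List

SDXL_09 = 'stabilityai/stable-diffusion-xl-base-0.9'

# Assumed layout: one subfolder per StableDiffusionXLPipeline component.
# Hand-written from the list above; not read from the repository itself.
COMPONENT_SOURCES: Dict[str, str] = {
  'unet': 'hakurei/waifu-diffusion-xl',  # converted checkpoint
  'vae': 'madebyollin/sdxl-vae-fp16-fix',
  'tokenizer': SDXL_09,
  'tokenizer_2': SDXL_09,
  'text_encoder': SDXL_09,
  'text_encoder_2': SDXL_09,
  'scheduler': SDXL_09,
}

def subfolders_from(source: str) -> List[str]:
  """List the subfolders that were mirrored from the given upstream repo."""
  return sorted(name for name, origin in COMPONENT_SOURCES.items() if origin == source)

print(subfolders_from(SDXL_09))
```

Any of these subfolder names can be passed as `subfolder=...` to the matching `from_pretrained` call, as the scheduler and UNet examples in the usage section do.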

## Usage (diffusers)

### StableDiffusionXLPipeline

Diffusers' `StableDiffusionXLPipeline` wires up the text encoders, UNet and VAE for you:

```python
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
from diffusers.pipelines.stable_diffusion_xl import StableDiffusionXLPipelineOutput
import torch
from torch import Generator
from PIL import Image
from typing import List

# scheduler args documented here:
# https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_dpmsolver_multistep.py#L98
scheduler: DPMSolverMultistepScheduler = DPMSolverMultistepScheduler.from_pretrained(
  'Birchlabs/waifu-diffusion-xl-unofficial',
  subfolder='scheduler',
  algorithm_type='sde-dpmsolver++',
  solver_order=2,
  # solver_type='heun' may give a sharper image. Cheng Lu reckons midpoint is better.
  solver_type='midpoint',
  use_karras_sigmas=True,
)

# pipeline args documented here:
# https://github.com/huggingface/diffusers/blob/95b7de88fd0dffef2533f1cbaf9ffd9d3c6d04c8/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py#L548
pipe: StableDiffusionXLPipeline = StableDiffusionXLPipeline.from_pretrained(
  'Birchlabs/waifu-diffusion-xl-unofficial',
  scheduler=scheduler,
  torch_dtype=torch.float16,
  use_safetensors=True,
  variant='fp16'
)
pipe.to('cuda')

# StableDiffusionXLPipeline is hardcoded to cast the VAE to float32, but Ollin's VAE works fine in float16
pipe.vae.to(torch.float16)

prompt = 'masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck'
negative_prompt = 'lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name'

out: StableDiffusionXLPipelineOutput = pipe(
  prompt=prompt,
  negative_prompt=negative_prompt,
  num_inference_steps=25,
  guidance_scale=12.,
  original_size=(4096, 4096),
  target_size=(1024, 1024),
  height=1024,
  width=1024,
  generator=Generator().manual_seed(48),
)

images: List[Image.Image] = out.images
img, *_ = images

img.save('waifu.png')
```

You should get a picture like this:

<img width="384px" height="384px" src="https://birchlabs.co.uk/share/wdxl-unofficial/0_48_waifu.png" title="seed 48: girl with green hair and sweater at night">
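
The pipeline call above passes `original_size=(4096, 4096)` alongside `target_size=(1024, 1024)`. These feed SDXL's size conditioning: as far as I can tell, the pipeline packs them, together with a crop offset, into six integers that are embedded and given to the UNet. Claiming a large original size presumably nudges the model away from the look of upscaled low-resolution images. The helper below is a minimal sketch of that packing; `build_add_time_ids` is a hypothetical name, and the `(0, 0)` crop default is an assumption that matches what I believe diffusers uses.

```python
from typing import List, Tuple

def build_add_time_ids(
  original_size: Tuple[int, int],
  target_size: Tuple[int, int],
  crops_coords_top_left: Tuple[int, int] = (0, 0),
) -> List[int]:
  """Pack SDXL's size conditioning into a flat list of six integers."""
  # order: original (height, width), crop (top, left), target (height, width)
  return list(original_size + crops_coords_top_left + target_size)

print(build_add_time_ids(original_size=(4096, 4096), target_size=(1024, 1024)))
# [4096, 4096, 0, 0, 1024, 1024]
```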

### UNet2DConditionModel

If you just want the UNet, you can load it like so:

```python
import torch
from diffusers import UNet2DConditionModel

base_unet: UNet2DConditionModel = UNet2DConditionModel.from_pretrained(
  'Birchlabs/waifu-diffusion-xl-unofficial',
  torch_dtype=torch.float16,
  use_safetensors=True,
  variant='fp16',
  subfolder='unet',
).eval().to(torch.device('cuda'))
```

## How it was converted

I used Kohya's converter script to convert the official (`hakurei/waifu-diffusion-xl`) [`wdxl-aesthetic-0.9.safetensors`](https://huggingface.co/hakurei/waifu-diffusion-xl/blob/main/wdxl-aesthetic-0.9.safetensors) checkpoint. See [this commit](https://github.com/Birch-san/diffusers-play/commit/3f16355dd0064932d0bf356ed78676089b9e46ca).

More precisely, I forked [Kohya's converter script](https://github.com/bmaltais/kohya_ss/blob/master/tools/convert_diffusers20_original_sd.py) to make a [variant for SDXL](https://github.com/Birch-san/diffusers-play/blob/3f16355dd0064932d0bf356ed78676089b9e46ca/scripts/convert_diffusers20_original_sdxl.py).

I invoked it like so:

```bash
python scripts/convert_diffusers20_original_sdxl.py \
--fp16 \
--use_safetensors \
--reference_model stabilityai/stable-diffusion-xl-base-0.9 \
in/wdxl-aesthetic-0.9.safetensors \
out/wdxl-diffusers
```
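
If you want to script the same conversion (for instance, over several checkpoints), the invocation above can be assembled as an argument list. This is only a sketch: `build_convert_command` is a hypothetical helper, the flags are copied from the command above, and the script path assumes you run it from a checkout of the forked repository.

```python
import shlex
from typing import List

def build_convert_command(
  checkpoint: str,
  out_dir: str,
  reference_model: str = 'stabilityai/stable-diffusion-xl-base-0.9',
) -> List[str]:
  """Assemble the converter invocation shown above as an argv list."""
  return [
    'python', 'scripts/convert_diffusers20_original_sdxl.py',
    '--fp16',
    '--use_safetensors',
    '--reference_model', reference_model,
    checkpoint,
    out_dir,
  ]

cmd = build_convert_command('in/wdxl-aesthetic-0.9.safetensors', 'out/wdxl-diffusers')
# print the command for inspection; hand `cmd` to subprocess.run(cmd, check=True) to execute it
print(shlex.join(cmd))
```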

### NOTE: The work here is a Work in Progress! Nothing in this repository is final.

# waifu-diffusion-xl - Diffusion for Rich Weebs

waifu-diffusion-xl is a latent text-to-image diffusion model that has been conditioned on high-quality anime images by fine-tuning Stability AI's SDXL 0.9 model, which was provided as a research preview.

![image](https://user-images.githubusercontent.com/26317155/254350263-59eca9df-503d-4ee7-b12e-b060d8eebd60.png)

<sub>masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck</sub>

## Model Description(s)

- [wdxl-aesthetic-0.9](https://huggingface.co/hakurei/waifu-diffusion-xl/blob/main/wdxl-aesthetic-0.9.safetensors) is a checkpoint that has been fine-tuned on our in-house aesthetic dataset, which was created with the help of 15k aesthetic labels collected by volunteers. It uses Stability AI's [SDXL 0.9 checkpoint](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9) as the base model for fine-tuning.

## License

This model has been released under the [SDXL 0.9 RESEARCH LICENSE AGREEMENT](https://huggingface.co/hakurei/waifu-diffusion-xl/blob/main/LICENSE.md) due to the repository containing the SDXL 0.9 weights before an official release. We have been given permission to release this model.

## Downstream Uses

This model can be used for entertainment purposes and as a generative art assistant.

## Team Members and Acknowledgements

This project would not have been possible without the incredible work by Stability AI and NovelAI.

- [Haru](https://github.com/harubaru)
- [Salt](https://github.com/sALTaccount/)
- [closertodeath](https://huggingface.co/closertodeath)
- [Kudo](https://negotiator.itch.io/)

To reach us, join our [Discord server](https://discord.gg/touhouai).

[![Discord Server](https://discordapp.com/api/guilds/930499730843250783/widget.png?style=banner2)](https://discord.gg/touhouai)