---
language:
- en
tags:
- stable-diffusion-xl
- text-to-image
license: unknown
inference: true
---
This unofficial repository hosts a diffusers-compatible float16 checkpoint of the [WDXL](https://huggingface.co/hakurei/waifu-diffusion-xl) base UNet.
For convenience (i.e. for use in a StableDiffusionXLPipeline) we include mirrors of other models (please adhere to their terms of usage):
- [SDXL 0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9)
  - tokenizers
  - text encoders
  - scheduler config
- [madebyollin's fp16 VAE](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix)
## Usage (diffusers)
### StableDiffusionXLPipeline
Diffusers' StableDiffusionXLPipeline convention handles text encoders + UNet + VAE for you:
```python
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
from diffusers.pipelines.stable_diffusion_xl import StableDiffusionXLPipelineOutput
import torch
from torch import Generator
from PIL import Image
from typing import List
# scheduler args documented here:
# https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_dpmsolver_multistep.py#L98
scheduler: DPMSolverMultistepScheduler = DPMSolverMultistepScheduler.from_pretrained(
    'Birchlabs/waifu-diffusion-xl-unofficial',
    subfolder='scheduler',
    algorithm_type='sde-dpmsolver++',
    solver_order=2,
    # solver_type='heun' may give a sharper image. Cheng Lu reckons midpoint is better.
    solver_type='midpoint',
    use_karras_sigmas=True,
)
# pipeline args documented here:
# https://github.com/huggingface/diffusers/blob/95b7de88fd0dffef2533f1cbaf9ffd9d3c6d04c8/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py#L548
pipe: StableDiffusionXLPipeline = StableDiffusionXLPipeline.from_pretrained(
    'Birchlabs/waifu-diffusion-xl-unofficial',
    scheduler=scheduler,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant='fp16',
)
pipe.to('cuda')
# StableDiffusionXLPipeline is hardcoded to cast the VAE to float32, but Ollin's VAE works fine in float16
pipe.vae.to(torch.float16)
prompt = 'masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck'
negative_prompt = 'lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name'
out: StableDiffusionXLPipelineOutput = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=25,
    guidance_scale=12.,
    original_size=(4096, 4096),
    target_size=(1024, 1024),
    height=1024,
    width=1024,
    generator=Generator().manual_seed(48),
)
images: List[Image.Image] = out.images
img, *_ = images
img.save('waifu.png')
```
You should get a picture like this:
<img width="384px" height="384px" src="https://birchlabs.co.uk/share/wdxl-unofficial/0_48_waifu.png" title="seed 48: girl with green hair and sweater at night">
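The `original_size` and `target_size` arguments in the pipeline call above are SDXL's size micro-conditioning. As a rough sketch of what happens to them (to my understanding, diffusers concatenates original size, crop offset and target size into six integers that are embedded alongside the timestep), here is a pure-Python illustration. `sdxl_add_time_ids` is a hypothetical helper name for this example, not a diffusers API, and the `(0, 0)` crop default is an assumption matching the pipeline's default.

```python
from typing import List, Tuple

Size = Tuple[int, int]


def sdxl_add_time_ids(
    original_size: Size,
    target_size: Size,
    crops_coords_top_left: Size = (0, 0),  # assumed default: no cropping
) -> List[int]:
    """Pack the three size pairs into the six micro-conditioning values.

    Hypothetical helper for illustration; the ordering is
    original size, then crop offset, then target size.
    """
    return list(original_size + crops_coords_top_left + target_size)


# Same values as the pipeline call in this README.
time_ids = sdxl_add_time_ids(original_size=(4096, 4096), target_size=(1024, 1024))
print(time_ids)
```

Passing an `original_size` larger than `target_size`, as this README does, conditions the model as if the training image had been a high-resolution original; treat the exact values as something to experiment with rather than a requirement.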
### UNet2DConditionModel
If you just want the UNet, you can load it like so:
```python
import torch
from diffusers import UNet2DConditionModel
base_unet: UNet2DConditionModel = UNet2DConditionModel.from_pretrained(
    'Birchlabs/waifu-diffusion-xl-unofficial',
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant='fp16',
    subfolder='unet',
).eval().to(torch.device('cuda'))
```
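If you drive the UNet yourself rather than through the pipeline, you have to supply the conditioning tensors that the pipeline would otherwise build. The sketch below only computes the tensor shapes I would expect the SDXL base UNet to accept. It assumes the usual SDXL base layout (4 latent channels, 8x VAE downsampling, 77 tokens, 2048-wide concatenated text-encoder states, a 1280-wide pooled embedding, 6 size-conditioning ids). `sdxl_unet_input_shapes` is a hypothetical helper, not part of diffusers.

```python
from typing import Dict, Tuple


def sdxl_unet_input_shapes(batch: int, height: int, width: int) -> Dict[str, Tuple[int, ...]]:
    """Return the expected input shapes for an SDXL base UNet forward pass.

    Hypothetical helper; the constants below are assumptions about the
    SDXL base architecture, not values read from this repository's config.
    """
    vae_scale = 8  # the VAE downsamples each spatial dimension by 8
    return {
        # noisy latents passed as `sample`
        'sample': (batch, 4, height // vae_scale, width // vae_scale),
        # per-token states from both text encoders, concatenated (768 + 1280)
        'encoder_hidden_states': (batch, 77, 2048),
        # pooled embedding, passed via added_cond_kwargs['text_embeds']
        'text_embeds': (batch, 1280),
        # size micro-conditioning, passed via added_cond_kwargs['time_ids']
        'time_ids': (batch, 6),
    }


shapes = sdxl_unet_input_shapes(batch=1, height=1024, width=1024)
for name, shape in shapes.items():
    print(name, shape)
```

With classifier-free guidance the pipeline typically doubles the batch dimension (unconditional plus conditional), so `batch=2` would describe a single guided image.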
## How it was converted
I converted the official (`hakurei/waifu-diffusion-xl`) [`wdxl-aesthetic-0.9.safetensors`](https://huggingface.co/hakurei/waifu-diffusion-xl/blob/main/wdxl-aesthetic-0.9.safetensors) checkpoint using a fork of [Kohya's converter script](https://github.com/bmaltais/kohya_ss/blob/master/tools/convert_diffusers20_original_sd.py) that I [adapted for SDXL](https://github.com/Birch-san/diffusers-play/blob/3f16355dd0064932d0bf356ed78676089b9e46ca/scripts/convert_diffusers20_original_sdxl.py). See [this commit](https://github.com/Birch-san/diffusers-play/commit/3f16355dd0064932d0bf356ed78676089b9e46ca).
I invoked it like so:
```bash
python scripts/convert_diffusers20_original_sdxl.py \
  --fp16 \
  --use_safetensors \
  --reference_model stabilityai/stable-diffusion-xl-base-0.9 \
  in/wdxl-aesthetic-0.9.safetensors \
  out/wdxl-diffusers
```
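If you would rather script the conversion from Python, the invocation above can be assembled as an argument list. This is only a sketch: `build_convert_command` is a hypothetical helper, the flags and paths are copied from the command above, and it assumes you run it from a checkout of the diffusers-play repository with `python` on your PATH.

```python
import shlex
from typing import List


def build_convert_command(
    src: str,
    dst: str,
    reference_model: str = 'stabilityai/stable-diffusion-xl-base-0.9',
    script: str = 'scripts/convert_diffusers20_original_sdxl.py',
) -> List[str]:
    """Build the argv list for the SDXL converter script.

    Hypothetical helper; it mirrors the shell command shown above
    and does not run anything by itself.
    """
    return [
        'python', script,
        '--fp16',
        '--use_safetensors',
        '--reference_model', reference_model,
        src,
        dst,
    ]


cmd = build_convert_command('in/wdxl-aesthetic-0.9.safetensors', 'out/wdxl-diffusers')
print(shlex.join(cmd))  # shows the equivalent single-line shell command

# To actually run the conversion (requires the script and checkpoint to exist):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues if your paths contain spaces.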
### NOTE: The work here is a Work in Progress! Nothing in this repository is final.
# waifu-diffusion-xl - Diffusion for Rich Weebs
waifu-diffusion-xl is a latent text-to-image diffusion model that has been conditioned on high-quality anime images by fine-tuning Stability AI's SDXL 0.9 model, which was provided as a research preview.
![image](https://user-images.githubusercontent.com/26317155/254350263-59eca9df-503d-4ee7-b12e-b060d8eebd60.png)
<sub>masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck</sub>
## Model Description(s)
- [wdxl-aesthetic-0.9](https://huggingface.co/hakurei/waifu-diffusion-xl/blob/main/wdxl-aesthetic-0.9.safetensors) is a checkpoint fine-tuned on our in-house aesthetic dataset, which was created with the help of 15k aesthetic labels collected by volunteers. Stability AI's [SDXL 0.9 checkpoint](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9) served as the base model for fine-tuning.
## License
This model has been released under the [SDXL 0.9 RESEARCH LICENSE AGREEMENT](https://huggingface.co/hakurei/waifu-diffusion-xl/blob/main/LICENSE.md) due to the repository containing the SDXL 0.9 weights before an official release. We have been given permission to release this model.
## Downstream Uses
This model can be used for entertainment purposes and as a generative art assistant.
## Team Members and Acknowledgements
This project would not have been possible without the incredible work by Stability AI and Novel AI.
- [Haru](https://github.com/harubaru)
- [Salt](https://github.com/sALTaccount/)
- [closertodeath](https://huggingface.co/closertodeath)
- [Kudo](https://negotiator.itch.io/)
In order to reach us, you can join our [Discord server](https://discord.gg/touhouai).
[![Discord Server](https://discordapp.com/api/guilds/930499730843250783/widget.png?style=banner2)](https://discord.gg/touhouai)