---
language:
- en
tags:
- stable-diffusion-xl
- text-to-image
license: unknown
inference: true
---

This unofficial repository hosts a diffusers-compatible float16 checkpoint of the [WDXL](https://huggingface.co/hakurei/waifu-diffusion-xl) base UNet.

For convenience (i.e. for use in a `StableDiffusionXLPipeline`), we include mirrors of other models (please adhere to their terms of usage):

- [SDXL 0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9)
  - tokenizers
  - text encoders
  - scheduler config
- [madebyollin's fp16 VAE](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix)

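As a rough map of how the mirrored pieces fit together, the sketch below lists the subfolders a diffusers-format SDXL repository conventionally contains and which upstream each one comes from. The folder names follow the usual diffusers SDXL layout and are an assumption here (only `scheduler` and `unet` are named elsewhere in this README); `COMPONENT_SOURCES` and `components_from` are hypothetical names used for illustration.

```python
from typing import Dict, List

# Assumed layout: conventional diffusers SDXL subfolders mapped to their origin.
# Only 'scheduler' and 'unet' are named in this README; the rest is an assumption.
COMPONENT_SOURCES: Dict[str, str] = {
    'unet': 'WDXL',
    'tokenizer': 'SDXL 0.9',
    'tokenizer_2': 'SDXL 0.9',
    'text_encoder': 'SDXL 0.9',
    'text_encoder_2': 'SDXL 0.9',
    'scheduler': 'SDXL 0.9',
    'vae': "madebyollin's fp16 VAE",
}

def components_from(source: str) -> List[str]:
    """Return the subfolders mirrored from the given upstream, sorted by name."""
    return sorted(name for name, origin in COMPONENT_SOURCES.items() if origin == source)

print(components_from('SDXL 0.9'))
```
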
## Usage (diffusers)

### StableDiffusionXLPipeline

Diffusers' `StableDiffusionXLPipeline` handles the text encoders, UNet, and VAE for you:

```python
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
from diffusers.pipelines.stable_diffusion_xl import StableDiffusionXLPipelineOutput
import torch
from torch import Generator
from PIL import Image
from typing import List

# scheduler args documented here:
# https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_dpmsolver_multistep.py#L98
scheduler: DPMSolverMultistepScheduler = DPMSolverMultistepScheduler.from_pretrained(
    'Birchlabs/waifu-diffusion-xl-unofficial',
    subfolder='scheduler',
    algorithm_type='sde-dpmsolver++',
    solver_order=2,
    # solver_type='heun' may give a sharper image. Cheng Lu reckons midpoint is better.
    solver_type='midpoint',
    use_karras_sigmas=True,
)

# pipeline args documented here:
# https://github.com/huggingface/diffusers/blob/95b7de88fd0dffef2533f1cbaf9ffd9d3c6d04c8/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py#L548
pipe: StableDiffusionXLPipeline = StableDiffusionXLPipeline.from_pretrained(
    'Birchlabs/waifu-diffusion-xl-unofficial',
    scheduler=scheduler,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant='fp16',
)
pipe.to('cuda')

# StableDiffusionXLPipeline is hardcoded to cast the VAE to float32, but Ollin's VAE works fine in float16
pipe.vae.to(torch.float16)

prompt = 'masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck'
negative_prompt = 'lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name'

out: StableDiffusionXLPipelineOutput = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=25,
    guidance_scale=12.,
    original_size=(4096, 4096),
    target_size=(1024, 1024),
    height=1024,
    width=1024,
    generator=Generator().manual_seed(48),
)

# the pipeline returns a list of PIL images; take the first one
images: List[Image.Image] = out.images
img, *_ = images

img.save('waifu.png')
```

You should get a picture like this:

<img width="384px" height="384px" src="https://birchlabs.co.uk/share/wdxl-unofficial/0_48_waifu.png" title="seed 48: girl with green hair and sweater at night">

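The `original_size` and `target_size` arguments in the pipeline call are SDXL's size micro-conditioning. In diffusers' SDXL pipeline these values are joined with a crop offset (which defaults to `(0, 0)`) into a short list of integers that is embedded alongside the prompt. The sketch below illustrates that ordering; `build_add_time_ids` is a hypothetical helper written for this example, not a diffusers API.

```python
from typing import List, Tuple

def build_add_time_ids(
    original_size: Tuple[int, int],
    target_size: Tuple[int, int],
    crops_coords_top_left: Tuple[int, int] = (0, 0),
) -> List[int]:
    """Concatenate the size conditioning in the order: original, crop, target."""
    return list(original_size + crops_coords_top_left + target_size)

# Values from the pipeline call: claim a 4096x4096 source, render at 1024x1024.
add_time_ids = build_add_time_ids(original_size=(4096, 4096), target_size=(1024, 1024))
print(add_time_ids)  # [4096, 4096, 0, 0, 1024, 1024]
```

In SDXL's size conditioning, a large `original_size` generally steers the model toward the look of high-resolution training images, which is presumably why the example claims a 4096x4096 source.
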
### UNet2DConditionModel

If you just want the UNet, you can load it like so:

```python
import torch
from diffusers import UNet2DConditionModel

base_unet: UNet2DConditionModel = UNet2DConditionModel.from_pretrained(
    'Birchlabs/waifu-diffusion-xl-unofficial',
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant='fp16',
    subfolder='unet',
).eval().to(torch.device('cuda'))
```

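With `variant='fp16'` and `use_safetensors=True`, diffusers looks for a weights file whose name has the variant inserted before the file extension. The sketch below reproduces that naming rule in plain Python. `variant_weights_name` is a hypothetical helper, and the base name `diffusion_pytorch_model.safetensors` is the diffusers default for model weights rather than something stated in this README.

```python
from typing import Optional

def variant_weights_name(weights_name: str, variant: Optional[str] = None) -> str:
    """Insert the variant before the file extension, e.g. model.fp16.safetensors."""
    if variant is None:
        return weights_name
    stem, extension = weights_name.rsplit('.', 1)
    return f'{stem}.{variant}.{extension}'

# Assumed location of the UNet weights inside the repository.
path = 'unet/' + variant_weights_name('diffusion_pytorch_model.safetensors', 'fp16')
print(path)  # unet/diffusion_pytorch_model.fp16.safetensors
```
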
## How it was converted

I converted the official (`hakurei/waifu-diffusion-xl`) [`wdxl-aesthetic-0.9.safetensors`](https://huggingface.co/hakurei/waifu-diffusion-xl/blob/main/wdxl-aesthetic-0.9.safetensors) checkpoint with a fork of Kohya's converter script. See [this commit](https://github.com/Birch-san/diffusers-play/commit/3f16355dd0064932d0bf356ed78676089b9e46ca).

The fork adapts [Kohya's converter script](https://github.com/bmaltais/kohya_ss/blob/master/tools/convert_diffusers20_original_sd.py) [for SDXL](https://github.com/Birch-san/diffusers-play/blob/3f16355dd0064932d0bf356ed78676089b9e46ca/scripts/convert_diffusers20_original_sdxl.py).

I invoked it like so:

```bash
python scripts/convert_diffusers20_original_sdxl.py \
  --fp16 \
  --use_safetensors \
  --reference_model stabilityai/stable-diffusion-xl-base-0.9 \
  in/wdxl-aesthetic-0.9.safetensors \
  out/wdxl-diffusers
```

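If you want to script the conversion, for example to convert several checkpoints in a row, the same invocation can be assembled as an argument list. This is only a sketch: `build_convert_command` is a hypothetical helper, it uses just the flags from the invocation shown in this section, and it prints the command rather than running it.

```python
import shlex
from typing import List

def build_convert_command(checkpoint: str, out_dir: str) -> List[str]:
    """Build the converter argv using only the flags from the invocation in this README."""
    return [
        'python', 'scripts/convert_diffusers20_original_sdxl.py',
        '--fp16',
        '--use_safetensors',
        '--reference_model', 'stabilityai/stable-diffusion-xl-base-0.9',
        checkpoint,
        out_dir,
    ]

command = build_convert_command('in/wdxl-aesthetic-0.9.safetensors', 'out/wdxl-diffusers')
# Print a shell-quoted form for inspection; nothing is executed here.
print(shlex.join(command))
```

To actually run it, the list can be handed to `subprocess.run(command, check=True)` from the directory that contains `scripts/`.
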
### NOTE: The work here is a Work in Progress! Nothing in this repository is final.

# waifu-diffusion-xl - Diffusion for Rich Weebs

waifu-diffusion-xl is a latent text-to-image diffusion model that has been conditioned on high-quality anime images by fine-tuning Stability AI's SDXL 0.9 model, which was provided as a research preview.

![image](https://user-images.githubusercontent.com/26317155/254350263-59eca9df-503d-4ee7-b12e-b060d8eebd60.png)

<sub>masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck</sub>

## Model Description(s)

- [wdxl-aesthetic-0.9](https://huggingface.co/hakurei/waifu-diffusion-xl/blob/main/wdxl-aesthetic-0.9.safetensors) is a checkpoint that has been fine-tuned against our in-house aesthetic dataset, which was created with the help of 15k aesthetic labels collected by volunteers. It uses Stability AI's [SDXL 0.9 checkpoint](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9) as the base model for fine-tuning.

## License

This model has been released under the [SDXL 0.9 RESEARCH LICENSE AGREEMENT](https://huggingface.co/hakurei/waifu-diffusion-xl/blob/main/LICENSE.md) because the repository contains the SDXL 0.9 weights ahead of an official release. We have been given permission to release this model.

## Downstream Uses

This model can be used for entertainment purposes and as a generative art assistant.

## Team Members and Acknowledgements

This project would not have been possible without the incredible work by Stability AI and Novel AI.

- [Haru](https://github.com/harubaru)
- [Salt](https://github.com/sALTaccount/)
- [closertodeath](https://huggingface.co/closertodeath)
- [Kudo](https://negotiator.itch.io/)

To reach us, join our [Discord server](https://discord.gg/touhouai).

[![Discord Server](https://discordapp.com/api/guilds/930499730843250783/widget.png?style=banner2)](https://discord.gg/touhouai)