---
language:
- en
tags:
- stable-diffusion-xl
- text-to-image
license: unknown
inference: true
---

This unofficial repository hosts a diffusers-compatible float16 checkpoint of the [WDXL](https://huggingface.co/hakurei/waifu-diffusion-xl) base UNet.

For convenience (i.e. for use in a `StableDiffusionXLPipeline`), we include mirrors of other models (please adhere to their terms of usage):

- [SDXL 0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9)
  - tokenizers
  - text encoders
  - scheduler config
- [madebyollin's fp16 VAE](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix)

## Usage (diffusers)

### StableDiffusionXLPipeline

Diffusers' `StableDiffusionXLPipeline` convention handles text encoders + UNet + VAE for you:

```python
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
from diffusers.pipelines.stable_diffusion_xl import StableDiffusionXLPipelineOutput
import torch
from torch import Generator
from PIL import Image
from typing import List

# scheduler args documented here:
# https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_dpmsolver_multistep.py#L98
scheduler: DPMSolverMultistepScheduler = DPMSolverMultistepScheduler.from_pretrained(
    'Birchlabs/waifu-diffusion-xl-unofficial',
    subfolder='scheduler',
    algorithm_type='sde-dpmsolver++',
    solver_order=2,
    # solver_type='heun' may give a sharper image. Cheng Lu reckons midpoint is better.
    solver_type='midpoint',
    use_karras_sigmas=True,
)

# pipeline args documented here:
# https://github.com/huggingface/diffusers/blob/95b7de88fd0dffef2533f1cbaf9ffd9d3c6d04c8/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py#L548
pipe: StableDiffusionXLPipeline = StableDiffusionXLPipeline.from_pretrained(
    'Birchlabs/waifu-diffusion-xl-unofficial',
    scheduler=scheduler,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant='fp16'
)
pipe.to('cuda')

# StableDiffusionXLPipeline is hardcoded to cast the VAE to float32, but Ollin's VAE works fine in float16
pipe.vae.to(torch.float16)

prompt = 'masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck'
negative_prompt = 'lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name'

out: StableDiffusionXLPipelineOutput = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=25,
    guidance_scale=12.,
    original_size=(4096, 4096),
    target_size=(1024, 1024),
    height=1024,
    width=1024,
    generator=Generator().manual_seed(48),
)

images: List[Image.Image] = out.images
img, *_ = images

img.save('waifu.png')
```

You should get a picture like this:

### UNet2DConditionModel

If you just want the UNet, you can load it like so:

```python
import torch
from diffusers import UNet2DConditionModel

base_unet: UNet2DConditionModel = UNet2DConditionModel.from_pretrained(
    'Birchlabs/waifu-diffusion-xl-unofficial',
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant='fp16',
    subfolder='unet',
).eval().to(torch.device('cuda'))
```

## How it was converted

I used Kohya's converter script to convert the official (`hakurei/waifu-diffusion-xl`) [`wdxl-aesthetic-0.9.safetensors`](https://huggingface.co/hakurei/waifu-diffusion-xl/blob/main/wdxl-aesthetic-0.9.safetensors).
See [this commit](https://github.com/Birch-san/diffusers-play/commit/3f16355dd0064932d0bf356ed78676089b9e46ca). I forked [Kohya's converter script](https://github.com/bmaltais/kohya_ss/blob/master/tools/convert_diffusers20_original_sd.py), making one [for SDXL](https://github.com/Birch-san/diffusers-play/blob/3f16355dd0064932d0bf356ed78676089b9e46ca/scripts/convert_diffusers20_original_sdxl.py).

I invoked it like so:

```bash
python scripts/convert_diffusers20_original_sdxl.py \
  --fp16 \
  --use_safetensors \
  --reference_model stabilityai/stable-diffusion-xl-base-0.9 \
  in/wdxl-aesthetic-0.9.safetensors \
  out/wdxl-diffusers
```

### NOTE: The work here is a Work in Progress! Nothing in this repository is final.

# waifu-diffusion-xl - Diffusion for Rich Weebs

waifu-diffusion-xl is a latent text-to-image diffusion model that has been conditioned on high-quality anime images by fine-tuning Stability AI's SDXL 0.9 model, which was provided as a research preview.

![image](https://user-images.githubusercontent.com/26317155/254350263-59eca9df-503d-4ee7-b12e-b060d8eebd60.png)

masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck

## Model Description(s)

- [wdxl-aesthetic-0.9](https://huggingface.co/hakurei/waifu-diffusion-xl/blob/main/wdxl-aesthetic-0.9.safetensors) is a checkpoint that has been finetuned against our in-house aesthetic dataset, which was created with the help of 15k aesthetic labels collected by volunteers. It uses Stability AI's [SDXL 0.9 checkpoint](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9) as the base model for finetuning.

## License

This model has been released under the [SDXL 0.9 RESEARCH LICENSE AGREEMENT](https://huggingface.co/hakurei/waifu-diffusion-xl/blob/main/LICENSE.md) because the repository contains the SDXL 0.9 weights ahead of an official release. We have been given permission to release this model.
## Downstream Uses

This model can be used for entertainment purposes and as a generative art assistant.

## Team Members and Acknowledgements

This project would not have been possible without the incredible work by Stability AI and Novel AI.

- [Haru](https://github.com/harubaru)
- [Salt](https://github.com/sALTaccount/)
- [closertodeath](https://huggingface.co/closertodeath)
- [Kudo](https://negotiator.itch.io/)

In order to reach us, you can join our [Discord server](https://discord.gg/touhouai).

[![Discord Server](https://discordapp.com/api/guilds/930499730843250783/widget.png?style=banner2)](https://discord.gg/touhouai)
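
## A note on `original_size` and `target_size`

The `original_size=(4096, 4096)` and `target_size=(1024, 1024)` arguments in the `StableDiffusionXLPipeline` usage example are SDXL's size conditioning. As best understood from the diffusers source, the pipeline concatenates the original size, the crop offset (`crops_coords_top_left`, which defaults to `(0, 0)`) and the target size into six integers, which are then embedded and passed to the UNet alongside the text conditioning. The sketch below only illustrates that packing in plain Python; `sdxl_add_time_ids` is a hypothetical helper name used for illustration, not a diffusers API.

```python
from typing import List, Tuple

Size = Tuple[int, int]  # (height, width)


def sdxl_add_time_ids(
    original_size: Size,
    target_size: Size,
    crops_coords_top_left: Tuple[int, int] = (0, 0),
) -> List[int]:
    """Pack SDXL's size/crop conditioning values into one flat list.

    Hypothetical helper: it mirrors the ordering the diffusers pipeline
    is understood to use (original size, crop offset, target size).
    """
    return list(original_size + crops_coords_top_left + target_size)


# Same values as the pipeline call in the usage example.
ids = sdxl_add_time_ids(original_size=(4096, 4096), target_size=(1024, 1024))
print(ids)  # [4096, 4096, 0, 0, 1024, 1024]
```

Passing an `original_size` larger than `target_size` tells the model that the image was downscaled from a higher-resolution source. Whether that helps with this particular checkpoint is something to try rather than assume.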