Reimu Hakurei committed 9e45f8d (parent: a982013)

Add model card

Files changed (1): README.md (+59 -0)
---
language:
- en
tags:
- stable-diffusion
- text-to-image
license: bigscience-bloom-rail-1.0
inference: false
---

# waifu-diffusion - Diffusion for Weebs

waifu-diffusion is a latent text-to-image diffusion model that has been conditioned on high-quality anime images through [Textual Inversion](https://github.com/rinongal/textual_inversion).

## Model Description

The model originally used for fine-tuning is [Stable Diffusion V1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4), a latent image diffusion model trained on [LAION2B-en](https://huggingface.co/datasets/laion/laion2B-en).

The current model is based on [Yasu Seno](https://twitter.com/naclbbr)'s [TrinArt Stable Diffusion](https://huggingface.co/naclbit/trinart_stable_diffusion), which has been fine-tuned on 30,000 high-resolution manga/anime-style images for 3.5 epochs.

With [Textual Inversion](https://github.com/rinongal/textual_inversion), the embeddings for the text encoder have been trained to align more closely with anime-styled images, reducing the need for excessive prompting.
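
The core idea behind Textual Inversion can be sketched as follows: the pretrained model and its embedding table stay frozen, and only a new token's embedding vector is optimized. This is a toy PyTorch illustration with made-up dimensions and a stand-in optimization target, not the actual training code:

```python
import torch

# Toy sketch of Textual Inversion: all pretrained weights stay frozen and
# only one new token embedding is trained. The vocab size, embedding dim,
# and loss target below are stand-ins, not the real diffusion training signal.
vocab_size, dim = 1000, 32
embeddings = torch.nn.Embedding(vocab_size, dim)
embeddings.weight.requires_grad_(False)  # pretrained embeddings: frozen

new_token = torch.randn(dim, requires_grad=True)  # the only trainable tensor
optimizer = torch.optim.Adam([new_token], lr=1e-2)

# Stand-in target; in real Textual Inversion the gradient comes from the
# frozen diffusion model's denoising loss on the training images.
target = torch.ones(dim)
first_loss = torch.nn.functional.mse_loss(new_token, target).item()

for _ in range(200):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(new_token, target)
    loss.backward()
    optimizer.step()

final_loss = loss.item()
```

After training, the learned vector is inserted under a new pseudo-token so prompts can reference the learned concept directly.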

## Training Data & Annotative Prompting

The data used for Textual Inversion came from a random sample of 25k Danbooru images, which were then filtered with [CLIP Aesthetic Scoring](https://github.com/christophschuhmann/improved-aesthetic-predictor) so that only images with an aesthetic score greater than `6.0` were used.

The embeddings were then further tuned on a smaller subset of 2k higher-quality images, which also had an aesthetic score greater than `6.0` and featured diverse subjects, backgrounds, and compositions.
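
The score-threshold filtering described above amounts to a simple predicate over per-image scores. A minimal sketch, where `aesthetic_score` is a hypothetical placeholder (the real pipeline scores each image with an MLP over its CLIP embedding, per the linked improved-aesthetic-predictor repo):

```python
# Hypothetical sketch of the aesthetic filtering step; `aesthetic_score`
# is a placeholder, not the actual predictor from the linked repository.
THRESHOLD = 6.0


def aesthetic_score(image_path):
    # Stand-in: a real implementation would embed the image with CLIP and
    # feed the embedding to the pretrained aesthetic-score MLP.
    raise NotImplementedError("stand-in for the CLIP aesthetic predictor")


def filter_dataset(image_paths, score_fn=aesthetic_score, threshold=THRESHOLD):
    """Keep only images whose predicted aesthetic score exceeds the threshold."""
    return [p for p in image_paths if score_fn(p) > threshold]
```

Any callable mapping an image path to a score can be swapped in for `score_fn`, which also makes the threshold step easy to test in isolation.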

## Downstream Uses

This model can be used for entertainment purposes and as a generative art assistant.

## Example Code

```python
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline

model_id = "hakurei/waifu-diffusion"
device = "cuda"

pipe = StableDiffusionPipeline.from_pretrained(model_id, use_auth_token=True)
pipe = pipe.to(device)

prompt = "a photo of reimu hakurei. anime style"
with autocast("cuda"):
    # Note: newer diffusers releases return the output as
    # `pipe(prompt).images[0]` rather than the `["sample"][0]` access here.
    image = pipe(prompt, guidance_scale=7.5)["sample"][0]

image.save("reimu_hakurei.png")
```

## Team Members and Acknowledgements

This project would not have been possible without the incredible work by the [CompVis Researchers](https://ommer-lab.com/) and the author of the original fine-tuned model that this work was based upon, [Yasu Seno](https://twitter.com/naclbbr)!

Additionally, the methods presented in the [Textual Inversion](https://github.com/rinongal/textual_inversion) repo were an incredible help.

- [Anthony Mercurio](https://github.com/harubaru)
- [Salt](https://github.com/sALTaccount/)