kirisame committed on
Commit 1574306
Parents (2): c291e92 b45bafc

Add WD 1.3 float16 weights

README.md CHANGED
@@ -13,6 +13,7 @@ inference: false
 
 waifu-diffusion is a latent text-to-image diffusion model that has been conditioned on high-quality anime images through fine-tuning.
 
+<<<<<<< HEAD
 <img src=https://i.imgur.com/Y5Tmw1S.png width=75% height=75%>
 
 [Original Weights](https://huggingface.co/hakurei/waifu-diffusion-v1-3)
@@ -22,11 +23,15 @@ waifu-diffusion is a latent text-to-image diffusion model that has been conditio
 We also support a [Gradio](https://github.com/gradio-app/gradio) Web UI and Colab with Diffusers to run Waifu Diffusion:
 [![Open In Spaces](https://camo.githubusercontent.com/00380c35e60d6b04be65d3d94a58332be5cc93779f630bcdfc18ab9a3a7d3388/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f25463025394625413425393725323048756767696e67253230466163652d5370616365732d626c7565)](https://huggingface.co/spaces/hakurei/waifu-diffusion-demo)
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1_8wPN7dJO746QXsFnB09Uq2VGgSRFuYE#scrollTo=1HaCauSq546O)
+=======
+<img src=https://cdn.discordapp.com/attachments/930499731451428926/1017258164439220254/unknown.png width=20% height=20%>
+>>>>>>> b45bafccd9d0e0757b70a54c7ebc32ff56ca9ee1
 
 ## Model Description
 
 [See here for a full model overview.](https://gist.github.com/harubaru/f727cedacae336d1f7877c4bbe2196e1)
 
+<<<<<<< HEAD
 ## License
 
 This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage.
@@ -36,6 +41,15 @@ The CreativeML OpenRAIL License specifies:
 2. The authors claims no rights on the outputs you generate, you are free to use them and are accountable for their use which must not go against the provisions set in the license
 3. You may re-distribute the weights and use the model commercially and/or as a service. If you do, please be aware you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M to all your users (please read the license entirely and carefully)
 [Please read the full license here](https://huggingface.co/spaces/CompVis/stable-diffusion-license)
+=======
+The current model has been fine-tuned with a learning rate of 5.0e-6 for 4 epochs on 56k Danbooru text-image pairs which all have an aesthetic rating greater than `6.0`.
+
+## Training Data & Annotative Prompting
+
+The data used for fine-tuning has come from a random sample of 56k Danbooru images, which were filtered based on [CLIP Aesthetic Scoring](https://github.com/christophschuhmann/improved-aesthetic-predictor) where only images with an aesthetic score greater than `6.0` were used.
+
+Captions are Danbooru-style captions.
+>>>>>>> b45bafccd9d0e0757b70a54c7ebc32ff56ca9ee1
 
 ## Downstream Uses
 
@@ -53,7 +67,15 @@ pipe = StableDiffusionPipeline.from_pretrained(
     torch_dtype=torch.float32
 ).to('cuda')
 
+<<<<<<< HEAD
 prompt = "1girl, aqua eyes, baseball cap, blonde hair, closed mouth, earrings, green background, hat, hoop earrings, jewelry, looking at viewer, shirt, short hair, simple background, solo, upper body, yellow shirt"
+=======
+
+pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16, revision='fp16')
+pipe = pipe.to(device)
+
+prompt = "touhou hakurei_reimu 1girl solo portrait"
+>>>>>>> b45bafccd9d0e0757b70a54c7ebc32ff56ca9ee1
 with autocast("cuda"):
     image = pipe(prompt, guidance_scale=6)["sample"][0]
 
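The README hunks above show that this merge committed unresolved Git conflict markers (`<<<<<<< HEAD`, `=======`, `>>>>>>> b45bafcc…`) straight into the rendered file. As a minimal sketch of a pre-merge sanity check (the function name and regex are mine, not part of this repo):

```python
import re

# A line counts as a conflict marker only when the marker starts the line:
# "<<<<<<< " and ">>>>>>> " carry a label, "=======" stands alone.
CONFLICT_RE = re.compile(r"^(<{7} |={7}$|>{7} )")

def find_conflict_markers(text: str) -> list[int]:
    """Return 1-based line numbers that still carry conflict markers."""
    return [
        i
        for i, line in enumerate(text.splitlines(), start=1)
        if CONFLICT_RE.match(line)
    ]

sample = "intro\n<<<<<<< HEAD\nours\n=======\ntheirs\n>>>>>>> b45bafc\n"
print(find_conflict_markers(sample))  # -> [2, 4, 6]
```

Running a check like this on each conflicted file before committing the merge would have flagged every file in this commit.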
model_index.json CHANGED
@@ -11,7 +11,11 @@
   ],
   "scheduler": [
     "diffusers",
+<<<<<<< HEAD
     "LMSDiscreteScheduler"
+=======
+    "DDIMScheduler"
+>>>>>>> b45bafccd9d0e0757b70a54c7ebc32ff56ca9ee1
   ],
   "text_encoder": [
     "transformers",
safety_checker/config.json CHANGED
@@ -1,6 +1,10 @@
 {
+<<<<<<< HEAD
   "_commit_hash": null,
   "_name_or_path": "CompVis/stable-diffusion-safety-checker",
+=======
+  "_name_or_path": "waifu-diffusion/safety_checker",
+>>>>>>> b45bafccd9d0e0757b70a54c7ebc32ff56ca9ee1
   "architectures": [
     "StableDiffusionSafetyChecker"
   ],
@@ -77,7 +81,11 @@
   "top_p": 1.0,
   "torch_dtype": null,
   "torchscript": false,
+<<<<<<< HEAD
   "transformers_version": "4.22.2",
+=======
+  "transformers_version": "4.21.3",
+>>>>>>> b45bafccd9d0e0757b70a54c7ebc32ff56ca9ee1
   "typical_p": 1.0,
   "use_bfloat16": false,
   "vocab_size": 49408
@@ -161,7 +169,11 @@
   "top_p": 1.0,
   "torch_dtype": null,
   "torchscript": false,
+<<<<<<< HEAD
   "transformers_version": "4.22.2",
+=======
+  "transformers_version": "4.21.3",
+>>>>>>> b45bafccd9d0e0757b70a54c7ebc32ff56ca9ee1
   "typical_p": 1.0,
   "use_bfloat16": false
 },
safety_checker/pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1d37ca6e57ace94e4c2f03ed0f67b6dc83e1ef1160892074917aa68b28e2afc1
-size 608098599
+oid sha256:6b1dc15150c06764bb60249c8a68b3e31319c66293d800c252b4d400e3e7ea17
+size 295
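The old pointer above records a ~608 MB object, while the new one records an object of only 295 bytes, so the uploaded blob cannot be the float16 safety-checker weights. To make the three-line pointer format in these diffs concrete, here is a small parser sketch (the function name is mine, not from any Git LFS tooling):

```python
def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse the three-line Git LFS pointer format shown in the diff:
    'version <spec-url>', 'oid sha256:<hex>', 'size <bytes>'."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:6b1dc15150c06764bb60249c8a68b3e31319c66293d800c252b4d400e3e7ea17\n"
    "size 295\n"
)
info = parse_lfs_pointer(pointer)
print(info["size"])  # -> 295
```

The same parser applies to the text_encoder, unet, and vae pointer files later in this diff, whose new sizes (295-297 bytes) show the same pattern.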
scheduler/scheduler_config.json CHANGED
@@ -4,6 +4,13 @@
   "beta_end": 0.012,
   "beta_schedule": "scaled_linear",
   "beta_start": 0.00085,
+<<<<<<< HEAD
   "num_train_timesteps": 1000,
+=======
+  "clip_sample": false,
+  "num_train_timesteps": 1000,
+  "set_alpha_to_one": false,
+  "timestep_values": null,
+>>>>>>> b45bafccd9d0e0757b70a54c7ebc32ff56ca9ee1
   "trained_betas": null
 }
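The conflict above pits the HEAD scheduler config against an incoming DDIM-style one that adds `clip_sample`, `set_alpha_to_one`, and `timestep_values`. As a hypothetical resolution keeping the incoming side (my sketch, not what the repo ships), the file would again parse as ordinary JSON:

```python
import json

# Hypothetical resolved scheduler_config.json keeping the incoming
# (DDIM-style) side of the conflict; values copied from the diff above.
resolved = """{
  "beta_end": 0.012,
  "beta_schedule": "scaled_linear",
  "beta_start": 0.00085,
  "clip_sample": false,
  "num_train_timesteps": 1000,
  "set_alpha_to_one": false,
  "timestep_values": null,
  "trained_betas": null
}"""

config = json.loads(resolved)
print(len(config))  # -> 8
```

With the conflict markers left in place, `json.loads` raises `JSONDecodeError` instead, which is why the committed file is unusable as-is.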
text_encoder/config.json CHANGED
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "openai/clip-vit-large-patch14",
+  "_name_or_path": "waifu-diffusion/text_encoder",
   "architectures": [
     "CLIPTextModel"
   ],
@@ -18,8 +18,13 @@
   "num_attention_heads": 12,
   "num_hidden_layers": 12,
   "pad_token_id": 1,
+<<<<<<< HEAD
   "projection_dim": 768,
   "torch_dtype": "float16",
   "transformers_version": "4.22.2",
+=======
+  "torch_dtype": "float16",
+  "transformers_version": "4.21.3",
+>>>>>>> b45bafccd9d0e0757b70a54c7ebc32ff56ca9ee1
   "vocab_size": 49408
 }
text_encoder/pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:88bd85efb0f84e70521633f578715afb2873db4f2615fdfb1f66e99934715865
-size 246184375
+oid sha256:439fc72b1855e991f9c6525d746fa8a7590e2763a9fc7ac077cea4b2e4a1ce93
+size 295
tokenizer/tokenizer_config.json CHANGED
@@ -19,7 +19,7 @@
   },
   "errors": "replace",
   "model_max_length": 77,
-  "name_or_path": "openai/clip-vit-large-patch14",
+  "name_or_path": "waifu-diffusion/tokenizer",
   "pad_token": "<|endoftext|>",
   "special_tokens_map_file": "./special_tokens_map.json",
   "tokenizer_class": "CLIPTokenizer",
unet/config.json CHANGED
@@ -1,6 +1,11 @@
 {
   "_class_name": "UNet2DConditionModel",
+<<<<<<< HEAD
   "_diffusers_version": "0.4.1",
+=======
+  "_diffusers_version": "0.2.4",
+  "_name_or_path": "waifu-diffusion/unet",
+>>>>>>> b45bafccd9d0e0757b70a54c7ebc32ff56ca9ee1
   "act_fn": "silu",
   "attention_head_dim": 8,
   "block_out_channels": [
unet/diffusion_pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:74e8495acd79b59493a5e3512d93d418f9188fb99dd66bd36560e6f7155a82c6
-size 1719312805
+oid sha256:8a9d5548268e0013c7d21a0f451ba016098a7069bbfba4b6744312b58fe86707
+size 297
vae/config.json CHANGED
@@ -1,6 +1,11 @@
 {
   "_class_name": "AutoencoderKL",
+<<<<<<< HEAD
   "_diffusers_version": "0.4.1",
+=======
+  "_diffusers_version": "0.2.4",
+  "_name_or_path": "waifu-diffusion/vae",
+>>>>>>> b45bafccd9d0e0757b70a54c7ebc32ff56ca9ee1
   "act_fn": "silu",
   "block_out_channels": [
     128,
vae/diffusion_pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:51c8904bc921e1e6f354b5fa8e99a1c82ead2f0540114de21557b8abfbb24ad0
-size 167399505
+oid sha256:9980e77e11dfc7e18e14aefc2604f2bba3db7737e722cd11610f0cf6a1e96271
+size 295