.gitattributes CHANGED
@@ -29,15 +29,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
-safety_checker/pytorch_model.bin filter=lfs diff=lfs merge=lfs -text
-text_encoder/pytorch_model.bin filter=lfs diff=lfs merge=lfs -text
-unet/diffusion_pytorch_model.bin filter=lfs diff=lfs merge=lfs -text
-vae/diffusion_pytorch_model.bin filter=lfs diff=lfs merge=lfs -text
-text_encoder/model.safetensors filter=lfs diff=lfs merge=lfs -text
-unet/diffusion_pytorch_model.safetensors filter=lfs diff=lfs merge=lfs -text
-vae/diffusion_pytorch_model.safetensors filter=lfs diff=lfs merge=lfs -text
-safety_checker/model.safetensors filter=lfs diff=lfs merge=lfs -text
-unet/diffusion_pytorch_model.fp16.safetensors filter=lfs diff=lfs merge=lfs -text
-text_encoder/model.fp16.safetensors filter=lfs diff=lfs merge=lfs -text
-vae/diffusion_pytorch_model.fp16.safetensors filter=lfs diff=lfs merge=lfs -text
-safety_checker/model.fp16.safetensors filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -4,30 +4,36 @@ language:
 tags:
 - stable-diffusion
 - text-to-image
-license: creativeml-openrail-m
-inference: true
+license: bigscience-bloom-rail-1.0
+inference: false
 
 ---
 
-# waifu-diffusion v1.4 - Diffusion for Weebs
+# waifu-diffusion - Diffusion for Weebs
 
 waifu-diffusion is a latent text-to-image diffusion model that has been conditioned on high-quality anime images through fine-tuning.
 
-![image](https://user-images.githubusercontent.com/26317155/210155933-db3a5f1a-1ec3-4777-915c-6deff2841ce9.png)
+# Gradio
 
-<sub>masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck</sub>
+We also support a [Gradio](https://github.com/gradio-app/gradio) web UI with Diffusers that runs inside a Colab notebook:
 
-[Original Weights](https://huggingface.co/hakurei/waifu-diffusion-v1-4)
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1_8wPN7dJO746QXsFnB09Uq2VGgSRFuYE#scrollTo=1HaCauSq546O)
 
-# Gradio & Colab
+<img src=https://cdn.discordapp.com/attachments/930559077170421800/1017265913231327283/unknown.png width=40% height=40%>
 
-We also support a [Gradio](https://github.com/gradio-app/gradio) Web UI and Colab with Diffusers to run Waifu Diffusion:
-[![Open In Spaces](https://camo.githubusercontent.com/00380c35e60d6b04be65d3d94a58332be5cc93779f630bcdfc18ab9a3a7d3388/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f25463025394625413425393725323048756767696e67253230466163652d5370616365732d626c7565)](https://huggingface.co/spaces/hakurei/waifu-diffusion-demo)
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1_8wPN7dJO746QXsFnB09Uq2VGgSRFuYE#scrollTo=1HaCauSq546O)
+[Original PyTorch Model Download Link](https://thisanimedoesnotexist.ai/downloads/wd-v1-2-full-ema.ckpt)
 
 ## Model Description
 
-[See here for a full model overview.](https://gist.github.com/harubaru/f727cedacae336d1f7877c4bbe2196e1)
+The model originally used for fine-tuning is [Stable Diffusion V1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4), a latent image diffusion model trained on [LAION2B-en](https://huggingface.co/datasets/laion/laion2B-en).
+
+The current model was fine-tuned with a learning rate of 5.0e-6 for 4 epochs on 56k text-image pairs obtained through Danbooru, all of which have an aesthetic rating greater than `6.0`.
+
+**Note:** This project has **no affiliation with Danbooru.**
+
+## Training Data & Annotative Prompting
+
+The data used for fine-tuning came from a random sample of 56k Danbooru images, filtered with [CLIP Aesthetic Scoring](https://github.com/christophschuhmann/improved-aesthetic-predictor) so that only images with an aesthetic score greater than `6.0` were used.
 
 ## License
 
@@ -48,25 +54,38 @@ This model can be used for entertainment purposes and as a generative art assist
 ```python
 import torch
 from torch import autocast
-from diffusers import StableDiffusionPipeline
+from diffusers import StableDiffusionPipeline, DDIMScheduler
 
-pipe = StableDiffusionPipeline.from_pretrained(
-    'hakurei/waifu-diffusion',
-    torch_dtype=torch.float32
-).to('cuda')
+model_id = "hakurei/waifu-diffusion"
+device = "cuda"
 
-prompt = "1girl, aqua eyes, baseball cap, blonde hair, closed mouth, earrings, green background, hat, hoop earrings, jewelry, looking at viewer, shirt, short hair, simple background, solo, upper body, yellow shirt"
+
+pipe = StableDiffusionPipeline.from_pretrained(
+    model_id,
+    torch_dtype=torch.float16,
+    revision="fp16",
+    scheduler=DDIMScheduler(
+        beta_start=0.00085,
+        beta_end=0.012,
+        beta_schedule="scaled_linear",
+        clip_sample=False,
+        set_alpha_to_one=False,
+    ),
+)
+pipe = pipe.to(device)
+
+prompt = "touhou hakurei_reimu 1girl solo portrait"
 with autocast("cuda"):
-    image = pipe(prompt, guidance_scale=6)["sample"][0]
+    image = pipe(prompt, guidance_scale=7.5)["sample"][0]
 
-image.save("test.png")
+image.save("reimu_hakurei.png")
 ```
 
 ## Team Members and Acknowledgements
 
-This project would not have been possible without the incredible work by Stability AI and Novel AI.
+This project would not have been possible without the incredible work by the [CompVis Researchers](https://ommer-lab.com/).
 
-- [Haru](https://github.com/harubaru)
+- [Anthony Mercurio](https://github.com/harubaru)
 - [Salt](https://github.com/sALTaccount/)
 - [Sta @ Bit192](https://twitter.com/naclbbr)
 
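Both versions of the example target the diffusers API of their era: the dict-style `["sample"][0]` output matches the pinned `0.2.4` release. On current diffusers releases the pipeline returns an output object with an `.images` attribute instead, and `autocast` is no longer needed. A minimal sketch of the equivalent call, assuming a diffusers version new enough to have the `.images` output:

```python
# Minimal sketch for newer diffusers releases (assumption: a version where
# pipelines return an output object with `.images` rather than a dict with
# a "sample" key).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "hakurei/waifu-diffusion",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "touhou hakurei_reimu 1girl solo portrait"
image = pipe(prompt, guidance_scale=7.5).images[0]  # fp16 weights; no autocast needed
image.save("reimu_hakurei.png")
```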
feature_extractor/preprocessor_config.json CHANGED
@@ -1,12 +1,8 @@
 {
-  "crop_size": {
-    "height": 224,
-    "width": 224
-  },
+  "crop_size": 224,
   "do_center_crop": true,
   "do_convert_rgb": true,
   "do_normalize": true,
-  "do_rescale": true,
   "do_resize": true,
   "feature_extractor_type": "CLIPFeatureExtractor",
   "image_mean": [
@@ -14,15 +10,11 @@
     0.4578275,
     0.40821073
   ],
-  "image_processor_type": "CLIPImageProcessor",
   "image_std": [
     0.26862954,
     0.26130258,
     0.27577711
   ],
   "resample": 3,
-  "rescale_factor": 0.00392156862745098,
-  "size": {
-    "shortest_edge": 224
-  }
+  "size": 224
 }
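This revert restores the older scalar form of `size` and `crop_size` that earlier transformers releases used, in place of the newer dict form (`{"shortest_edge": 224}`, `{"height": 224, "width": 224}`). A quick way to confirm what actually gets loaded, assuming a transformers version whose `from_pretrained` accepts the `subfolder` argument:

```python
from transformers import CLIPFeatureExtractor

# Sanity check: inspect the preprocessing settings this config restores.
fe = CLIPFeatureExtractor.from_pretrained(
    "hakurei/waifu-diffusion", subfolder="feature_extractor"
)
print(fe.size, fe.crop_size)  # expected: 224 224 with this config
```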
model_index.json CHANGED
@@ -1,23 +1,22 @@
 {
   "_class_name": "StableDiffusionPipeline",
-  "_diffusers_version": "0.10.2",
+  "_diffusers_version": "0.2.4",
   "feature_extractor": [
     "transformers",
-    "CLIPImageProcessor"
+    "CLIPFeatureExtractor"
   ],
-  "requires_safety_checker": true,
   "safety_checker": [
     "stable_diffusion",
     "StableDiffusionSafetyChecker"
   ],
-  "scheduler": [
-    "diffusers",
-    "PNDMScheduler"
-  ],
   "text_encoder": [
     "transformers",
     "CLIPTextModel"
   ],
+  "scheduler": [
+    "diffusers",
+    "DDIMScheduler"
+  ],
   "tokenizer": [
     "transformers",
     "CLIPTokenizer"
safety_checker/config.json CHANGED
@@ -1,6 +1,5 @@
 {
-  "_commit_hash": "cb41f3a270d63d454d385fc2e4f571c487c253c5",
-  "_name_or_path": "CompVis/stable-diffusion-safety-checker",
+  "_name_or_path": "./safety_module",
   "architectures": [
     "StableDiffusionSafetyChecker"
   ],
@@ -14,7 +13,6 @@
   "architectures": null,
   "attention_dropout": 0.0,
   "bad_words_ids": null,
-  "begin_suppress_tokens": null,
   "bos_token_id": 0,
   "chunk_size_feed_forward": 0,
   "cross_attention_hidden_size": null,
@@ -62,17 +60,14 @@
   "pad_token_id": 1,
   "prefix": null,
   "problem_type": null,
-  "projection_dim": 512,
   "pruned_heads": {},
   "remove_invalid_values": false,
   "repetition_penalty": 1.0,
   "return_dict": true,
   "return_dict_in_generate": false,
   "sep_token_id": null,
-  "suppress_tokens": null,
   "task_specific_params": null,
   "temperature": 1.0,
-  "tf_legacy_loss": false,
   "tie_encoder_decoder": false,
   "tie_word_embeddings": true,
   "tokenizer_class": null,
@@ -80,7 +75,7 @@
   "top_p": 1.0,
   "torch_dtype": null,
   "torchscript": false,
-  "transformers_version": "4.25.1",
+  "transformers_version": "4.21.0.dev0",
   "typical_p": 1.0,
   "use_bfloat16": false,
   "vocab_size": 49408
@@ -99,7 +94,6 @@
   "architectures": null,
   "attention_dropout": 0.0,
   "bad_words_ids": null,
-  "begin_suppress_tokens": null,
   "bos_token_id": null,
   "chunk_size_feed_forward": 0,
   "cross_attention_hidden_size": null,
@@ -139,7 +133,6 @@
   "num_attention_heads": 16,
   "num_beam_groups": 1,
   "num_beams": 1,
-  "num_channels": 3,
   "num_hidden_layers": 24,
   "num_return_sequences": 1,
   "output_attentions": false,
@@ -149,17 +142,14 @@
   "patch_size": 14,
   "prefix": null,
   "problem_type": null,
-  "projection_dim": 512,
   "pruned_heads": {},
   "remove_invalid_values": false,
   "repetition_penalty": 1.0,
   "return_dict": true,
   "return_dict_in_generate": false,
   "sep_token_id": null,
-  "suppress_tokens": null,
   "task_specific_params": null,
   "temperature": 1.0,
-  "tf_legacy_loss": false,
   "tie_encoder_decoder": false,
   "tie_word_embeddings": true,
   "tokenizer_class": null,
@@ -167,7 +157,7 @@
   "top_p": 1.0,
   "torch_dtype": null,
   "torchscript": false,
-  "transformers_version": "4.25.1",
+  "transformers_version": "4.21.0.dev0",
   "typical_p": 1.0,
   "use_bfloat16": false
 },
safety_checker/model.fp16.safetensors DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:08902f19b1cfebd7c989f152fc0507bef6898c706a91d666509383122324b511
-size 608018440
safety_checker/model.safetensors DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:9d6a233ff6fd5ccb9f76fd99618d73369c52dd3d8222376384d0e601911089e8
3
- size 1215981830
 
 
 
 
safety_checker/pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:16d28f2b37109f222cdc33620fdd262102ac32112be0352a7f77e9614b35a394
-size 1216064769
+oid sha256:193490b58ef62739077262e833bf091c66c29488058681ac25cf7df3d8190974
+size 1216061799
safety_checker/pytorch_model.fp16.bin DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:22ba87205445ad5def13e54919b038dcfb7321ec1c3f4b12487d4fba6036125f
3
- size 608103564
 
 
 
 
scheduler/scheduler_config.json CHANGED
@@ -1,14 +1,12 @@
 {
-  "_class_name": "PNDMScheduler",
-  "_diffusers_version": "0.10.2",
+  "_class_name": "DDIMScheduler",
+  "_diffusers_version": "0.2.4",
   "beta_end": 0.012,
   "beta_schedule": "scaled_linear",
   "beta_start": 0.00085,
   "clip_sample": false,
   "num_train_timesteps": 1000,
-  "prediction_type": "epsilon",
   "set_alpha_to_one": false,
-  "skip_prk_steps": true,
-  "steps_offset": 1,
+  "timestep_values": null,
   "trained_betas": null
 }
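Because the DDIM betas now live in `scheduler_config.json`, the scheduler settings spelled out by hand in the README example can also be loaded straight from the repo. A sketch, assuming a diffusers version where schedulers support `from_pretrained` with `subfolder`:

```python
from diffusers import DDIMScheduler

# Equivalent to constructing DDIMScheduler(beta_start=0.00085, ...) by hand:
scheduler = DDIMScheduler.from_pretrained(
    "hakurei/waifu-diffusion", subfolder="scheduler"
)
print(scheduler.config.beta_schedule)  # "scaled_linear"
```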
text_encoder/config.json CHANGED
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "/mnt/sd-finetune-data/finetunes/step_57000",
+  "_name_or_path": "openai/clip-vit-large-patch14",
   "architectures": [
     "CLIPTextModel"
   ],
@@ -7,19 +7,18 @@
   "bos_token_id": 0,
   "dropout": 0.0,
   "eos_token_id": 2,
-  "hidden_act": "gelu",
-  "hidden_size": 1024,
+  "hidden_act": "quick_gelu",
+  "hidden_size": 768,
   "initializer_factor": 1.0,
   "initializer_range": 0.02,
-  "intermediate_size": 4096,
+  "intermediate_size": 3072,
   "layer_norm_eps": 1e-05,
   "max_position_embeddings": 77,
   "model_type": "clip_text_model",
-  "num_attention_heads": 16,
-  "num_hidden_layers": 23,
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
   "pad_token_id": 1,
-  "projection_dim": 512,
   "torch_dtype": "float32",
-  "transformers_version": "4.25.1",
+  "transformers_version": "4.21.3",
   "vocab_size": 49408
 }
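This swaps the larger text encoder (hidden size 1024, 23 layers, `gelu`, matching the OpenCLIP-based encoder used by Stable Diffusion 2.x) back to OpenAI's CLIP ViT-L/14 (hidden size 768, 12 layers, `quick_gelu`). The hidden size is the contract with the UNet's cross-attention, checked below:

```python
from transformers import CLIPTextModel

text_encoder = CLIPTextModel.from_pretrained(
    "hakurei/waifu-diffusion", subfolder="text_encoder"
)
print(text_encoder.config.hidden_size)  # 768 for CLIP ViT-L/14
```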
text_encoder/model.fp16.safetensors DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:22bc8e104d064b678ef7d2d2b217d4a8c9bfb79fb35792417cdf228e70adc7fb
3
- size 680821096
 
 
 
 
text_encoder/model.safetensors DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:661a5d7f8e19fce696aa9d932ab97b546b4d4a2a2d87238a17761bef2704269f
3
- size 1361597016
 
 
 
 
text_encoder/pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:040fc6498aa3cdbb926dc2d01c3d6629521e5f085d901d5e8d8c2b0e0aa2b1ce
-size 1361679905
+oid sha256:770a47a9ffdcfda0b05506a7888ed714d06131d60267e6cf52765d61cf59fd67
+size 492305335
text_encoder/pytorch_model.fp16.bin DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:ba47a6c751cce5b6c4d5c79c8cd63bab63bfce3539e7805a70b8dca9d1f2f151
3
- size 680900852
 
 
 
 
tokenizer/special_tokens_map.json CHANGED
@@ -13,7 +13,7 @@
     "rstrip": false,
     "single_word": false
   },
-  "pad_token": "!",
+  "pad_token": "<|endoftext|>",
   "unk_token": {
     "content": "<|endoftext|>",
     "lstrip": false,
tokenizer/tokenizer_config.json CHANGED
@@ -19,7 +19,7 @@
   },
   "errors": "replace",
   "model_max_length": 77,
-  "name_or_path": "/mnt/sd-finetune-data/finetunes/step_57000",
+  "name_or_path": "openai/clip-vit-large-patch14",
   "pad_token": "<|endoftext|>",
   "special_tokens_map_file": "./special_tokens_map.json",
   "tokenizer_class": "CLIPTokenizer",
unet/config.json CHANGED
@@ -1,14 +1,8 @@
 {
   "_class_name": "UNet2DConditionModel",
-  "_diffusers_version": "0.10.2",
-  "_name_or_path": "/mnt/sd-finetune-data/finetunes/step_57000",
+  "_diffusers_version": "0.2.4",
   "act_fn": "silu",
-  "attention_head_dim": [
-    5,
-    10,
-    20,
-    20
-  ],
+  "attention_head_dim": 8,
   "block_out_channels": [
     320,
     640,
@@ -16,7 +10,7 @@
     1280
   ],
   "center_input_sample": false,
-  "cross_attention_dim": 1024,
+  "cross_attention_dim": 768,
   "down_block_types": [
     "CrossAttnDownBlock2D",
     "CrossAttnDownBlock2D",
@@ -24,7 +18,6 @@
     "DownBlock2D"
   ],
   "downsample_padding": 1,
-  "dual_cross_attention": false,
   "flip_sin_to_cos": true,
   "freq_shift": 0,
   "in_channels": 4,
@@ -32,16 +25,12 @@
   "mid_block_scale_factor": 1,
   "norm_eps": 1e-05,
   "norm_num_groups": 32,
-  "num_class_embeds": null,
-  "only_cross_attention": false,
   "out_channels": 4,
-  "sample_size": 64,
+  "sample_size": 32,
   "up_block_types": [
     "UpBlock2D",
     "CrossAttnUpBlock2D",
     "CrossAttnUpBlock2D",
     "CrossAttnUpBlock2D"
-  ],
-  "upcast_attention": false,
-  "use_linear_projection": true
+  ]
 }
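The UNet's `cross_attention_dim` of 768 must match the text encoder's hidden size above, and the scalar `attention_head_dim` of 8 is the SD1-era layout (the per-block list belongs to the SD2-style config being removed). A shape check:

```python
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "hakurei/waifu-diffusion", subfolder="unet"
)
# Must equal the text encoder's hidden_size for cross-attention to work.
print(unet.config.cross_attention_dim)  # 768
```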
unet/diffusion_pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:24d2d0a39a4cd06869c91173d507cb153f272a1a328514f70b7ce9b48cab7e2b
-size 3463934693
+oid sha256:9879a41e1f8b02bbe3937110c4f4b0171e3c04f9c6f02817cde986a3c4d09afe
+size 3438354725
unet/diffusion_pytorch_model.fp16.bin DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:96edda2701914e1e248197bb205305e6aa9cfc776c3372cff9eaf62d4706a3cf
3
- size 1732107093
 
 
 
 
unet/diffusion_pytorch_model.fp16.safetensors DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:7b5cdd1c15f025166ded673f898e09e621c9ff828d6508a81e83378a6d0ba8dd
3
- size 1731904736
 
 
 
 
unet/diffusion_pytorch_model.safetensors DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:dda5a15fe85e6ea7fe0e21d06264611246ab60bdbf7001daa1e48028a49cd2e3
3
- size 3463726500
 
 
 
 
vae/config.json CHANGED
@@ -1,7 +1,6 @@
 {
   "_class_name": "AutoencoderKL",
-  "_diffusers_version": "0.10.2",
-  "_name_or_path": "/mnt/sd-finetune-data/base/vae",
+  "_diffusers_version": "0.2.4",
   "act_fn": "silu",
   "block_out_channels": [
     128,
@@ -18,7 +17,6 @@
   "in_channels": 3,
   "latent_channels": 4,
   "layers_per_block": 2,
-  "norm_num_groups": 32,
   "out_channels": 3,
   "sample_size": 512,
   "up_block_types": [
vae/diffusion_pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3e174991e5609bc5c2b3995e3f223fb2c5f0ae3be307fa9591b351d837a08770
-size 334711857
+oid sha256:1b134cded8eb78b184aefb8805b6b572f36fa77b255c483665dda931fa0130c5
+size 334707217
vae/diffusion_pytorch_model.fp16.bin DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:d207e4928394a2002bcb7fff829e93bbd2a44bc323e597fdb690d3fc2d064de2
3
- size 167405651
 
 
 
 
vae/diffusion_pytorch_model.fp16.safetensors DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:9d8fb415ab4f9782232e7bb82e618e2c0cef0be3593c77a35f5733d8fdd3530f
3
- size 167335342
 
 
 
 
vae/diffusion_pytorch_model.safetensors DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:d55443a2d9d4d9decdbe669c51cc6d91eb6a2297477624e2e16a3054f30c2f5a
3
- size 334643276