Ojimi committed
Commit: a21bee3
1 Parent(s): 3cf8f3b

Upload Kawai Diffusion

README.md CHANGED
@@ -1,25 +1,14 @@
- ---
- license: creativeml-openrail-m
- language:
- - en
- library_name: diffusers
- pipeline_tag: text-to-image
- tags:
- - art
- - pytorch
- ---
- ## Announcement: My new model, Kawai Diffusion SD1.5, has been uploaded to CivitAI; you can check it out here: https://civitai.com/models/21138/kawai-diffusion-sd15
- # Important: since there are already models of higher quality than mine, further upgrades are unnecessary for an anonymous model that attracts little interest. Thank you for your interest; my biggest ambition is that we will be able to make anime. Have fun and see you soon, my big family!
- # Waifumake (●'◡'●) AI Art model.
-
- ![](logo.png)
-
- A single student training an AI model that generates art.
-
- ## **New model available**: [waifumake-full-v2](waifumake-full-v2.safetensors)!
- ## What's new in v2:
  - Fix color loss.
- - Increase image quality.
 
  ## Introduction:
  - It's an AI art model for converting text to images, images to images, inpainting, and outpainting using Stable Diffusion.
@@ -28,68 +17,54 @@ A single student training an AI model that generates art.
  - Create an image from a sketch you made in a basic drawing program (MS Paint).
  - The model is aimed at everyone and has limitless usage potential.
 
- ## Used:
- - For 🧨 Diffusers Library:
  ```python
  from diffusers import DiffusionPipeline
 
- pipe = DiffusionPipeline.from_pretrained("Ojimi/waifumake-full")
  pipe = pipe.to("cuda")
 
  prompt = "1girl, animal ears, long hair, solo, cat ears, choker, bare shoulders, red eyes, fang, looking at viewer, animal ear fluff, upper body, black hair, blush, closed mouth, off shoulder, bangs, bow, collarbone"
  image = pipe(prompt, negative_prompt="lowres, bad anatomy").images[0]
  ```
 
- - For Web UI by Automatic1111:
- ```bash
- # Install the Web UI.
- git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
- cd /content/stable-diffusion-webui/
- pip install -qq -r requirements.txt
- pip install -U xformers # Install xformers for better performance.
- ```
-
- ```bash
- # Download the model.
- wget https://huggingface.co/Ojimi/waifumake-full/resolve/main/waifumake-full-v2.safetensors -O /content/stable-diffusion-webui/models/Stable-diffusion/waifumake-full-v2.safetensors
- ```
-
- ```bash
- # Run and enjoy ☕.
- cd /content/stable-diffusion-webui
- python launch.py --xformers
  ```
-
- - Try it in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1D6LNtXrpD2QfUx-d_yztWZVgTiDAyyAT?usp=sharing)
  ## Tips:
  - The `masterpiece` and `best quality` tags are not necessary, as they sometimes lead to contradictory results, but if the image is distorted or discolored, add them.
  - The CFG scale should be 7.5 and the step count 28 for the best quality and performance.
  - Use a sample photo for your idea. `Interrogate DeepBooru` and change the prompts to suit what you want.
  - You should use it as a supportive tool for creating works of art, not rely on it completely.
-
- ## Preview: v2 model
- ![](preview1.png)
- ![](preview2.png)
- ![](preview3.png)
- ![](preview4.png)
- - Enhance and upscale using Stable Diffusion - Waifumake model v2 (⚠️ performance warning): I recommend against making the image too big, as it can lead to unexpected problems.
- ![](preview5.png)
- ## Training:
- - **Data**: The model is trained on a dataset of images from various Internet sources provided by my friend, plus images created by another AI.
- - **Scheduler**: Euler Ancestral Discrete.
- - **Optimizer**: AdamW, with xFormers.
- - **Precision**: BF16.
- - **Hardware**: Google Colaboratory Pro - NVIDIA A100 40GB VRAM.
 
  ## **Limitations:**
  - Loss of detail, errors, bad human anatomy (such as six-fingered hands), deformation, blurring, and unclear images are inevitable.
- - Complex tasks cannot be handled.
  - ⚠️ Content may not be appropriate for all ages: as the model is trained on data that includes adult content, the generated images may contain content not suitable for children (depending on your country, specific regulations may apply). If you do not want adult content to appear, make sure you have additional safety measures in place, such as adding "nsfw" to the negative prompt.
  - The results generated by the model are considered impressive, but unfortunately it currently supports only English; for multilingual use, consider a third-party translation program.
  - The model is trained on the `Danbooru` and `Nai` tagging systems, so long free-form text may give poor results.
  - My amount of money: 0 USD =((.
-
- ![](money-wallet.gif)
 
  ## **Desires:**
  As this version was made only by myself and my small group of associates, the model will not be perfect and may differ from what people expect. Any contributions from everyone will be respected.
@@ -99,27 +74,29 @@ Want to support me? Thank you, please help me make it better. ❤️
  ## Special Thanks:
  This wouldn't have happened if they hadn't made a breakthrough.
  - [Runwayml](https://huggingface.co/runwayml/): Base model.
- - [d8ahazard](https://github.com/d8ahazard/sd_dreambooth_extension): Dreambooth.
  - [Automatic1111](https://github.com/AUTOMATIC1111/): Web UI.
  - [Mikubill](https://github.com/Mikubill/): Where my ideas started.
  - ChatGPT: Helped me do crazy things that I thought I would never do.
- - Novel AI: Dataset images. An AI made me thousands of pictures without worrying about copyright or disputes.
  - Danbooru: Helped me write the correct tags.
- - My friend and others.
  - And You 🫵❤️
 
  ## Copyright:
 
- This license allows anyone to copy, modify, publish, and commercialize the model, provided they follow the terms of the CreativeML Open RAIL-M. You can learn more about the CreativeML Open RAIL-M [here](LICENSE.txt).
 
- If any part of the model does not comply with the terms of the CreativeML Open RAIL-M, the copyright and other rights to the model remain valid.
 
  All AI-generated images are yours; you can do whatever you want with them, but please obey the laws of your country. We will not be responsible for any problems you cause.
 
  Don't forget me.
 
  # Have fun with your waifu! (●'◡'●)
 
- ![](cry.png)
 
- Like it?
 
+
+ # Kawai Diffusion (anime-base) v3.0 LTS Big Update (≧∇≦)ノ
+ See more on CivitAI: https://civitai.com/models/21138/kawai-diffusion-sd15
+ ![](asset/preview.png)
+ ## What's new in Kawai v3.0 LTS:
  - Fix color loss.
+ - Image quality is greatly enhanced. Thank you, my friend.
+ - Kawai Diffusion's most powerful ability is "enhance" (img2img): it will make a bad image look better. (See the img2img sketch in the Use section below.)
+ - True "kawaii"... Haizzzzzzzzz
+ - Two versions: the [ema-only](kawai-diffusion-sdv15_ema-only.safetensors) model (5.28 GB) and the [pruned](kawai-base-sd-v15_pruned-full.safetensors) model (8.49 GB). Come on, don't be surprised by the sizes; even I was surprised.
+ - Can work with some external VAEs, but the pruned model does not require any VAE (a loading sketch follows this list).
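+
+ A minimal sketch of swapping in an external VAE with 🧨 Diffusers (the `stabilityai/sd-vae-ft-mse` repository named below is just one commonly used option, not a recommendation from this model card):
+ ```python
+ from diffusers import AutoencoderKL, StableDiffusionPipeline
+
+ # Load an external VAE and pass it to the pipeline.
+ # Per the notes above, the pruned model does not require an external VAE.
+ vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
+ pipe = StableDiffusionPipeline.from_pretrained("Ojimi/anime-kawai-diffusion", vae=vae)
+ pipe = pipe.to("cuda")
+ ```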
 
  ## Introduction:
  - It's an AI art model for converting text to images, images to images, inpainting, and outpainting using Stable Diffusion.
  - Create an image from a sketch you made in a basic drawing program (MS Paint).
  - The model is aimed at everyone and has limitless usage potential.
 
+ ## Use:
+ - For 🧨 Diffusers:
  ```python
  from diffusers import DiffusionPipeline
 
+ pipe = DiffusionPipeline.from_pretrained("Ojimi/anime-kawai-diffusion")
  pipe = pipe.to("cuda")
 
  prompt = "1girl, animal ears, long hair, solo, cat ears, choker, bare shoulders, red eyes, fang, looking at viewer, animal ear fluff, upper body, black hair, blush, closed mouth, off shoulder, bangs, bow, collarbone"
  image = pipe(prompt, negative_prompt="lowres, bad anatomy").images[0]
  ```
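+ - For the img2img "enhance" workflow mentioned above (a minimal sketch; the input file name and `strength` value are illustrative, not from the model card):
+ ```python
+ from PIL import Image
+ from diffusers import StableDiffusionImg2ImgPipeline
+
+ pipe = StableDiffusionImg2ImgPipeline.from_pretrained("Ojimi/anime-kawai-diffusion")
+ pipe = pipe.to("cuda")
+
+ # "rough.png" is a hypothetical low-quality image you want to enhance.
+ init_image = Image.open("rough.png").convert("RGB").resize((512, 512))
+ image = pipe(
+     prompt="1girl, cat ears, red eyes, upper body",
+     image=init_image,
+     strength=0.6,  # lower values keep more of the original image
+     negative_prompt="lowres, bad anatomy",
+ ).images[0]
+ image.save("enhanced.png")
+ ```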
+ - Try it in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1D6LNtXrpD2QfUx-d_yztWZVgTiDAyyAT?usp=sharing)
+ - ChatGPT with Kawai Diffusion (or any model you like). Give ChatGPT the instruction below, then ask it for image ideas:
+
+ ```
+ Read the following instructions, and if you understand, say "I understand": Command prompt structure: includes descriptions of shape, perspective, posture, landscape, and so on. Keywords are written briefly in the form of tags. For example: "1girl, blonde hair, sitting, dress, red eyes, small breasts, star, night sky, moon"
  ```
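+
+ A sketch of wiring that instruction up programmatically (assumptions: the legacy pre-1.0 `openai` Python SDK, a placeholder API key, and an illustrative user request):
+ ```python
+ import openai  # legacy (<1.0) OpenAI Python SDK
+
+ openai.api_key = "sk-..."  # placeholder API key
+
+ instruction = (
+     'Command prompt structure: includes descriptions of shape, perspective, '
+     'posture, landscape, and so on. Keywords are written briefly in the form '
+     'of tags, for example: "1girl, blonde hair, sitting, dress, red eyes, '
+     'small breasts, star, night sky, moon"'
+ )
+
+ reply = openai.ChatCompletion.create(
+     model="gpt-3.5-turbo",
+     messages=[
+         {"role": "system", "content": instruction},
+         {"role": "user", "content": "A knight girl resting under a tree at dusk."},
+     ],
+ )
+ print(reply.choices[0].message.content)  # tag-style prompt to feed the model
+ ```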
 
 
  ## Tips:
  - The `masterpiece` and `best quality` tags are not necessary, as they sometimes lead to contradictory results, but if the image is distorted or discolored, add them.
  - The CFG scale should be 7.5 and the step count 28 for the best quality and performance.
  - Use a sample photo for your idea. `Interrogate DeepBooru` and change the prompts to suit what you want.
  - You should use it as a supportive tool for creating works of art, not rely on it completely.
+ - The Clip skip should be 2. (See the settings sketch after this list.)
+
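+ A minimal sketch applying these tips with 🧨 Diffusers (the prompt and output path are illustrative; clip skip is an Automatic1111 web-UI setting with no direct equivalent argument in the Diffusers version this repository targets):
+ ```python
+ from diffusers import DiffusionPipeline
+
+ pipe = DiffusionPipeline.from_pretrained("Ojimi/anime-kawai-diffusion")
+ pipe = pipe.to("cuda")
+
+ image = pipe(
+     "1girl, cat ears, choker, red eyes, looking at viewer",
+     negative_prompt="lowres, bad anatomy, nsfw",
+     guidance_scale=7.5,      # CFG scale from the tips above
+     num_inference_steps=28,  # step count from the tips above
+ ).images[0]
+ image.save("sample.png")
+ ```
+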
+ ## **Training**:
+ - **Data**: Created by another AI.
+ - **Scheduler**: DDIM.
+ - **Optimizer**: AdamW.
+ - **Precision**: FP32.
+ - **Hardware**: Google Colaboratory Pro - NVIDIA A100 40GB VRAM, Tesla V100-SXM2 16GB.
+ ## **Model Unit Test:**
+ This is a program written by my friend to check model quality.
+ - Examiner: OpenAI ChatGPT-3.5-Turbo.
+ - Test: kawai-anime-sd.
+ - Scheduler: DPM++ 2M Karras.
+ - Steps: 22.
+ - Guard: Guard Prompt 1.5.
+ - Test report: [Here](https://civitai.com/gallery/299771?modelId=21138&modelVersionId=27219&infinite=false&returnUrl=%2Fmodels%2F21138%2Fkawai-diffusion-sd15).
 
  ## **Limitations:**
+ - The drawing style is hard, not soft.
  - Loss of detail, errors, bad human anatomy (such as six-fingered hands), deformation, blurring, and unclear images are inevitable.
  - ⚠️ Content may not be appropriate for all ages: as the model is trained on data that includes adult content, the generated images may contain content not suitable for children (depending on your country, specific regulations may apply). If you do not want adult content to appear, make sure you have additional safety measures in place, such as adding "nsfw" to the negative prompt.
  - The results generated by the model are considered impressive, but unfortunately it currently supports only English; for multilingual use, consider a third-party translation program.
  - The model is trained on the `Danbooru` and `Nai` tagging systems, so long free-form text may give poor results.
  - My amount of money: 0 USD =((.
+
+ ![](money-wallet.gif)
 
  ## **Desires:**
  As this version was made only by myself and my small group of associates, the model will not be perfect and may differ from what people expect. Any contributions from everyone will be respected.
 
  ## Special Thanks:
  This wouldn't have happened if they hadn't made a breakthrough.
  - [Runwayml](https://huggingface.co/runwayml/): Base model.
+ - [d8ahazard](https://github.com/d8ahazard/sd_dreambooth_extension): Dreambooth.
  - [Automatic1111](https://github.com/AUTOMATIC1111/): Web UI.
  - [Mikubill](https://github.com/Mikubill/): Where my ideas started.
  - ChatGPT: Helped me do crazy things that I thought I would never do.
+ - Novel AI, Anything Model, Abyss Orange Model: Dataset images. AI made me thousands of pictures without worrying about copyright or disputes.
  - Danbooru: Helped me write the correct tags.
+ - My friend and others: Provided quality images.
  - And You 🫵❤️
 
  ## Copyright:
 
+ This license allows anyone to copy and modify the model, but please follow the terms of the CreativeML Open RAIL-M. You can learn more about the CreativeML Open RAIL-M [here](https://huggingface.co/spaces/CompVis/stable-diffusion-license).
 
+ If any part of the model does not comply with the terms of the CreativeML Open RAIL-M, the copyright and other rights to the model remain valid.
 
  All AI-generated images are yours; you can do whatever you want with them, but please obey the laws of your country. We will not be responsible for any problems you cause.
 
+ We allow you to merge this model with another, but if you share the merged model, don't forget to add me to the credits.
+
  Don't forget me.
 
  # Have fun with your waifu! (●'◡'●)
 
+ I have a hero, but I can't say his name and we've never met. He was the one who laid the foundation for Kawai Diffusion. Although the model is not very popular, I love that hero very much. Thank you for your interest in my model. Thank you very much!
 
+ Like it? Buy me a ko-fi: https://ko-fi.com/ojimi (≧∇≦)ノ
asset/preview.png ADDED
kawai-base-sd-v15_pruned-full.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:db4859a71b85711c2c274b4a25e542e35b2f53c4a967abeb4111bea5c1740691
+ size 9110913190
kawai-base-sd-v15_pruned-full.yaml ADDED
@@ -0,0 +1,70 @@
+ model:
+   base_learning_rate: 1.0e-04
+   target: ldm.models.diffusion.ddpm.LatentDiffusion
+   params:
+     linear_start: 0.00085
+     linear_end: 0.0120
+     num_timesteps_cond: 1
+     log_every_t: 200
+     timesteps: 1000
+     first_stage_key: "image"
+     cond_stage_key: "caption"
+     image_size: 64
+     channels: 4
+     cond_stage_trainable: false # Note: different from the one we trained before
+     conditioning_key: crossattn
+     monitor: val/loss_simple_ema
+     scale_factor: 0.18215
+     use_ema: False
+
+     scheduler_config: # 10000 warmup steps
+       target: ldm.lr_scheduler.LambdaLinearScheduler
+       params:
+         warm_up_steps: [ 10000 ]
+         cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
+         f_start: [ 1.e-6 ]
+         f_max: [ 1. ]
+         f_min: [ 1. ]
+
+     unet_config:
+       target: ldm.modules.diffusionmodules.openaimodel.UNetModel
+       params:
+         image_size: 32 # unused
+         in_channels: 4
+         out_channels: 4
+         model_channels: 320
+         attention_resolutions: [ 4, 2, 1 ]
+         num_res_blocks: 2
+         channel_mult: [ 1, 2, 4, 4 ]
+         num_heads: 8
+         use_spatial_transformer: True
+         transformer_depth: 1
+         context_dim: 768
+         use_checkpoint: True
+         legacy: False
+
+     first_stage_config:
+       target: ldm.models.autoencoder.AutoencoderKL
+       params:
+         embed_dim: 4
+         monitor: val/rec_loss
+         ddconfig:
+           double_z: true
+           z_channels: 4
+           resolution: 256
+           in_channels: 3
+           out_ch: 3
+           ch: 128
+           ch_mult:
+           - 1
+           - 2
+           - 4
+           - 4
+           num_res_blocks: 2
+           attn_resolutions: [ ]
+           dropout: 0.0
+         lossconfig:
+           target: torch.nn.Identity
+
+     cond_stage_config:
+       target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
kawai-diffusion-sdv15_ema-only.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cb932447733cb1b49afa25488bd2a001b7c9874e3db970dc972bb962c5fa330a
+ size 5672745097
kawai-origin/feature_extractor/preprocessor_config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "crop_size": {
+     "height": 224,
+     "width": 224
+   },
+   "do_center_crop": true,
+   "do_convert_rgb": true,
+   "do_normalize": true,
+   "do_rescale": true,
+   "do_resize": true,
+   "feature_extractor_type": "CLIPFeatureExtractor",
+   "image_mean": [
+     0.48145466,
+     0.4578275,
+     0.40821073
+   ],
+   "image_processor_type": "CLIPFeatureExtractor",
+   "image_std": [
+     0.26862954,
+     0.26130258,
+     0.27577711
+   ],
+   "resample": 3,
+   "rescale_factor": 0.00392156862745098,
+   "size": {
+     "shortest_edge": 224
+   }
+ }
kawai-origin/model_index.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "_class_name": "StableDiffusionPipeline",
+   "_diffusers_version": "0.15.0.dev0",
+   "feature_extractor": [
+     "transformers",
+     "CLIPFeatureExtractor"
+   ],
+   "requires_safety_checker": true,
+   "safety_checker": [
+     "stable_diffusion",
+     "StableDiffusionSafetyChecker"
+   ],
+   "scheduler": [
+     "diffusers",
+     "DDIMScheduler"
+   ],
+   "text_encoder": [
+     "transformers",
+     "CLIPTextModel"
+   ],
+   "tokenizer": [
+     "transformers",
+     "CLIPTokenizer"
+   ],
+   "unet": [
+     "diffusers",
+     "UNet2DConditionModel"
+   ],
+   "vae": [
+     "diffusers",
+     "AutoencoderKL"
+   ]
+ }
kawai-origin/safety_checker/config.json ADDED
@@ -0,0 +1,181 @@
+ {
+   "_commit_hash": "cb41f3a270d63d454d385fc2e4f571c487c253c5",
+   "_name_or_path": "CompVis/stable-diffusion-safety-checker",
+   "architectures": [
+     "StableDiffusionSafetyChecker"
+   ],
+   "initializer_factor": 1.0,
+   "logit_scale_init_value": 2.6592,
+   "model_type": "clip",
+   "projection_dim": 768,
+   "text_config": {
+     "_name_or_path": "",
+     "add_cross_attention": false,
+     "architectures": null,
+     "attention_dropout": 0.0,
+     "bad_words_ids": null,
+     "begin_suppress_tokens": null,
+     "bos_token_id": 0,
+     "chunk_size_feed_forward": 0,
+     "cross_attention_hidden_size": null,
+     "decoder_start_token_id": null,
+     "diversity_penalty": 0.0,
+     "do_sample": false,
+     "dropout": 0.0,
+     "early_stopping": false,
+     "encoder_no_repeat_ngram_size": 0,
+     "eos_token_id": 2,
+     "exponential_decay_length_penalty": null,
+     "finetuning_task": null,
+     "forced_bos_token_id": null,
+     "forced_eos_token_id": null,
+     "hidden_act": "quick_gelu",
+     "hidden_size": 768,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "initializer_factor": 1.0,
+     "initializer_range": 0.02,
+     "intermediate_size": 3072,
+     "is_decoder": false,
+     "is_encoder_decoder": false,
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "layer_norm_eps": 1e-05,
+     "length_penalty": 1.0,
+     "max_length": 20,
+     "max_position_embeddings": 77,
+     "min_length": 0,
+     "model_type": "clip_text_model",
+     "no_repeat_ngram_size": 0,
+     "num_attention_heads": 12,
+     "num_beam_groups": 1,
+     "num_beams": 1,
+     "num_hidden_layers": 12,
+     "num_return_sequences": 1,
+     "output_attentions": false,
+     "output_hidden_states": false,
+     "output_scores": false,
+     "pad_token_id": 1,
+     "prefix": null,
+     "problem_type": null,
+     "projection_dim": 512,
+     "pruned_heads": {},
+     "remove_invalid_values": false,
+     "repetition_penalty": 1.0,
+     "return_dict": true,
+     "return_dict_in_generate": false,
+     "sep_token_id": null,
+     "suppress_tokens": null,
+     "task_specific_params": null,
+     "temperature": 1.0,
+     "tf_legacy_loss": false,
+     "tie_encoder_decoder": false,
+     "tie_word_embeddings": true,
+     "tokenizer_class": null,
+     "top_k": 50,
+     "top_p": 1.0,
+     "torch_dtype": null,
+     "torchscript": false,
+     "transformers_version": "4.26.1",
+     "typical_p": 1.0,
+     "use_bfloat16": false,
+     "vocab_size": 49408
+   },
+   "text_config_dict": {
+     "hidden_size": 768,
+     "intermediate_size": 3072,
+     "num_attention_heads": 12,
+     "num_hidden_layers": 12
+   },
+   "torch_dtype": "float32",
+   "transformers_version": null,
+   "vision_config": {
+     "_name_or_path": "",
+     "add_cross_attention": false,
+     "architectures": null,
+     "attention_dropout": 0.0,
+     "bad_words_ids": null,
+     "begin_suppress_tokens": null,
+     "bos_token_id": null,
+     "chunk_size_feed_forward": 0,
+     "cross_attention_hidden_size": null,
+     "decoder_start_token_id": null,
+     "diversity_penalty": 0.0,
+     "do_sample": false,
+     "dropout": 0.0,
+     "early_stopping": false,
+     "encoder_no_repeat_ngram_size": 0,
+     "eos_token_id": null,
+     "exponential_decay_length_penalty": null,
+     "finetuning_task": null,
+     "forced_bos_token_id": null,
+     "forced_eos_token_id": null,
+     "hidden_act": "quick_gelu",
+     "hidden_size": 1024,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "image_size": 224,
+     "initializer_factor": 1.0,
+     "initializer_range": 0.02,
+     "intermediate_size": 4096,
+     "is_decoder": false,
+     "is_encoder_decoder": false,
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "layer_norm_eps": 1e-05,
+     "length_penalty": 1.0,
+     "max_length": 20,
+     "min_length": 0,
+     "model_type": "clip_vision_model",
+     "no_repeat_ngram_size": 0,
+     "num_attention_heads": 16,
+     "num_beam_groups": 1,
+     "num_beams": 1,
+     "num_channels": 3,
+     "num_hidden_layers": 24,
+     "num_return_sequences": 1,
+     "output_attentions": false,
+     "output_hidden_states": false,
+     "output_scores": false,
+     "pad_token_id": null,
+     "patch_size": 14,
+     "prefix": null,
+     "problem_type": null,
+     "projection_dim": 512,
+     "pruned_heads": {},
+     "remove_invalid_values": false,
+     "repetition_penalty": 1.0,
+     "return_dict": true,
+     "return_dict_in_generate": false,
+     "sep_token_id": null,
+     "suppress_tokens": null,
+     "task_specific_params": null,
+     "temperature": 1.0,
+     "tf_legacy_loss": false,
+     "tie_encoder_decoder": false,
+     "tie_word_embeddings": true,
+     "tokenizer_class": null,
+     "top_k": 50,
+     "top_p": 1.0,
+     "torch_dtype": null,
+     "torchscript": false,
+     "transformers_version": "4.26.1",
+     "typical_p": 1.0,
+     "use_bfloat16": false
+   },
+   "vision_config_dict": {
+     "hidden_size": 1024,
+     "intermediate_size": 4096,
+     "num_attention_heads": 16,
+     "num_hidden_layers": 24,
+     "patch_size": 14
+   }
+ }
kawai-origin/safety_checker/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9d6a233ff6fd5ccb9f76fd99618d73369c52dd3d8222376384d0e601911089e8
+ size 1215981830
kawai-origin/scheduler/scheduler_config.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "_class_name": "DDIMScheduler",
+   "_diffusers_version": "0.15.0.dev0",
+   "beta_end": 0.012,
+   "beta_schedule": "scaled_linear",
+   "beta_start": 0.00085,
+   "clip_sample": false,
+   "clip_sample_range": 1.0,
+   "dynamic_thresholding_ratio": 0.995,
+   "num_train_timesteps": 1000,
+   "prediction_type": "epsilon",
+   "sample_max_value": 1.0,
+   "set_alpha_to_one": false,
+   "steps_offset": 1,
+   "thresholding": false,
+   "trained_betas": null
+ }
kawai-origin/text_encoder/config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "_name_or_path": "openai/clip-vit-large-patch14",
+   "architectures": [
+     "CLIPTextModel"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 0,
+   "dropout": 0.0,
+   "eos_token_id": 2,
+   "hidden_act": "quick_gelu",
+   "hidden_size": 768,
+   "initializer_factor": 1.0,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 77,
+   "model_type": "clip_text_model",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "projection_dim": 768,
+   "torch_dtype": "float32",
+   "transformers_version": "4.26.1",
+   "vocab_size": 49408
+ }
kawai-origin/text_encoder/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:180c897d4c12afb44895fcb22e6789d1164b1fd2cac65907ea5d34202d59f998
+ size 492265874
kawai-origin/tokenizer/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
kawai-origin/tokenizer/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<|startoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<|endoftext|>",
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
kawai-origin/tokenizer/tokenizer_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "add_prefix_space": false,
+   "bos_token": {
+     "__type": "AddedToken",
+     "content": "<|startoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "do_lower_case": true,
+   "eos_token": {
+     "__type": "AddedToken",
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "errors": "replace",
+   "model_max_length": 77,
+   "name_or_path": "openai/clip-vit-large-patch14",
+   "pad_token": "<|endoftext|>",
+   "special_tokens_map_file": "./special_tokens_map.json",
+   "tokenizer_class": "CLIPTokenizer",
+   "unk_token": {
+     "__type": "AddedToken",
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
kawai-origin/tokenizer/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
kawai-origin/unet/config.json ADDED
@@ -0,0 +1,50 @@
+ {
+   "_class_name": "UNet2DConditionModel",
+   "_diffusers_version": "0.15.0.dev0",
+   "act_fn": "silu",
+   "attention_head_dim": 8,
+   "block_out_channels": [
+     320,
+     640,
+     1280,
+     1280
+   ],
+   "center_input_sample": false,
+   "class_embed_type": null,
+   "conv_in_kernel": 3,
+   "conv_out_kernel": 3,
+   "cross_attention_dim": 768,
+   "down_block_types": [
+     "CrossAttnDownBlock2D",
+     "CrossAttnDownBlock2D",
+     "CrossAttnDownBlock2D",
+     "DownBlock2D"
+   ],
+   "downsample_padding": 1,
+   "dual_cross_attention": false,
+   "flip_sin_to_cos": true,
+   "freq_shift": 0,
+   "in_channels": 4,
+   "layers_per_block": 2,
+   "mid_block_scale_factor": 1,
+   "mid_block_type": "UNetMidBlock2DCrossAttn",
+   "norm_eps": 1e-05,
+   "norm_num_groups": 32,
+   "num_class_embeds": null,
+   "only_cross_attention": false,
+   "out_channels": 4,
+   "projection_class_embeddings_input_dim": null,
+   "resnet_time_scale_shift": "default",
+   "sample_size": 64,
+   "time_cond_proj_dim": null,
+   "time_embedding_type": "positional",
+   "timestep_post_act": null,
+   "up_block_types": [
+     "UpBlock2D",
+     "CrossAttnUpBlock2D",
+     "CrossAttnUpBlock2D",
+     "CrossAttnUpBlock2D"
+   ],
+   "upcast_attention": false,
+   "use_linear_projection": false
+ }
kawai-origin/unet/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d13530a9ff30fcf50fdddd32d7f66efe72058358117e13948f843017a907acee
+ size 3438167540
kawai-origin/vae/config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "_class_name": "AutoencoderKL",
+   "_diffusers_version": "0.15.0.dev0",
+   "act_fn": "silu",
+   "block_out_channels": [
+     128,
+     256,
+     512,
+     512
+   ],
+   "down_block_types": [
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D"
+   ],
+   "in_channels": 3,
+   "latent_channels": 4,
+   "layers_per_block": 2,
+   "norm_num_groups": 32,
+   "out_channels": 3,
+   "sample_size": 512,
+   "scaling_factor": 0.18215,
+   "up_block_types": [
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D"
+   ]
+ }
kawai-origin/vae/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a16aadd9501c371b6ef94257a01596e461b13637fd984588175821c77c5a74df
+ size 334643276