Ojimi committed
Commit: a21bee3
1 Parent(s): 3cf8f3b

Upload Kawai Diffusion

README.md CHANGED
@@ -1,25 +1,14 @@
- ---
- license: creativeml-openrail-m
- language:
- - en
- library_name: diffusers
- pipeline_tag: text-to-image
- tags:
- - art
- - pytorch
- ---
- ## Announcement: My new model, Kawai Diffusion SD1.5, has been uploaded to CivitAI; you can check it out here: https://civitai.com/models/21138/kawai-diffusion-sd15
- # Important: since there are already models of higher quality than mine, further upgrades are unnecessary for an anonymous model that attracts little interest. Thank you for your interest; my biggest ambition is that we will be able to make anime. Have fun and see you soon, my big family!
- # Waifumake (●'◡'●) AI Art model.
-
- ![](logo.png)
-
- A single student training an AI model that generates art.
-
- ## **New model available**: [waifumake-full-v2](waifumake-full-v2.safetensors)!
- ## What's new in v2:
  - Fix color loss.
- - Increase image quality.
 
  ## Introduction:
  - It's an AI art model for converting text to images, images to images, inpainting, and outpainting using Stable Diffusion.
@@ -28,68 +17,54 @@ A single student training an AI model that generates art.
  - Create an image from a sketch you made in a basic drawing program (MS Paint).
  - The model is aimed at everyone and has limitless usage potential.
 
- ## Used:
- - For 🧨 Diffusers Library:
  ```python
  from diffusers import DiffusionPipeline
 
- pipe = DiffusionPipeline.from_pretrained("Ojimi/waifumake-full")
  pipe = pipe.to("cuda")
 
  prompt = "1girl, animal ears, long hair, solo, cat ears, choker, bare shoulders, red eyes, fang, looking at viewer, animal ear fluff, upper body, black hair, blush, closed mouth, off shoulder, bangs, bow, collarbone"
  image = pipe(prompt, negative_prompt="lowres, bad anatomy").images[0]
  ```
 
- - For Web UI by Automatic1111:
- ```bash
- # Install the Web UI.
- git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
- cd /content/stable-diffusion-webui/
- pip install -qq -r requirements.txt
- pip install -U xformers # Install xformers for better performance.
- ```
-
- ```bash
- # Download the model.
- wget https://huggingface.co/Ojimi/waifumake-full/resolve/main/waifumake-full-v2.safetensors -O /content/stable-diffusion-webui/models/Stable-diffusion/waifumake-full-v2.safetensors
- ```
-
- ```bash
- # Run and enjoy ☕.
- cd /content/stable-diffusion-webui
- python launch.py --xformers
  ```
-
- - Try it in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1D6LNtXrpD2QfUx-d_yztWZVgTiDAyyAT?usp=sharing)
  ## Tips:
  - The `masterpiece` and `best quality` tags are not necessary, as they sometimes lead to contradictory results, but if the image is distorted or discolored, add them.
  - The CFG scale should be 7.5 and the step count 28 for the best quality and performance.
  - Use a sample photo for your idea. `Interrogate DeepBooru` and change the prompts to suit what you want.
  - You should use it as a supportive tool for creating works of art, not rely on it completely.
-
- ## Preview: v2 model
- ![](preview1.png)
- ![](preview2.png)
- ![](preview3.png)
- ![](preview4.png)
- - Enhance and upscale using Stable Diffusion - Waifumake model v2 (⚠️ performance warning): I recommend against making the image too big, as it can lead to unexpected problems.
- ![](preview5.png)
- ## Training:
- - **Data**: The model is trained on a dataset of images from various Internet sources provided by my friend, plus images created by another AI.
- - **Scheduler**: Euler Ancestral Discrete.
- - **Optimizer**: AdamW, with xFormers.
- - **Precision**: BF16.
- - **Hardware**: Google Colaboratory Pro - NVIDIA A100 40GB VRAM.
 
  ## **Limitations:**
  - Loss of detail, errors, bad human anatomy (such as six-fingered hands), deformation, blurring, and unclear images are inevitable.
- - Complex tasks cannot be handled.
  - ⚠️ Content may not be appropriate for all ages: as the model is trained on data that includes adult content, the generated images may contain content not suitable for children (depending on your country, specific regulations may apply). If you do not want adult content to appear, make sure you have additional safety measures in place, such as adding "nsfw" to the negative prompt.
  - The results generated by the model are considered impressive, but unfortunately it currently supports only English; for multilingual use, consider a third-party translation program.
  - The model is trained on the `Danbooru` and `Nai` tagging systems, so long free-form text may give poor results.
  - My amount of money: 0 USD =((.
-
- ![](money-wallet.gif)
 
  ## **Desires:**
  As this version was made only by myself and my small group of associates, the model will not be perfect and may differ from what people expect. Any contributions from everyone will be respected.
@@ -99,27 +74,29 @@ Want to support me? Thank you, please help me make it better. ❤️
  ## Special Thanks:
  This wouldn't have happened if they hadn't made a breakthrough.
  - [Runwayml](https://huggingface.co/runwayml/): Base model.
- - [d8ahazard](https://github.com/d8ahazard/sd_dreambooth_extension): Dreambooth.
  - [Automatic1111](https://github.com/AUTOMATIC1111/): Web UI.
  - [Mikubill](https://github.com/Mikubill/): Where my ideas started.
  - ChatGPT: Helped me do crazy things that I thought I would never do.
- - Novel AI: Dataset images. An AI made me thousands of pictures without worrying about copyright or disputes.
  - Danbooru: Helped me write the correct tags.
- - My friend and others.
  - And You 🫵❤️
 
  ## Copyright:
 
- This license allows anyone to copy, modify, publish, and commercialize the model, provided they follow the terms of the CreativeML Open RAIL-M. You can learn more about the CreativeML Open RAIL-M [here](LICENSE.txt).
 
- If any part of the model does not comply with the terms of the CreativeML Open RAIL-M, the copyright and other rights to the model remain valid.
 
  All AI-generated images are yours; you can do whatever you want with them, but please obey the laws of your country. We will not be responsible for any problems you cause.
 
  Don't forget me.
 
  # Have fun with your waifu! (●'◡'●)
 
- ![](cry.png)
 
- Like it?
 
+
+ # Kawai Diffusion (anime-base) v3.0 LTS Big Update (≧∇≦)ノ
+ See more on CivitAI: https://civitai.com/models/21138/kawai-diffusion-sd15
+ ![](asset/preview.png)
+ ## What's new in Kawai v3.0 LTS:
  - Fix color loss.
+ - Image quality is greatly enhanced. Thank you, my friend.
+ - Kawai Diffusion's most powerful ability is "enhance" (img2img): it will make a bad image look better. (See the img2img sketch in the Use section below.)
+ - True "kawaii"... Haizzzzzzzzz
+ - Two versions: the [ema-only](kawai-diffusion-sdv15_ema-only.safetensors) model (5.28 GB) and the [pruned](kawai-base-sd-v15_pruned-full.safetensors) model (8.49 GB). Come on, don't be surprised by the sizes; even I was surprised.
+ - Can work with some external VAEs, but the pruned model does not require any VAE (a loading sketch follows this list).
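+
+ A minimal sketch of swapping in an external VAE with 🧨 Diffusers (the `stabilityai/sd-vae-ft-mse` repository named below is just one commonly used option, not a recommendation from this model card):
+ ```python
+ from diffusers import AutoencoderKL, StableDiffusionPipeline
+
+ # Load an external VAE and pass it to the pipeline.
+ # Per the notes above, the pruned model does not require an external VAE.
+ vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
+ pipe = StableDiffusionPipeline.from_pretrained("Ojimi/anime-kawai-diffusion", vae=vae)
+ pipe = pipe.to("cuda")
+ ```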
 
  ## Introduction:
  - It's an AI art model for converting text to images, images to images, inpainting, and outpainting using Stable Diffusion.
  - Create an image from a sketch you made in a basic drawing program (MS Paint).
  - The model is aimed at everyone and has limitless usage potential.
 
+ ## Use:
+ - For 🧨 Diffusers:
  ```python
  from diffusers import DiffusionPipeline
 
+ pipe = DiffusionPipeline.from_pretrained("Ojimi/anime-kawai-diffusion")
  pipe = pipe.to("cuda")
 
  prompt = "1girl, animal ears, long hair, solo, cat ears, choker, bare shoulders, red eyes, fang, looking at viewer, animal ear fluff, upper body, black hair, blush, closed mouth, off shoulder, bangs, bow, collarbone"
  image = pipe(prompt, negative_prompt="lowres, bad anatomy").images[0]
  ```
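+ - For the img2img "enhance" workflow mentioned above (a minimal sketch; the input file name and `strength` value are illustrative, not from the model card):
+ ```python
+ from PIL import Image
+ from diffusers import StableDiffusionImg2ImgPipeline
+
+ pipe = StableDiffusionImg2ImgPipeline.from_pretrained("Ojimi/anime-kawai-diffusion")
+ pipe = pipe.to("cuda")
+
+ # "rough.png" is a hypothetical low-quality image you want to enhance.
+ init_image = Image.open("rough.png").convert("RGB").resize((512, 512))
+ image = pipe(
+     prompt="1girl, cat ears, red eyes, upper body",
+     image=init_image,
+     strength=0.6,  # lower values keep more of the original image
+     negative_prompt="lowres, bad anatomy",
+ ).images[0]
+ image.save("enhanced.png")
+ ```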
+ - Try it in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1D6LNtXrpD2QfUx-d_yztWZVgTiDAyyAT?usp=sharing)
+ - ChatGPT with Kawai Diffusion (or any model you like). Give ChatGPT the instruction below, then ask it for image ideas:
+
+ ```
+ Read the following instructions, and if you understand, say "I understand": Command prompt structure: includes descriptions of shape, perspective, posture, landscape, and so on. Keywords are written briefly in the form of tags. For example: "1girl, blonde hair, sitting, dress, red eyes, small breasts, star, night sky, moon"
  ```
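+
+ A sketch of wiring that instruction up programmatically (assumptions: the legacy pre-1.0 `openai` Python SDK, a placeholder API key, and an illustrative user request):
+ ```python
+ import openai  # legacy (<1.0) OpenAI Python SDK
+
+ openai.api_key = "sk-..."  # placeholder API key
+
+ instruction = (
+     'Command prompt structure: includes descriptions of shape, perspective, '
+     'posture, landscape, and so on. Keywords are written briefly in the form '
+     'of tags, for example: "1girl, blonde hair, sitting, dress, red eyes, '
+     'small breasts, star, night sky, moon"'
+ )
+
+ reply = openai.ChatCompletion.create(
+     model="gpt-3.5-turbo",
+     messages=[
+         {"role": "system", "content": instruction},
+         {"role": "user", "content": "A knight girl resting under a tree at dusk."},
+     ],
+ )
+ print(reply.choices[0].message.content)  # tag-style prompt to feed the model
+ ```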
 
 
  ## Tips:
  - The `masterpiece` and `best quality` tags are not necessary, as they sometimes lead to contradictory results, but if the image is distorted or discolored, add them.
  - The CFG scale should be 7.5 and the step count 28 for the best quality and performance.
  - Use a sample photo for your idea. `Interrogate DeepBooru` and change the prompts to suit what you want.
  - You should use it as a supportive tool for creating works of art, not rely on it completely.
+ - The Clip skip should be 2. (See the settings sketch after this list.)
+
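+ A minimal sketch applying these tips with 🧨 Diffusers (the prompt and output path are illustrative; clip skip is an Automatic1111 web-UI setting with no direct equivalent argument in the Diffusers version this repository targets):
+ ```python
+ from diffusers import DiffusionPipeline
+
+ pipe = DiffusionPipeline.from_pretrained("Ojimi/anime-kawai-diffusion")
+ pipe = pipe.to("cuda")
+
+ image = pipe(
+     "1girl, cat ears, choker, red eyes, looking at viewer",
+     negative_prompt="lowres, bad anatomy, nsfw",
+     guidance_scale=7.5,      # CFG scale from the tips above
+     num_inference_steps=28,  # step count from the tips above
+ ).images[0]
+ image.save("sample.png")
+ ```
+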
+ ## **Training**:
+ - **Data**: Created by another AI.
+ - **Scheduler**: DDIM.
+ - **Optimizer**: AdamW.
+ - **Precision**: FP32.
+ - **Hardware**: Google Colaboratory Pro - NVIDIA A100 40GB VRAM, Tesla V100-SXM2 16GB.
+ ## **Model Unit Test:**
+ This is a program written by my friend to check model quality.
+ - Examiner: OpenAI ChatGPT-3.5-Turbo.
+ - Test: kawai-anime-sd.
+ - Scheduler: DPM++ 2M Karras.
+ - Steps: 22.
+ - Guard: Guard Prompt 1.5.
+ - Test report: [Here](https://civitai.com/gallery/299771?modelId=21138&modelVersionId=27219&infinite=false&returnUrl=%2Fmodels%2F21138%2Fkawai-diffusion-sd15).
 
  ## **Limitations:**
+ - The drawing style is hard, not soft.
  - Loss of detail, errors, bad human anatomy (such as six-fingered hands), deformation, blurring, and unclear images are inevitable.
  - ⚠️ Content may not be appropriate for all ages: as the model is trained on data that includes adult content, the generated images may contain content not suitable for children (depending on your country, specific regulations may apply). If you do not want adult content to appear, make sure you have additional safety measures in place, such as adding "nsfw" to the negative prompt.
  - The results generated by the model are considered impressive, but unfortunately it currently supports only English; for multilingual use, consider a third-party translation program.
  - The model is trained on the `Danbooru` and `Nai` tagging systems, so long free-form text may give poor results.
  - My amount of money: 0 USD =((.
+
+ ![](money-wallet.gif)
 
  ## **Desires:**
  As this version was made only by myself and my small group of associates, the model will not be perfect and may differ from what people expect. Any contributions from everyone will be respected.
 
  ## Special Thanks:
  This wouldn't have happened if they hadn't made a breakthrough.
  - [Runwayml](https://huggingface.co/runwayml/): Base model.
+ - [d8ahazard](https://github.com/d8ahazard/sd_dreambooth_extension): Dreambooth.
  - [Automatic1111](https://github.com/AUTOMATIC1111/): Web UI.
  - [Mikubill](https://github.com/Mikubill/): Where my ideas started.
  - ChatGPT: Helped me do crazy things that I thought I would never do.
+ - Novel AI, Anything Model, Abyss Orange Model: Dataset images. AI made me thousands of pictures without worrying about copyright or disputes.
  - Danbooru: Helped me write the correct tags.
+ - My friend and others: Provided quality images.
  - And You 🫵❤️
 
  ## Copyright:
 
+ This license allows anyone to copy and modify the model, but please follow the terms of the CreativeML Open RAIL-M. You can learn more about the CreativeML Open RAIL-M [here](https://huggingface.co/spaces/CompVis/stable-diffusion-license).
 
+ If any part of the model does not comply with the terms of the CreativeML Open RAIL-M, the copyright and other rights to the model remain valid.
 
  All AI-generated images are yours; you can do whatever you want with them, but please obey the laws of your country. We will not be responsible for any problems you cause.
 
+ We allow you to merge this model with another, but if you share the merged model, don't forget to add me to the credits.
+
  Don't forget me.
 
  # Have fun with your waifu! (●'◡'●)
 
+ I have a hero, but I can't say his name and we've never met. He was the one who laid the foundation for Kawai Diffusion. Although the model is not very popular, I love that hero very much. Thank you for your interest in my model. Thank you very much!
 
+ Like it? Buy me a ko-fi: https://ko-fi.com/ojimi (≧∇≦)ノ
asset/preview.png ADDED
kawai-base-sd-v15_pruned-full.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:db4859a71b85711c2c274b4a25e542e35b2f53c4a967abeb4111bea5c1740691
+ size 9110913190
kawai-base-sd-v15_pruned-full.yaml ADDED
@@ -0,0 +1,70 @@
+ model:
+   base_learning_rate: 1.0e-04
+   target: ldm.models.diffusion.ddpm.LatentDiffusion
+   params:
+     linear_start: 0.00085
+     linear_end: 0.0120
+     num_timesteps_cond: 1
+     log_every_t: 200
+     timesteps: 1000
+     first_stage_key: "image"
+     cond_stage_key: "caption"
+     image_size: 64
+     channels: 4
+     cond_stage_trainable: false # Note: different from the one we trained before
+     conditioning_key: crossattn
+     monitor: val/loss_simple_ema
+     scale_factor: 0.18215
+     use_ema: False
+
+     scheduler_config: # 10000 warmup steps
+       target: ldm.lr_scheduler.LambdaLinearScheduler
+       params:
+         warm_up_steps: [ 10000 ]
+         cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
+         f_start: [ 1.e-6 ]
+         f_max: [ 1. ]
+         f_min: [ 1. ]
+
+     unet_config:
+       target: ldm.modules.diffusionmodules.openaimodel.UNetModel
+       params:
+         image_size: 32 # unused
+         in_channels: 4
+         out_channels: 4
+         model_channels: 320
+         attention_resolutions: [ 4, 2, 1 ]
+         num_res_blocks: 2
+         channel_mult: [ 1, 2, 4, 4 ]
+         num_heads: 8
+         use_spatial_transformer: True
+         transformer_depth: 1
+         context_dim: 768
+         use_checkpoint: True
+         legacy: False
+
+     first_stage_config:
+       target: ldm.models.autoencoder.AutoencoderKL
+       params:
+         embed_dim: 4
+         monitor: val/rec_loss
+         ddconfig:
+           double_z: true
+           z_channels: 4
+           resolution: 256
+           in_channels: 3
+           out_ch: 3
+           ch: 128
+           ch_mult:
+           - 1
+           - 2
+           - 4
+           - 4
+           num_res_blocks: 2
+           attn_resolutions: [ ]
+           dropout: 0.0
+         lossconfig:
+           target: torch.nn.Identity
+
+     cond_stage_config:
+       target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
kawai-diffusion-sdv15_ema-only.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cb932447733cb1b49afa25488bd2a001b7c9874e3db970dc972bb962c5fa330a
+ size 5672745097
kawai-origin/feature_extractor/preprocessor_config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "crop_size": {
+     "height": 224,
+     "width": 224
+   },
+   "do_center_crop": true,
+   "do_convert_rgb": true,
+   "do_normalize": true,
+   "do_rescale": true,
+   "do_resize": true,
+   "feature_extractor_type": "CLIPFeatureExtractor",
+   "image_mean": [
+     0.48145466,
+     0.4578275,
+     0.40821073
+   ],
+   "image_processor_type": "CLIPFeatureExtractor",
+   "image_std": [
+     0.26862954,
+     0.26130258,
+     0.27577711
+   ],
+   "resample": 3,
+   "rescale_factor": 0.00392156862745098,
+   "size": {
+     "shortest_edge": 224
+   }
+ }
kawai-origin/model_index.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "_class_name": "StableDiffusionPipeline",
+   "_diffusers_version": "0.15.0.dev0",
+   "feature_extractor": [
+     "transformers",
+     "CLIPFeatureExtractor"
+   ],
+   "requires_safety_checker": true,
+   "safety_checker": [
+     "stable_diffusion",
+     "StableDiffusionSafetyChecker"
+   ],
+   "scheduler": [
+     "diffusers",
+     "DDIMScheduler"
+   ],
+   "text_encoder": [
+     "transformers",
+     "CLIPTextModel"
+   ],
+   "tokenizer": [
+     "transformers",
+     "CLIPTokenizer"
+   ],
+   "unet": [
+     "diffusers",
+     "UNet2DConditionModel"
+   ],
+   "vae": [
+     "diffusers",
+     "AutoencoderKL"
+   ]
+ }
kawai-origin/safety_checker/config.json ADDED
@@ -0,0 +1,181 @@
+ {
+   "_commit_hash": "cb41f3a270d63d454d385fc2e4f571c487c253c5",
+   "_name_or_path": "CompVis/stable-diffusion-safety-checker",
+   "architectures": [
+     "StableDiffusionSafetyChecker"
+   ],
+   "initializer_factor": 1.0,
+   "logit_scale_init_value": 2.6592,
+   "model_type": "clip",
+   "projection_dim": 768,
+   "text_config": {
+     "_name_or_path": "",
+     "add_cross_attention": false,
+     "architectures": null,
+     "attention_dropout": 0.0,
+     "bad_words_ids": null,
+     "begin_suppress_tokens": null,
+     "bos_token_id": 0,
+     "chunk_size_feed_forward": 0,
+     "cross_attention_hidden_size": null,
+     "decoder_start_token_id": null,
+     "diversity_penalty": 0.0,
+     "do_sample": false,
+     "dropout": 0.0,
+     "early_stopping": false,
+     "encoder_no_repeat_ngram_size": 0,
+     "eos_token_id": 2,
+     "exponential_decay_length_penalty": null,
+     "finetuning_task": null,
+     "forced_bos_token_id": null,
+     "forced_eos_token_id": null,
+     "hidden_act": "quick_gelu",
+     "hidden_size": 768,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "initializer_factor": 1.0,
+     "initializer_range": 0.02,
+     "intermediate_size": 3072,
+     "is_decoder": false,
+     "is_encoder_decoder": false,
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "layer_norm_eps": 1e-05,
+     "length_penalty": 1.0,
+     "max_length": 20,
+     "max_position_embeddings": 77,
+     "min_length": 0,
+     "model_type": "clip_text_model",
+     "no_repeat_ngram_size": 0,
+     "num_attention_heads": 12,
+     "num_beam_groups": 1,
+     "num_beams": 1,
+     "num_hidden_layers": 12,
+     "num_return_sequences": 1,
+     "output_attentions": false,
+     "output_hidden_states": false,
+     "output_scores": false,
+     "pad_token_id": 1,
+     "prefix": null,
+     "problem_type": null,
+     "projection_dim": 512,
+     "pruned_heads": {},
+     "remove_invalid_values": false,
+     "repetition_penalty": 1.0,
+     "return_dict": true,
+     "return_dict_in_generate": false,
+     "sep_token_id": null,
+     "suppress_tokens": null,
+     "task_specific_params": null,
+     "temperature": 1.0,
+     "tf_legacy_loss": false,
+     "tie_encoder_decoder": false,
+     "tie_word_embeddings": true,
+     "tokenizer_class": null,
+     "top_k": 50,
+     "top_p": 1.0,
+     "torch_dtype": null,
+     "torchscript": false,
+     "transformers_version": "4.26.1",
+     "typical_p": 1.0,
+     "use_bfloat16": false,
+     "vocab_size": 49408
+   },
+   "text_config_dict": {
+     "hidden_size": 768,
+     "intermediate_size": 3072,
+     "num_attention_heads": 12,
+     "num_hidden_layers": 12
+   },
+   "torch_dtype": "float32",
+   "transformers_version": null,
+   "vision_config": {
+     "_name_or_path": "",
+     "add_cross_attention": false,
+     "architectures": null,
+     "attention_dropout": 0.0,
+     "bad_words_ids": null,
+     "begin_suppress_tokens": null,
+     "bos_token_id": null,
+     "chunk_size_feed_forward": 0,
+     "cross_attention_hidden_size": null,
+     "decoder_start_token_id": null,
+     "diversity_penalty": 0.0,
+     "do_sample": false,
+     "dropout": 0.0,
+     "early_stopping": false,
+     "encoder_no_repeat_ngram_size": 0,
+     "eos_token_id": null,
+     "exponential_decay_length_penalty": null,
+     "finetuning_task": null,
+     "forced_bos_token_id": null,
+     "forced_eos_token_id": null,
+     "hidden_act": "quick_gelu",
+     "hidden_size": 1024,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "image_size": 224,
+     "initializer_factor": 1.0,
+     "initializer_range": 0.02,
+     "intermediate_size": 4096,
+     "is_decoder": false,
+     "is_encoder_decoder": false,
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "layer_norm_eps": 1e-05,
+     "length_penalty": 1.0,
+     "max_length": 20,
+     "min_length": 0,
+     "model_type": "clip_vision_model",
+     "no_repeat_ngram_size": 0,
+     "num_attention_heads": 16,
+     "num_beam_groups": 1,
+     "num_beams": 1,
+     "num_channels": 3,
+     "num_hidden_layers": 24,
+     "num_return_sequences": 1,
+     "output_attentions": false,
+     "output_hidden_states": false,
+     "output_scores": false,
+     "pad_token_id": null,
+     "patch_size": 14,
+     "prefix": null,
+     "problem_type": null,
+     "projection_dim": 512,
+     "pruned_heads": {},
+     "remove_invalid_values": false,
+     "repetition_penalty": 1.0,
+     "return_dict": true,
+     "return_dict_in_generate": false,
+     "sep_token_id": null,
+     "suppress_tokens": null,
+     "task_specific_params": null,
+     "temperature": 1.0,
+     "tf_legacy_loss": false,
+     "tie_encoder_decoder": false,
+     "tie_word_embeddings": true,
+     "tokenizer_class": null,
+     "top_k": 50,
+     "top_p": 1.0,
+     "torch_dtype": null,
+     "torchscript": false,
+     "transformers_version": "4.26.1",
+     "typical_p": 1.0,
+     "use_bfloat16": false
+   },
+   "vision_config_dict": {
+     "hidden_size": 1024,
+     "intermediate_size": 4096,
+     "num_attention_heads": 16,
+     "num_hidden_layers": 24,
+     "patch_size": 14
+   }
+ }
kawai-origin/safety_checker/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9d6a233ff6fd5ccb9f76fd99618d73369c52dd3d8222376384d0e601911089e8
+ size 1215981830
kawai-origin/scheduler/scheduler_config.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "_class_name": "DDIMScheduler",
+   "_diffusers_version": "0.15.0.dev0",
+   "beta_end": 0.012,
+   "beta_schedule": "scaled_linear",
+   "beta_start": 0.00085,
+   "clip_sample": false,
+   "clip_sample_range": 1.0,
+   "dynamic_thresholding_ratio": 0.995,
+   "num_train_timesteps": 1000,
+   "prediction_type": "epsilon",
+   "sample_max_value": 1.0,
+   "set_alpha_to_one": false,
+   "steps_offset": 1,
+   "thresholding": false,
+   "trained_betas": null
+ }
kawai-origin/text_encoder/config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "_name_or_path": "openai/clip-vit-large-patch14",
+   "architectures": [
+     "CLIPTextModel"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 0,
+   "dropout": 0.0,
+   "eos_token_id": 2,
+   "hidden_act": "quick_gelu",
+   "hidden_size": 768,
+   "initializer_factor": 1.0,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 77,
+   "model_type": "clip_text_model",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "projection_dim": 768,
+   "torch_dtype": "float32",
+   "transformers_version": "4.26.1",
+   "vocab_size": 49408
+ }
kawai-origin/text_encoder/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:180c897d4c12afb44895fcb22e6789d1164b1fd2cac65907ea5d34202d59f998
+ size 492265874
kawai-origin/tokenizer/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
kawai-origin/tokenizer/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<|startoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<|endoftext|>",
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
kawai-origin/tokenizer/tokenizer_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "add_prefix_space": false,
+   "bos_token": {
+     "__type": "AddedToken",
+     "content": "<|startoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "do_lower_case": true,
+   "eos_token": {
+     "__type": "AddedToken",
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "errors": "replace",
+   "model_max_length": 77,
+   "name_or_path": "openai/clip-vit-large-patch14",
+   "pad_token": "<|endoftext|>",
+   "special_tokens_map_file": "./special_tokens_map.json",
+   "tokenizer_class": "CLIPTokenizer",
+   "unk_token": {
+     "__type": "AddedToken",
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
kawai-origin/tokenizer/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
kawai-origin/unet/config.json ADDED
@@ -0,0 +1,50 @@
+ {
+   "_class_name": "UNet2DConditionModel",
+   "_diffusers_version": "0.15.0.dev0",
+   "act_fn": "silu",
+   "attention_head_dim": 8,
+   "block_out_channels": [
+     320,
+     640,
+     1280,
+     1280
+   ],
+   "center_input_sample": false,
+   "class_embed_type": null,
+   "conv_in_kernel": 3,
+   "conv_out_kernel": 3,
+   "cross_attention_dim": 768,
+   "down_block_types": [
+     "CrossAttnDownBlock2D",
+     "CrossAttnDownBlock2D",
+     "CrossAttnDownBlock2D",
+     "DownBlock2D"
+   ],
+   "downsample_padding": 1,
+   "dual_cross_attention": false,
+   "flip_sin_to_cos": true,
+   "freq_shift": 0,
+   "in_channels": 4,
+   "layers_per_block": 2,
+   "mid_block_scale_factor": 1,
+   "mid_block_type": "UNetMidBlock2DCrossAttn",
+   "norm_eps": 1e-05,
+   "norm_num_groups": 32,
+   "num_class_embeds": null,
+   "only_cross_attention": false,
+   "out_channels": 4,
+   "projection_class_embeddings_input_dim": null,
+   "resnet_time_scale_shift": "default",
+   "sample_size": 64,
+   "time_cond_proj_dim": null,
+   "time_embedding_type": "positional",
+   "timestep_post_act": null,
+   "up_block_types": [
+     "UpBlock2D",
+     "CrossAttnUpBlock2D",
+     "CrossAttnUpBlock2D",
+     "CrossAttnUpBlock2D"
+   ],
+   "upcast_attention": false,
+   "use_linear_projection": false
+ }
kawai-origin/unet/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d13530a9ff30fcf50fdddd32d7f66efe72058358117e13948f843017a907acee
+ size 3438167540
kawai-origin/vae/config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "_class_name": "AutoencoderKL",
+   "_diffusers_version": "0.15.0.dev0",
+   "act_fn": "silu",
+   "block_out_channels": [
+     128,
+     256,
+     512,
+     512
+   ],
+   "down_block_types": [
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D"
+   ],
+   "in_channels": 3,
+   "latent_channels": 4,
+   "layers_per_block": 2,
+   "norm_num_groups": 32,
+   "out_channels": 3,
+   "sample_size": 512,
+   "scaling_factor": 0.18215,
+   "up_block_types": [
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D"
+   ]
+ }
kawai-origin/vae/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a16aadd9501c371b6ef94257a01596e461b13637fd984588175821c77c5a74df
+ size 334643276