Midu committed on
Commit
dd52bb4
1 Parent(s): cdf859b

Upload 14 files

README.md CHANGED
@@ -1,3 +1,64 @@
  ---
+ language: zh
  license: creativeml-openrail-m
+ tags:
+ - diffusion
+ - zh
+ - Chinese
  ---
+
+ # Midu-Stable-Diffusion-2-Chinese-Style-v0.1
+
+ ## Brief Introduction
+
+ | ![cyberpunk](examples/cyberpunk.jpeg) | ![shiba](examples/shiba.jpeg) | ![ds](examples/ds.jpeg) |
+ | ------------------------------------- | ----------------------------- | ------------------------------- |
+ | ![waitan](examples/waitan.jpeg) | ![gf](examples/gf.jpeg) | ![ssh](examples/ssh.jpeg) |
+ | ![cat](examples/cat.jpeg) | ![robot](examples/robot.jpeg) | ![castle](examples/castle.jpeg) |
+
+ Probably the first open-source Chinese Stable Diffusion 2 model in the Hugging Face community. The model is fine-tuned from Stable Diffusion v2.1 on roughly 5M Chinese-style filtered text-image pairs, drawn from several open-source datasets such as [LAION-5B](https://laion.ai/blog/laion-5b/), [Noah-Wukong](https://wukong-dataset.github.io/wukong-dataset/), and [Zero](https://zero.so.com/), plus some web data.
+
+ ## Model Details
+
+ ### Text Encoder
+
+ The text encoder is a frozen copy of [lyua1225/clip-huge-zh-75k-steps-bs4096](https://huggingface.co/lyua1225/clip-huge-zh-75k-steps-bs4096); its weights are not updated during fine-tuning.
+
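+ In practice, freezing the encoder just means loading it and disabling gradients. A minimal sketch of that setup (this is illustrative, not the released training code):
+
+ ```py
+ # Minimal sketch: load the Chinese CLIP text encoder and freeze it.
+ from transformers import CLIPTextModel
+
+ text_encoder = CLIPTextModel.from_pretrained("lyua1225/clip-huge-zh-75k-steps-bs4096")
+ text_encoder.requires_grad_(False)  # no gradient updates during fine-tuning
+ text_encoder.eval()
+ ```
+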
+ ### UNet
+
+ The UNet was trained on the 5M Chinese-style filtered dataset for 150k steps. An exponential moving average (EMA) of the weights is kept to preserve the original Stable Diffusion 2 drawing ability, so the model strikes a balance between Chinese style and the base model's capability.
+
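+ The EMA keeps a slow-moving shadow copy of the UNet weights alongside the live ones. A minimal sketch of the update rule (the decay value is a typical choice assumed here, not taken from the release):
+
+ ```py
+ import torch
+
+ @torch.no_grad()
+ def ema_update(ema_model, model, decay=0.9999):
+     """Nudge the shadow (EMA) weights toward the live weights after each optimizer step."""
+     for ema_p, p in zip(ema_model.parameters(), model.parameters()):
+         # shadow <- decay * shadow + (1 - decay) * live
+         ema_p.mul_(decay).add_(p, alpha=1.0 - decay)
+ ```
+
+ The shipped checkpoint would then contain the EMA weights rather than the final raw training weights.
+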
+ ## Usage
+
+ Because the model ships a custom (Chinese) tokenizer, load everything through `from_pretrained` so the bundled tokenizer is picked up together with the rest of the pipeline.
+
+ ```py
+ # !pip install diffusers transformers accelerate
+ import torch
+ from diffusers import StableDiffusionPipeline
+
+ torch.backends.cudnn.benchmark = True
+ # The original card loaded "IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1", apparently a
+ # copy-paste leftover; the repo id below is assumed from this model's name.
+ pipe = StableDiffusionPipeline.from_pretrained(
+     "Midu/Midu-Stable-Diffusion-2-Chinese-Style-v0.1", torch_dtype=torch.float16
+ )
+ pipe.to("cuda")
+
+ prompt = '飞流直下三千尺,油画'
+ image = pipe(prompt, guidance_scale=7.5).images[0]
+ image.save("飞流.png")
+ ```
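+
+ For reproducible results you can seed the pipeline's RNG and set the step count explicitly; these are standard `diffusers` call arguments, shown here as a sketch:
+
+ ```py
+ # Fix the RNG and the number of denoising steps for repeatable outputs.
+ generator = torch.Generator("cuda").manual_seed(42)
+ image = pipe(prompt, guidance_scale=7.5, num_inference_steps=50, generator=generator).images[0]
+ ```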
examples/castle.jpeg ADDED
examples/cat.jpeg ADDED
examples/cyberpunk.jpeg ADDED
examples/ds.jpeg ADDED
examples/gf.jpeg ADDED
examples/robot.jpeg ADDED
examples/shiba.jpeg ADDED
examples/ssh.jpeg ADDED
examples/waitan.jpeg ADDED
feature_extractor/preprocessor_config.json ADDED
@@ -0,0 +1,20 @@
+ {
+   "crop_size": 224,
+   "do_center_crop": true,
+   "do_convert_rgb": true,
+   "do_normalize": true,
+   "do_resize": true,
+   "feature_extractor_type": "CLIPFeatureExtractor",
+   "image_mean": [
+     0.48145466,
+     0.4578275,
+     0.40821073
+   ],
+   "image_std": [
+     0.26862954,
+     0.26130258,
+     0.27577711
+   ],
+   "resample": 3,
+   "size": 224
+ }
model_index.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "_class_name": "StableDiffusionPipeline",
+   "_diffusers_version": "0.9.0",
+   "feature_extractor": [
+     null,
+     null
+   ],
+   "requires_safety_checker": false,
+   "safety_checker": [
+     null,
+     null
+   ],
+   "scheduler": [
+     "diffusers",
+     "DDIMScheduler"
+   ],
+   "text_encoder": [
+     "transformers",
+     "CLIPTextModel"
+   ],
+   "tokenizer": [
+     "transformers",
+     "CLIPTokenizer"
+   ],
+   "unet": [
+     "diffusers",
+     "UNet2DConditionModel"
+   ],
+   "vae": [
+     "diffusers",
+     "AutoencoderKL"
+   ]
+ }
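model_index.json above wires the pipeline together: a DDIMScheduler, a CLIPTextModel text encoder, a CLIPTokenizer, a UNet2DConditionModel, and an AutoencoderKL, with the safety checker disabled. Each entry maps to a subfolder, so components can also be loaded individually; a minimal sketch (the repo id is assumed from the model name):

```py
from diffusers import DDIMScheduler, StableDiffusionPipeline

repo_id = "Midu/Midu-Stable-Diffusion-2-Chinese-Style-v0.1"  # assumed repo id
# Load the scheduler declared in model_index.json from its subfolder,
# then hand it to the pipeline explicitly.
scheduler = DDIMScheduler.from_pretrained(repo_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(repo_id, scheduler=scheduler)
```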
vae/config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "_class_name": "AutoencoderKL",
+   "_diffusers_version": "0.9.0",
+   "_name_or_path": "/data/pretrained_weights/stable-diffusion-2-1-zh-v0",
+   "act_fn": "silu",
+   "block_out_channels": [
+     128,
+     256,
+     512,
+     512
+   ],
+   "down_block_types": [
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D"
+   ],
+   "in_channels": 3,
+   "latent_channels": 4,
+   "layers_per_block": 2,
+   "norm_num_groups": 32,
+   "out_channels": 3,
+   "sample_size": 768,
+   "up_block_types": [
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D"
+   ]
+ }
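The VAE config matches the stock Stable Diffusion 2 autoencoder: 3-channel images, 4 latent channels, and four down/up blocks, i.e. 8x spatial compression at a native sample size of 768. A hedged sketch of using it standalone (repo id assumed as above):

```py
import torch
from diffusers import AutoencoderKL

# Assumed repo id; "vae" is the subfolder configured above.
vae = AutoencoderKL.from_pretrained(
    "Midu/Midu-Stable-Diffusion-2-Chinese-Style-v0.1", subfolder="vae"
)
x = torch.randn(1, 3, 768, 768)  # dummy 768x768 RGB batch
with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample()
print(latents.shape)  # expected: torch.Size([1, 4, 96, 96]), i.e. 8x downsampling
```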
vae/diffusion_pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:11bc15ceb385823b4adb68bd5bdd7568d0c706c3de5ea9ebcb0b807092fc9030
+ size 167407601
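The weights themselves sit behind this Git LFS pointer: the oid is the SHA-256 of the real blob and size is its byte count, so a download is easy to verify. A small standard-library sketch:

```py
import hashlib

# Compare a downloaded blob against the oid (sha256) in the LFS pointer above.
h = hashlib.sha256()
with open("vae/diffusion_pytorch_model.bin", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)
print(h.hexdigest() == "11bc15ceb385823b4adb68bd5bdd7568d0c706c3de5ea9ebcb0b807092fc9030")
```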