---
language: zh
license: creativeml-openrail-m

tags:

- diffusion
- zh
- Chinese
---



# Chinese-Style-Stable-Diffusion-2-v0.1

| ![cyberpunk](examples/cyberpunk.jpeg) | ![shiba](examples/shiba.jpeg) | ![ds](examples/ds.jpeg)         |
| ------------------------------------- | ----------------------------- | ------------------------------- |
| ![waitan](examples/waitan.jpeg)       | ![gf](examples/gf.jpeg)       | ![ssh](examples/ssh.jpeg)       |
| ![cat](examples/cat.jpeg)             | ![robot](examples/robot.jpeg) | ![castle](examples/castle.jpeg) |

大概是Huggingface 🤗社区首个开源的Stable diffusion 2 中文模型。该模型基于[stable diffusion V2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1)模型,在约500万条的中国风格筛选过的中文数据上进行微调,数据来源于多个开源数据集如[LAION-5B](https://laion.ai/blog/laion-5b/),  [Noah-Wukong](https://wukong-dataset.github.io/wukong-dataset/), [Zero](https://zero.so.com/)和一些网络数据。

Probably the first open-source Chinese Stable Diffusion 2 model in the Hugging Face 🤗 community. The model is fine-tuned from [stable diffusion V2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1) on about 5M Chinese-style filtered data. The data is drawn from several open-source datasets such as [LAION-5B](https://laion.ai/blog/laion-5b/), [Noah-Wukong](https://wukong-dataset.github.io/wukong-dataset/), [Zero](https://zero.so.com/), and some web data.



## <u>Model Details</u>

#### Text Encoder

文本编码器使用冻结参数的[lyua1225/clip-huge-zh-75k-steps-bs4096](https://huggingface.co/lyua1225/clip-huge-zh-75k-steps-bs4096)。

The text encoder is [lyua1225/clip-huge-zh-75k-steps-bs4096](https://huggingface.co/lyua1225/clip-huge-zh-75k-steps-bs4096), kept frozen during fine-tuning.
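
For reference, the bundled Chinese tokenizer can be loaded on its own. A minimal sketch, assuming only the standard `AutoTokenizer` interface shown in the Usage section below (the prompt is illustrative):

```py
from transformers import AutoTokenizer

# The tokenizer ships as custom code, so trust_remote_code=True is required.
tokenizer = AutoTokenizer.from_pretrained(
    "lyua1225/clip-huge-zh-75k-steps-bs4096", trust_remote_code=True
)

# Tokenize a Chinese prompt ("a cyberpunk-style city street") and inspect the token IDs.
print(tokenizer("赛博朋克风格的城市街道")["input_ids"])
```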

#### UNet

在筛选过的500万中文数据集上训练了150K steps,使用指数移动平均值(EMA)做原绘画能力保留,使模型能够在中文风格和原绘画能力之间获得权衡。

The UNet was trained on the 5M Chinese-style filtered dataset for 150k steps. An exponential moving average (EMA) of the weights is kept to preserve the original Stable Diffusion 2 drawing capability and strike a balance between Chinese style and the original drawing ability.
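
As a rough illustration of what EMA weight averaging does during fine-tuning (a minimal sketch; the decay value and variable names are assumptions, not details from this model's training run):

```py
import copy
import torch

@torch.no_grad()
def update_ema(ema_model, model, decay=0.9999):
    # Blend the current online weights into the slowly moving EMA copy.
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1 - decay)

# ema_unet = copy.deepcopy(unet)   # start the EMA copy from the pretrained UNet
# ... after every optimizer step during fine-tuning:
# update_ema(ema_unet, unet)
# The EMA weights are then typically the ones shipped for inference.
```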


## <u>Usage</u>

因为使用了customized tokenizer, 所以需要优先加载一下tokenizer, 并传入trust_remote_code=True

Because a customized tokenizer is used, the tokenizer must be loaded first with `trust_remote_code=True` and passed to the pipeline.

```py
import torch
from diffusers import StableDiffusionPipeline
from transformers import AutoTokenizer

tokenizer_id = "lyua1225/clip-huge-zh-75k-steps-bs4096"
sd2_id = "Midu/chinese-style-stable-diffusion-2-v0.1"

# Load the custom Chinese tokenizer first, then pass it to the pipeline explicitly.
tokenizer = AutoTokenizer.from_pretrained(tokenizer_id, trust_remote_code=True)
pipe = StableDiffusionPipeline.from_pretrained(sd2_id, torch_dtype=torch.float16, tokenizer=tokenizer)
pipe.to("cuda")

# Prompt: "cyberpunk-style city street, 8K resolution, CG rendering"
image = pipe("赛博朋克风格的城市街道,8K分辨率,CG渲染", guidance_scale=10, num_inference_steps=20).images[0]
image.save("cyberpunk.jpeg")

```
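
Since the loaded pipeline is a standard `diffusers` `StableDiffusionPipeline`, the usual generation options also work. A minimal sketch with a negative prompt and a swapped-in scheduler (the prompts and scheduler choice are illustrative assumptions, not recommendations from the model authors):

```py
from diffusers import DPMSolverMultistepScheduler

# Optional: switch to a multistep solver to use fewer inference steps.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

image = pipe(
    "水墨画风格的山水",              # "landscape in ink-wash painting style"
    negative_prompt="低质量, 模糊",  # "low quality, blurry"
    guidance_scale=9,
    num_inference_steps=25,
).images[0]
image.save("ink_landscape.jpeg")
```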