kelseye committed
Commit f38ad1e · verified · 1 Parent(s): fa95d4a

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,22 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_0_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_1_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_2_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_3_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_4_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_5_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_6_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_7_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_input.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_2_0_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_2_1_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_2_2_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_2_3_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_2_input.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_3_0_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_3_1_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_3_2_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_3_3_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_3_input.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,134 @@
+ ---
+ license: apache-2.0
+ ---
+ # Qwen-Image-Layered
+
+ ## Model Introduction
+
+ This model was trained from [Qwen/Qwen-Image-Layered](https://modelscope.cn/models/Qwen/Qwen-Image-Layered) on the dataset [artplus/PrismLayersPro](https://modelscope.cn/datasets/artplus/PrismLayersPro), enabling text-controlled extraction of individual image layers.
+
+ ## Usage Tips
+
+ * The model architecture has been modified from multi-image output to single-image output, producing only the layer relevant to the textual description.
+ * The model was trained exclusively on English text but inherits Chinese language understanding from the base model.
+ * The native training resolution is 1024x1024; inference at other resolutions is also supported.
+ * The model struggles to separate multiple overlapping entities (e.g., the cartoon skeleton and hat in the examples).
+ * The model excels at decomposing poster-like images but performs poorly on photographic images, especially those with complex lighting and shadows.
+ * Negative prompts are supported; use them to describe content you want excluded from the output (see the sketch after this list).
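+
+ A minimal sketch of negative-prompt use, reusing the `pipe` and `input_image` objects built in the Inference Code section below; the `negative_prompt` keyword is assumed here, mirroring other DiffSynth-Studio pipelines, and the prompt pair is illustrative:
+
+ ```python
+ # Hedged sketch: `pipe` and `input_image` come from the Inference Code
+ # section below; `negative_prompt` support is assumed, not confirmed here.
+ images = pipe(
+     "A cartoon skeleton character holding a gift box",
+     negative_prompt="purple hat",  # content to exclude from the extracted layer
+     seed=0,
+     num_inference_steps=30, cfg_scale=4,
+     height=1024, width=1024,
+     layer_input_image=input_image,
+     layer_num=0,
+ )
+ images[0].save("skeleton_without_hat.png")
+ ```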
+
+ ## Demo Examples
+
+ **Some images contain white text on light backgrounds. ModelScope users should click the "☀︎" icon at the top-right corner of the page to switch to dark mode for better visibility.**
+
+ ### Example 1
+
+ <div style="display: flex; justify-content: space-between;">
+
+ <div style="width: 30%;">
+
+ |Input Image|
+ |-|
+ |![](./assets/image_1_input.png)|
+
+ </div>
+
+ <div style="width: 66%;">
+
+ |Prompt|Output Image|Prompt|Output Image|
+ |-|-|-|-|
+ |A solid, uniform color with no distinguishable features or objects|![](./assets/image_1_0_0.png)|Text 'TRICK'|![](./assets/image_1_4_0.png)|
+ |Cloud|![](./assets/image_1_1_0.png)|Text 'TRICK OR TREAT'|![](./assets/image_1_3_0.png)|
+ |A cartoon skeleton character wearing a purple hat and holding a gift box|![](./assets/image_1_2_0.png)|Text 'TRICK OR'|![](./assets/image_1_7_0.png)|
+ |A purple hat and a head|![](./assets/image_1_5_0.png)|A gift box|![](./assets/image_1_6_0.png)|
+
+ </div>
+
+ </div>
+
+ ### Example 2
+
+ <div style="display: flex; justify-content: space-between;">
+
+ <div style="width: 30%;">
+
+ |Input Image|
+ |-|
+ |![](./assets/image_2_input.png)|
+
+ </div>
+
+ <div style="width: 66%;">
+
+ |Prompt|Output Image|Prompt|Output Image|
+ |-|-|-|-|
+ |蓝天,白云,一片花园,花园里有五颜六色的花|![](./assets/image_2_0_0.png)|五彩的精致花环|![](./assets/image_2_2_0.png)|
+ |少女、花环、小猫|![](./assets/image_2_1_0.png)|少女、小猫|![](./assets/image_2_3_0.png)|
+
+ </div>
+
+ </div>
+
+ ### Example 3
+
+ <div style="display: flex; justify-content: space-between;">
+
+ <div style="width: 30%;">
+
+ |Input Image|
+ |-|
+ |![](./assets/image_3_input.png)|
+
+ </div>
+
+ <div style="width: 66%;">
+
+ |Prompt|Output Image|Prompt|Output Image|
+ |-|-|-|-|
+ |一片湛蓝的天空和波涛汹涌的大海|![](./assets/image_3_0_0.png)|文字“向往的生活”|![](./assets/image_3_2_0.png)|
+ |一只海鸥|![](./assets/image_3_1_0.png)|文字“生活”|![](./assets/image_3_3_0.png)|
+
+ </div>
+
+ </div>
+
+ ## Inference Code
+
+ Install DiffSynth-Studio:
+
+ ```
+ git clone https://github.com/modelscope/DiffSynth-Studio.git
+ cd DiffSynth-Studio
+ pip install -e .
+ ```
+
+ Model inference:
+
+ ```python
+ from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
+ from PIL import Image
+ import torch, requests
+
+ pipe = QwenImagePipeline.from_pretrained(
+     torch_dtype=torch.bfloat16,
+     device="cuda",
+     model_configs=[
+         ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Layered-Control", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image-Layered", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
+     ],
+     processor_config=ModelConfig(model_id="Qwen/Qwen-Image-Edit", origin_file_pattern="processor/"),
+ )
+ prompt = "A cartoon skeleton character wearing a purple hat and holding a gift box"
+ input_image = requests.get("https://modelscope.oss-cn-beijing.aliyuncs.com/resource/images/trick_or_treat.png", stream=True).raw
+ input_image = Image.open(input_image).convert("RGBA").resize((1024, 1024))
+ input_image.save("image_input.png")
+ images = pipe(
+     prompt,
+     seed=0,
+     num_inference_steps=30, cfg_scale=4,
+     height=1024, width=1024,
+     layer_input_image=input_image,
+     layer_num=0,
+ )
+ images[0].save("image.png")
+ ```
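+
+ Since each call returns a single layer, a full decomposition is simply repeated calls with different prompts. A minimal sketch reusing `pipe` and `input_image` from the script above; the prompt list is illustrative, borrowed from Example 1:
+
+ ```python
+ # Illustrative sketch: extract several layers from one input image by
+ # varying only the prompt; all other arguments match the call above.
+ layer_prompts = [
+     "A solid, uniform color with no distinguishable features or objects",
+     "Cloud",
+     "Text 'TRICK OR TREAT'",
+ ]
+ for i, layer_prompt in enumerate(layer_prompts):
+     images = pipe(
+         layer_prompt,
+         seed=0,
+         num_inference_steps=30, cfg_scale=4,
+         height=1024, width=1024,
+         layer_input_image=input_image,
+         layer_num=0,
+     )
+     images[0].save(f"layer_{i}.png")
+ ```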
README_from_modelscope.md ADDED
@@ -0,0 +1,141 @@
+ ---
+ frameworks: PyTorch
+ license: Apache License 2.0
+ tags: []
+ tasks:
+ - text-to-image-synthesis
+ base_model:
+ - Qwen/Qwen-Image-Layered
+ base_model_relation: finetune
+ ---
+ # Qwen-Image-Layered
+
+ ## Model Introduction
+
+ This model was trained from [Qwen/Qwen-Image-Layered](https://modelscope.cn/models/Qwen/Qwen-Image-Layered) on the dataset [artplus/PrismLayersPro](https://modelscope.cn/datasets/artplus/PrismLayersPro); the content of the extracted layer can be controlled through text.
+
+ ## Usage Tips
+
+ * The model architecture has been changed from multi-image output to single-image output, producing only the layer relevant to the textual description.
+ * The model was trained only on English text but inherits Chinese language understanding from the base model.
+ * The native training resolution is 1024x1024; inference at other resolutions is supported.
+ * The model struggles to separate multiple mutually occluding entities, e.g. the cartoon skeleton and hat in the examples.
+ * The model is good at decomposing poster layers but poor at photographic images, especially photos with pronounced lighting and shadows.
+ * Negative prompts are supported; use them to describe content that should not appear in the result.
+
+ ## Demo Examples
+
+ **Some images contain pure-white text. ModelScope users should click the "☀︎" icon at the top-right corner of the page to switch to dark mode.**
+
+ ### Example 1
+
+ <div style="display: flex; justify-content: space-between;">
+
+ <div style="width: 30%;">
+
+ |Input Image|
+ |-|
+ |![](./assets/image_1_input.png)|
+
+ </div>
+
+ <div style="width: 66%;">
+
+ |Prompt|Output Image|Prompt|Output Image|
+ |-|-|-|-|
+ |A solid, uniform color with no distinguishable features or objects|![](./assets/image_1_0_0.png)|Text 'TRICK'|![](./assets/image_1_4_0.png)|
+ |Cloud|![](./assets/image_1_1_0.png)|Text 'TRICK OR TREAT'|![](./assets/image_1_3_0.png)|
+ |A cartoon skeleton character wearing a purple hat and holding a gift box|![](./assets/image_1_2_0.png)|Text 'TRICK OR'|![](./assets/image_1_7_0.png)|
+ |A purple hat and a head|![](./assets/image_1_5_0.png)|A gift box|![](./assets/image_1_6_0.png)|
+
+ </div>
+
+ </div>
+
+ ### Example 2
+
+ <div style="display: flex; justify-content: space-between;">
+
+ <div style="width: 30%;">
+
+ |Input Image|
+ |-|
+ |![](./assets/image_2_input.png)|
+
+ </div>
+
+ <div style="width: 66%;">
+
+ |Prompt|Output Image|Prompt|Output Image|
+ |-|-|-|-|
+ |蓝天,白云,一片花园,花园里有五颜六色的花|![](./assets/image_2_0_0.png)|五彩的精致花环|![](./assets/image_2_2_0.png)|
+ |少女、花环、小猫|![](./assets/image_2_1_0.png)|少女、小猫|![](./assets/image_2_3_0.png)|
+
+ </div>
+
+ </div>
+
+ ### Example 3
+
+ <div style="display: flex; justify-content: space-between;">
+
+ <div style="width: 30%;">
+
+ |Input Image|
+ |-|
+ |![](./assets/image_3_input.png)|
+
+ </div>
+
+ <div style="width: 66%;">
+
+ |Prompt|Output Image|Prompt|Output Image|
+ |-|-|-|-|
+ |一片湛蓝的天空和波涛汹涌的大海|![](./assets/image_3_0_0.png)|文字“向往的生活”|![](./assets/image_3_2_0.png)|
+ |一只海鸥|![](./assets/image_3_1_0.png)|文字“生活”|![](./assets/image_3_3_0.png)|
+
+ </div>
+
+ </div>
+
+ ## Inference Code
+
+ Install DiffSynth-Studio:
+
+ ```
+ git clone https://github.com/modelscope/DiffSynth-Studio.git
+ cd DiffSynth-Studio
+ pip install -e .
+ ```
+
+ Model inference:
+
+ ```python
+ from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
+ from PIL import Image
+ import torch, requests
+
+ pipe = QwenImagePipeline.from_pretrained(
+     torch_dtype=torch.bfloat16,
+     device="cuda",
+     model_configs=[
+         ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Layered-Control", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image-Layered", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
+     ],
+     processor_config=ModelConfig(model_id="Qwen/Qwen-Image-Edit", origin_file_pattern="processor/"),
+ )
+ prompt = "A cartoon skeleton character wearing a purple hat and holding a gift box"
+ input_image = requests.get("https://modelscope.oss-cn-beijing.aliyuncs.com/resource/images/trick_or_treat.png", stream=True).raw
+ input_image = Image.open(input_image).convert("RGBA").resize((1024, 1024))
+ input_image.save("image_input.png")
+ images = pipe(
+     prompt,
+     seed=0,
+     num_inference_steps=30, cfg_scale=4,
+     height=1024, width=1024,
+     layer_input_image=input_image,
+     layer_num=0,
+ )
+ images[0].save("image.png")
+ ```
assets/image_1_0_0.png ADDED

Git LFS Details

  • SHA256: 7571c7a59e6a301c2909978baeffa4c2d25aa31103dc026e702e6e0b77f4d545
  • Pointer size: 131 Bytes
  • Size of remote file: 766 kB
assets/image_1_1_0.png ADDED

Git LFS Details

  • SHA256: cf931fc683c3b51aea11d0cc18bcb3e108fea6e157a60f9aa64d3e9316edb67b
  • Pointer size: 131 Bytes
  • Size of remote file: 764 kB
assets/image_1_2_0.png ADDED

Git LFS Details

  • SHA256: e5e1e50a3549c9a88fac681ede16d5682e6bfc52bc584276fef9fd4b1439dda8
  • Pointer size: 131 Bytes
  • Size of remote file: 880 kB
assets/image_1_3_0.png ADDED

Git LFS Details

  • SHA256: 695c67883053681cbde394e0189cfc31c2a45d5c9b44887e9872dca8b4ec20b3
  • Pointer size: 131 Bytes
  • Size of remote file: 720 kB
assets/image_1_4_0.png ADDED

Git LFS Details

  • SHA256: fb02a4888540a023af32cb13c52f8883bc83f436544a3d9dec3c07a9c59578ca
  • Pointer size: 131 Bytes
  • Size of remote file: 650 kB
assets/image_1_5_0.png ADDED

Git LFS Details

  • SHA256: 9cc2f7958c5c27cdefa7309112d831435ac5b05d075bde7b4a6571e6a81e5f40
  • Pointer size: 131 Bytes
  • Size of remote file: 714 kB
assets/image_1_6_0.png ADDED

Git LFS Details

  • SHA256: 8c243e61ce6f592e936013fa33c8825edf544a9ddc31cdf3e65a7fedfc857741
  • Pointer size: 131 Bytes
  • Size of remote file: 637 kB
assets/image_1_7_0.png ADDED

Git LFS Details

  • SHA256: a15ad9e370a58b5e77f608affaf44870888e0081a2294f04119ca98131561ea4
  • Pointer size: 131 Bytes
  • Size of remote file: 660 kB
assets/image_1_input.png ADDED

Git LFS Details

  • SHA256: 0bf0cf15ba21de772f11eb11bf9fa9f62a4d2467347c98559b1d257220bd50ef
  • Pointer size: 131 Bytes
  • Size of remote file: 902 kB
assets/image_2_0_0.png ADDED

Git LFS Details

  • SHA256: f72f561ea8b1a20ab9215ef1285d5a767867d63a79b8384cdcb65ab281e3cca5
  • Pointer size: 132 Bytes
  • Size of remote file: 1.11 MB
assets/image_2_1_0.png ADDED

Git LFS Details

  • SHA256: 21615ea7ff938ba73922c36daac996da4efa97984bfd72f42c4cab73c04e864a
  • Pointer size: 132 Bytes
  • Size of remote file: 1.27 MB
assets/image_2_2_0.png ADDED

Git LFS Details

  • SHA256: f387a8f1646ce99b06156596fa0210fdfbb5b71c349427eb8e848b2722bfe569
  • Pointer size: 131 Bytes
  • Size of remote file: 761 kB
assets/image_2_3_0.png ADDED

Git LFS Details

  • SHA256: 149bc856488fe40d485d93e5788c3ea66ebab22cf0faa5bd5b11e10080602441
  • Pointer size: 132 Bytes
  • Size of remote file: 1.17 MB
assets/image_2_input.png ADDED

Git LFS Details

  • SHA256: ba1980967215c5090e26673dd38805b6d140662a9fff6f4e3fe2422485723c9a
  • Pointer size: 132 Bytes
  • Size of remote file: 1.32 MB
assets/image_3_0_0.png ADDED

Git LFS Details

  • SHA256: bcebe462984c8df120eddc998f7277f3c226dd717d3270b9b0cdba9154d5b65e
  • Pointer size: 132 Bytes
  • Size of remote file: 1.31 MB
assets/image_3_1_0.png ADDED

Git LFS Details

  • SHA256: fac7be288f3c4ead811edc2a388651424a07ac5ce6ef9f278af0861589bf5c01
  • Pointer size: 131 Bytes
  • Size of remote file: 613 kB
assets/image_3_2_0.png ADDED

Git LFS Details

  • SHA256: 168cff1bc58b7ef2e98dee24686ec9cf4923c79910c5728a3c0366307fbe5214
  • Pointer size: 131 Bytes
  • Size of remote file: 671 kB
assets/image_3_3_0.png ADDED

Git LFS Details

  • SHA256: e8e98774b8dd5afad15d12ef7f5895c5b0280391f6f26b4a8ec736356c602e49
  • Pointer size: 131 Bytes
  • Size of remote file: 627 kB
assets/image_3_input.png ADDED

Git LFS Details

  • SHA256: 17af2255d4311cc9a9bf96b3c5650a7754a74ccbb6fc677487f5c16de7264d91
  • Pointer size: 132 Bytes
  • Size of remote file: 1.37 MB
configuration.json ADDED
@@ -0,0 +1 @@
+ {"framework":"Pytorch","task":"text-to-image-synthesis"}
transformer/config.json ADDED
@@ -0,0 +1,20 @@
+ {
+   "_class_name": "QwenImageTransformer2DModel",
+   "_diffusers_version": "0.36.0.dev0",
+   "use_additional_t_cond": true,
+   "attention_head_dim": 128,
+   "axes_dims_rope": [
+     16,
+     56,
+     56
+   ],
+   "guidance_embeds": false,
+   "in_channels": 64,
+   "joint_attention_dim": 3584,
+   "num_attention_heads": 24,
+   "num_layers": 60,
+   "out_channels": 16,
+   "patch_size": 2,
+   "use_layer3d_rope": true,
+   "zero_cond_t": false
+ }
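
The config above specifies a 60-layer, 24-head QwenImageTransformer2DModel with the layered extensions (`use_layer3d_rope`, `use_additional_t_cond`) enabled. As a hedged sketch, the transformer alone could in principle be loaded through diffusers' standard `from_pretrained` API; this assumes a diffusers build recent enough to register this class with those flags (the config was written by `0.36.0.dev0`), and the repo id is taken from the README's inference code.

```python
# Hedged sketch, not a confirmed workflow: load only the transformer weights
# via diffusers' ModelMixin API. Assumes a diffusers version that registers
# QwenImageTransformer2DModel and accepts the layered config flags above.
import torch
from diffusers import QwenImageTransformer2DModel

transformer = QwenImageTransformer2DModel.from_pretrained(
    "DiffSynth-Studio/Qwen-Image-Layered-Control",  # repo id from the README
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
print(sum(p.numel() for p in transformer.parameters()))  # rough parameter count
```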
transformer/diffusion_pytorch_model-00001-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e5353f1dbff8445840012bd2aff2fd209034aa42d0ce623a55f3f542036244a2
+ size 9973590960
transformer/diffusion_pytorch_model-00002-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:957d266a7ccdcc9d3f225c82b0afa831ba5084c851b86934b9e4e9f10163b985
+ size 9987326040
transformer/diffusion_pytorch_model-00003-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1f0e2bec2869de66f02b53bda77bc11618aba229453be56170209a654ddff0c0
+ size 9987307408
transformer/diffusion_pytorch_model-00004-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5244cf56dd45667fc8f373d43550bc187909bc48489f380fa3dcbb02901e7dcf
+ size 9930685680
transformer/diffusion_pytorch_model-00005-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:45ecb944aad539ceaae9e3ba99dc9f2d650ba034cf4b305b0e83ebce0bb7b55c
+ size 982130448