Reza Shirkavand committed
Commit 38345d7 • 1 Parent(s): ab2434b

add model card

Files changed (2):
  1. LICENSE +21 -0
  2. README.md +113 -3
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2023 rezashkv
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md CHANGED
@@ -2,6 +2,116 @@
  license: mit
  language:
  - en
- library_name: diffusers
- pipeline_tag: text-to-image
- ---
+
+ tags:
+ - text-to-image
+ - stable-diffusion
+ - diffusers
+ ---
+
+
+ # APTP: Adaptive Prompt-Tailored Pruning of T2I Diffusion Models
+ [![arXiv](https://img.shields.io/badge/Paper-arXiv-red?style=for-the-badge)]()
+ [![GitHub](https://img.shields.io/badge/GitHub-Code-success?style=for-the-badge&logo=GitHub)](https://github.com/rezashkv/diffusion_pruning)
+
+ The implementation of the paper ["Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models"](https://openreview.net/forum?id=ekR510QsYF)
+
+ ## Abstract
+
+ Text-to-image (T2I) diffusion models have demonstrated impressive image generation capabilities. Still, their computational intensity prohibits
+ resource-constrained organizations from deploying T2I models after fine-tuning them on their internal target data. While pruning
+ techniques offer a potential solution to reduce the computational burden of T2I models, static pruning methods use the same pruned
+ model for all input prompts, overlooking the varying capacity requirements of different prompts. Dynamic pruning addresses this issue by utilizing
+ a separate sub-network for each prompt, but it prevents batch parallelism on GPUs. To overcome these limitations, we introduce
+ Adaptive Prompt-Tailored Pruning (APTP), a novel prompt-based pruning method designed for T2I diffusion models. Central to our approach is a
+ prompt router model, which learns to determine the required capacity for an input text prompt and routes it to an architecture code, given a
+ total desired compute budget for prompts. Each architecture code represents a specialized model tailored to the prompts assigned to it, and the
+ number of codes is a hyperparameter. We train the prompt router and architecture codes using contrastive learning, ensuring that similar prompts
+ are mapped to nearby codes. Further, we employ optimal transport to prevent the codes from collapsing into a single one. We demonstrate APTP's
+ effectiveness by pruning Stable Diffusion (SD) V2.1 using CC3M and COCO as target datasets. APTP outperforms the
+ single-model pruning baselines in terms of FID, CLIP, and CMMD scores. Our analysis of the clusters learned by APTP reveals that they
+ are semantically meaningful. We also show that APTP can automatically discover prompts previously found to be challenging for SD, e.g., prompts for generating text images, and assign them to higher-capacity codes.
+
+
+ <p align="center">
+ <img src="assets/fig_1.gif" alt="APTP Overview" width="600" />
+ </p>
+ <p align="left">
+ <em>APTP: We prune a text-to-image diffusion model like Stable Diffusion (left) into a mixture of efficient experts (right) in a prompt-based manner. Our prompt router routes distinct types of prompts to different experts, allowing experts' architectures to be separately specialized by removing layers or channels.</em>
+ </p>
+
+ <p align="center">
+ <img src="assets/fig_2.gif" alt="APTP Pruning Scheme" width="600" />
+ </p>
+ <p align="left">
+ <em>APTP pruning scheme. We train the prompt router and the set of architecture codes to prune a T2I diffusion model into a mixture of experts. The prompt router consists of three modules. We use a Sentence Transformer as the prompt encoder to encode the input prompt into a representation z. Then, the architecture predictor transforms z into the architecture embedding e, which has the same dimensionality as the architecture codes. Finally, the router routes the embedding e to an architecture code a(i). We use optimal transport to evenly distribute the prompts in a training batch among the architecture codes. The architecture code a(i) = (u(i), v(i)) determines how the model's width and depth are pruned. We train the prompt router's parameters and the architecture codes in an end-to-end manner using the denoising objective of the pruned model L<sub>DDPM</sub>, a distillation loss between the pruned and original models L<sub>distill</sub>, the average resource usage for the samples in the batch R, and a contrastive objective L<sub>cont</sub>, which encourages the embeddings e to preserve the semantic similarity of the representations z.</em>
+ </p>
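The routing step in the caption above can be sketched in a few lines. This is an illustrative sketch, not the repository's code: the toy 2-D embeddings, the dot-product similarity, and the `route` helper are all assumptions made for illustration.

```python
# Illustrative sketch of inference-time routing: pick the architecture code
# a(i) most similar to the architecture embedding e. The dot-product
# similarity and the toy 2-D vectors are assumptions, not the paper's code.
def route(e, codes):
    """Return the index of the architecture code closest to embedding e."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return max(range(len(codes)), key=lambda i: dot(e, codes[i]))

# Two toy architecture codes; a prompt embedding is routed to the nearer one.
codes = [[1.0, 0.0], [0.0, 1.0]]
print(route([0.9, 0.2], codes))  # -> 0 (closer to the first code)
```

At training time the optimal-transport step keeps the batch spread across codes; the nearest-code lookup above only mirrors the routing described for a single prompt.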
+
+
+ ### Model Description
+
+ - **Developed by:** UMD Heng Lab
+ - **Model type:** Text-to-Image Diffusion Model
+ - **Model Description:** APTP is a pruning scheme for text-to-image diffusion models like Stable Diffusion, resulting in a mixture of efficient experts specialized for different prompt types.
+
+ ### License
+
+ APTP is released under the MIT License. Please see the [LICENSE](LICENSE) file for details.
+
+ ### Model Sources
+
+ For local or self-hosted use, follow the instructions in the [GitHub Repository](https://github.com/rezashkv/diffusion_pruning).
+
+
+ ## Training Dataset
+
+ We used the Conceptual Captions and MS-COCO 2014 datasets for training the models. Details for downloading and preparing these datasets are provided in the [GitHub Repository](https://github.com/rezashkv/diffusion_pruning).
+
+ ## File Structure
+
+ ```
+ APTP
+ ├── APTP-Base-CC3M
+ │   ├── arch0
+ │   ├── ...
+ │   └── arch15
+ ├── APTP-Small-CC3M
+ │   ├── arch0
+ │   ├── ...
+ │   └── arch7
+ ├── APTP-Base-COCO
+ │   ├── arch0
+ │   ├── ...
+ │   └── arch7
+ └── APTP-Small-COCO
+     ├── arch0
+     ├── ...
+     └── arch7
+ ```
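The layout above can also be navigated programmatically. The helper below is a hypothetical sketch, not part of the repository: it only encodes the folder names and per-variant expert counts shown in the tree.

```python
# Hypothetical helper (not part of the repository): map a model variant and
# an expert index to its checkpoint folder, following the tree shown above.
from pathlib import PurePosixPath

# Expert counts read off the directory tree above.
EXPERT_COUNTS = {
    "APTP-Base-CC3M": 16,   # arch0 .. arch15
    "APTP-Small-CC3M": 8,   # arch0 .. arch7
    "APTP-Base-COCO": 8,    # arch0 .. arch7
    "APTP-Small-COCO": 8,   # arch0 .. arch7
}

def expert_path(variant: str, arch: int) -> PurePosixPath:
    """Return the relative path of one pruned expert's folder."""
    count = EXPERT_COUNTS[variant]
    if not 0 <= arch < count:
        raise ValueError(f"{variant} only has experts arch0..arch{count - 1}")
    return PurePosixPath("APTP") / variant / f"arch{arch}"

print(expert_path("APTP-Base-CC3M", 15))  # -> APTP/APTP-Base-CC3M/arch15
```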
+
+
+ ## Uses
+
+ This model is designed for academic and research purposes, specifically for exploring the efficiency of text-to-image diffusion models through prompt-based pruning. Potential applications include:
+
+ 1. **Research:** Researchers can use the model to study prompt-based pruning techniques and their impact on the performance and efficiency of text-to-image generation models.
+ 2. **Education:** Educators and students can use this model as a learning tool for understanding advanced concepts in neural network pruning, diffusion models, and prompt engineering.
+ 3. **Benchmarking:** The model can be used for benchmarking against other text-to-image generation models to assess the trade-offs between computational efficiency and output quality.
+
+
+ ## Safety
+
+ When using these models, it is important to consider the following safety and ethical guidelines:
+
+ 1. **Content Generation:** The model can generate a wide range of images based on text prompts. Users should ensure that the generated content adheres to ethical guidelines and does not produce harmful, offensive, or inappropriate images.
+ 2. **Bias and Fairness:** Like other AI models, APTP may exhibit biases present in the training data. Users should be aware of these potential biases and take steps to mitigate their impact, particularly when the model is used in sensitive or critical applications.
+ 3. **Data Privacy:** Ensure that any data used with the model complies with data privacy regulations. Avoid using personally identifiable information (PII) or sensitive data without proper consent.
+ 4. **Responsible Use:** Users are encouraged to use the model responsibly, considering the potential social and ethical implications of their work. This includes avoiding the generation of misleading or false information and respecting the rights and dignity of individuals depicted in generated images.
+
+ By adhering to these guidelines, users can help ensure the responsible and ethical use of the APTP model.
+
+ ## Contact
+ In case of any questions or issues, please contact the authors of the paper:
+
+ * [Reza Shirkavand](mailto:rezashkv@umd.edu)
+ * [Alireza Ganjdanesh](mailto:aliganj@umd.edu)