Warlord-K committed on
Commit
ecfa8a6
·
1 Parent(s): 2d1bd50

Create README.md

Files changed (1)
  1. README.md +212 -0
README.md ADDED
@@ -0,0 +1,212 @@
1
+ ---
2
+ license: apache-2.0
3
+ tags:
4
+ - text-to-image
5
+ - ultra-realistic
6
+ - text-to-image
7
+ - stable-diffusion
8
+ - distilled-model
9
+ - knowledge-distillation
10
+ pinned: true
11
+ datasets:
12
+ - zzliang/GRIT
13
+ - wanng/midjourney-v5-202304-clean
14
+ library_name: diffusers
15
+ ---
16
+
17
+ # SSD-Tiny Model Card
18
+
19
+
20
+ ## 🔥🔥Join our [Discord](https://discord.gg/rF44ueRG) to give feedback on our models and get early access🔥🔥
21
+
22
+ ## Demo
23
+
24
+ Try out the SSD-Tiny model at [Segmind SSD-Tiny]() for the ⚡ fastest inference. You can also explore it on [🤗 Spaces](https://huggingface.co/spaces/segmind/SSD-Tiny).
25
+
26
+ ## Model Description
27
+
28
+ SSD-Tiny is a distilled version of Stable Diffusion XL (SDXL) that offers a **70% reduction in size** and an **80% speedup** while retaining high-quality text-to-image generation. It was trained on diverse datasets, including GRIT and scraped Midjourney data, and generates a wide range of visual content from text prompts.
29
+
30
+ SSD-Tiny is trained with knowledge distillation from several expert (teacher) models, including SDXL, ZavyChromaXL, and JuggernautXL, so that it combines their strengths.
31
+
32
+ Special thanks to the HF team 🤗, especially [Sayak](https://huggingface.co/sayakpaul), [Patrick](https://github.com/patrickvonplaten), and [Poli](https://huggingface.co/multimodalart), for their collaboration and guidance on this work.
33
+
34
+ ## Image Comparison (SDXL-1.0 vs SSD-Tiny)
35
+
36
+ ## Usage
37
+ This model can be used via the 🧨 Diffusers library.
38
+
39
+ Install `diffusers` by running:
40
+ ```bash
41
+ pip install diffusers
42
+ ```
43
+
44
+ In addition, please install `transformers`, `safetensors`, and `accelerate`:
45
+ ```bash
46
+ pip install transformers accelerate safetensors
47
+ ```
48
+
49
+ To use the model, you can run the following:
50
+
51
+ ```python
52
+ from diffusers import StableDiffusionXLPipeline
53
+ import torch
54
+
55
+ pipe = StableDiffusionXLPipeline.from_pretrained("segmind/SSD-Tiny", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
56
+ pipe.to("cuda")
57
+ # if using torch < 2.0
58
+ # pipe.enable_xformers_memory_efficient_attention()
59
+ prompt = "An astronaut riding a green horse" # Your prompt here
60
+ neg_prompt = "ugly, blurry, poor quality" # Negative prompt here
61
+ image = pipe(prompt=prompt, negative_prompt=neg_prompt).images[0]
62
+ ```
63
+
64
+ ### For the best quality, use a negative prompt and a CFG (guidance scale) of around 9.0!
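
The recommendation above can be made explicit in code. The following is a minimal sketch, not part of the original card: `build_generation_kwargs` is a hypothetical helper that bundles the prompt, a negative prompt, and a guidance scale of 9.0 into a dictionary. `guidance_scale` is the Diffusers pipeline argument that controls CFG; the default negative prompt is simply the one used in the usage example.

```python
def build_generation_kwargs(
    prompt,
    negative_prompt="ugly, blurry, poor quality",
    guidance_scale=9.0,
):
    """Collect call arguments for a Diffusers text-to-image pipeline.

    Hypothetical helper: the defaults mirror the card's advice
    (use a negative prompt and a CFG of around 9.0).
    """
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "guidance_scale": guidance_scale,
    }


kwargs = build_generation_kwargs("An astronaut riding a green horse")
print(kwargs)

# With the pipeline from the usage example loaded as `pipe`:
# image = pipe(**kwargs).images[0]
```

Keeping these arguments in one place makes it harder to forget the negative prompt or to fall back to the pipeline's default guidance scale.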
65
+
66
+ ### Model Description
67
+
68
+ - **Developed by:** [Segmind](https://www.segmind.com/)
69
+ - **Developers:** [Yatharth Gupta](https://huggingface.co/Warlord-K) and [Vishnu Jaddipal](https://huggingface.co/Icar)
70
+ - **Model type:** Diffusion-based text-to-image generative model
71
+ - **License:** Apache 2.0
72
+ - **Distilled From:** [stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
73
+
74
+ ### Key Features
75
+
76
+ - **Text-to-Image Generation:** The SSD-Tiny model excels at generating images from text prompts, enabling a wide range of creative applications.
77
+
78
+ - **Distilled for Speed:** Designed for efficiency, this model offers an 80% speedup over the base SDXL model, making it suitable for real-time applications and other scenarios where rapid image generation is essential.
79
+
80
+ - **Diverse Training Data:** Trained on diverse datasets, the model can handle a variety of textual prompts and generate corresponding images effectively.
81
+
82
+ - **Knowledge Distillation:** By distilling knowledge from multiple expert models, SSD-Tiny aims to combine their strengths and reduce their individual limitations.
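
The card does not say how the 80% speedup is measured, and the phrase has two common readings. The sketch below works through both for a hypothetical baseline latency; `latency_after_speedup` and the 10-second baseline are illustrative assumptions, not measurements from the card.

```python
def latency_after_speedup(baseline_seconds, speedup_fraction=0.80):
    """Return per-image latency under two readings of an 'X% speedup'."""
    # Reading 1: throughput is (1 + X) times the baseline,
    # so latency shrinks by that same factor.
    throughput_reading = baseline_seconds / (1.0 + speedup_fraction)
    # Reading 2: latency itself is cut by X, leaving (1 - X) of the baseline.
    latency_reading = baseline_seconds * (1.0 - speedup_fraction)
    return throughput_reading, latency_reading


# Hypothetical baseline: 10 seconds per image for the base model.
faster_throughput, lower_latency = latency_after_speedup(10.0)
print(f"1.8x throughput reading: {faster_throughput:.2f} s per image")
print(f"80% less time reading:   {lower_latency:.2f} s per image")
```

The two readings differ by almost a factor of three, so it is worth benchmarking on your own hardware before relying on the headline number.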
83
+
84
+ ### Model Architecture
85
+
86
+ SSD-Tiny is a compact version of SDXL, with a 70% reduction in size compared to the base SDXL model.
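
As a rough illustration of what a 70% reduction implies, the sketch below applies it to a parameter count. Two things are assumed here and not stated in the card: that "size" refers to parameters of the denoising UNet, and that the base UNet has roughly 2.6 billion parameters. `distilled_param_count` is a hypothetical helper.

```python
def distilled_param_count(base_params, reduction=0.70):
    """Estimate the parameter count left after a fractional size reduction."""
    return round(base_params * (1.0 - reduction))


# Assumption: approximate parameter count of the base SDXL UNet.
ASSUMED_SDXL_UNET_PARAMS = 2_600_000_000

estimate = distilled_param_count(ASSUMED_SDXL_UNET_PARAMS)
print(f"Estimated distilled size: {estimate / 1e9:.2f}B parameters")
```

For the actual figure, count the parameters of the loaded model instead of relying on this estimate.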
87
+
88
+ ### Training Info
89
+
90
+ These are the key hyperparameters used during training:
91
+
92
+ - Steps: 540,000
93
+ - Learning rate: 1e-5
94
+ - Batch size: 16
95
+ - Gradient accumulation steps: 8
96
+ - Image resolution: 1024
97
+ - Mixed-precision: fp16
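
To put these hyperparameters in context, the sketch below computes the effective batch size and the number of training samples processed. It assumes that "Batch size" is per device, that a single device was used, and that "Steps" counts optimizer updates; the card does not confirm any of these, so treat the totals as illustrative.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Samples contributing to one optimizer update."""
    return per_device_batch * grad_accum_steps * num_devices


def samples_seen(optimizer_steps, per_device_batch, grad_accum_steps, num_devices=1):
    """Total samples processed, counting repeats across epochs."""
    return optimizer_steps * effective_batch_size(
        per_device_batch, grad_accum_steps, num_devices
    )


batch = effective_batch_size(per_device_batch=16, grad_accum_steps=8)
total = samples_seen(optimizer_steps=540_000, per_device_batch=16, grad_accum_steps=8)
print(f"Effective batch size: {batch}")
print(f"Samples processed:    {total:,}")
```

If training ran on several devices, pass `num_devices` to scale both numbers accordingly.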
98
+
99
+ ### Speed Comparison
100
+
101
+ SSD-Tiny has demonstrated an 80% speedup compared to the base SDXL model. Below is a comparison on an A100 80GB.
102
+
103
+ Below are the speed-up metrics on an RTX 4090 GPU.
104
+
105
+
106
+ ### Model Sources
107
+
108
+ For research and development purposes, the SSD-Tiny Model can be accessed via the Segmind AI platform. For more information and access details, please visit [Segmind](https://www.segmind.com/models/ssd-tiny).
109
+
110
+ ## Uses
111
+
112
+ ### Direct Use
113
+
114
+ The SSD-Tiny Model is suitable for research and practical applications in various domains, including:
115
+
116
+ - **Art and Design:** It can be used to generate artworks, designs, and other creative content, providing inspiration and enhancing the creative process.
117
+
118
+ - **Education:** The model can be applied in educational tools to create visual content for teaching and learning purposes.
119
+
120
+ - **Research:** Researchers can use the model to explore generative models, evaluate its performance, and push the boundaries of text-to-image generation.
121
+
122
+ - **Safe Content Generation:** It offers a safe and controlled way to generate content, reducing the risk of harmful or inappropriate outputs.
123
+
124
+ - **Bias and Limitation Analysis:** Researchers and developers can use the model to probe its limitations and biases, contributing to a better understanding of generative models' behavior.
125
+
126
+ ### Downstream Use
127
+
128
+ The SSD-Tiny Model can also be used directly with the 🧨 Diffusers library training scripts for further training, including:
129
+
130
+ - **[LoRA](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora_sdxl.py):**
131
+ ```bash
132
+ export MODEL_NAME="segmind/SSD-Tiny"
133
+ export VAE_NAME="madebyollin/sdxl-vae-fp16-fix"
134
+ export DATASET_NAME="lambdalabs/pokemon-blip-captions"
135
+
136
+ accelerate launch train_text_to_image_lora_sdxl.py \
137
+ --pretrained_model_name_or_path=$MODEL_NAME \
138
+ --pretrained_vae_model_name_or_path=$VAE_NAME \
139
+ --dataset_name=$DATASET_NAME --caption_column="text" \
140
+ --resolution=1024 --random_flip \
141
+ --train_batch_size=1 \
142
+ --num_train_epochs=2 --checkpointing_steps=500 \
143
+ --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
144
+ --mixed_precision="fp16" \
145
+ --seed=42 \
146
+ --output_dir="sd-pokemon-model-lora-tiny" \
147
+ --validation_prompt="cute dragon creature" --report_to="wandb" \
148
+ --push_to_hub
149
+ ```
150
+
151
+ - **[Fine-Tune](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_sdxl.py):**
152
+ ```bash
153
+ export MODEL_NAME="segmind/SSD-Tiny"
154
+ export VAE_NAME="madebyollin/sdxl-vae-fp16-fix"
155
+ export DATASET_NAME="lambdalabs/pokemon-blip-captions"
156
+
157
+ accelerate launch train_text_to_image_sdxl.py \
158
+ --pretrained_model_name_or_path=$MODEL_NAME \
159
+ --pretrained_vae_model_name_or_path=$VAE_NAME \
160
+ --dataset_name=$DATASET_NAME \
161
+ --enable_xformers_memory_efficient_attention \
162
+ --resolution=1024 --center_crop --random_flip \
163
+ --proportion_empty_prompts=0.2 \
164
+ --train_batch_size=1 \
165
+ --gradient_accumulation_steps=4 --gradient_checkpointing \
166
+ --max_train_steps=10000 \
167
+ --use_8bit_adam \
168
+ --learning_rate=1e-06 --lr_scheduler="constant" --lr_warmup_steps=0 \
169
+ --mixed_precision="fp16" \
170
+ --report_to="wandb" \
171
+ --validation_prompt="a cute Sundar Pichai creature" --validation_epochs 5 \
172
+ --checkpointing_steps=5000 \
173
+ --output_dir="ssd-pokemon-model-tiny" \
174
+ --push_to_hub
175
+ ```
176
+
177
+ - **[Dreambooth LoRA](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora_sdxl.py):**
178
+ ```bash
179
+ export MODEL_NAME="segmind/SSD-Tiny"
180
+ export INSTANCE_DIR="dog"
181
+ export OUTPUT_DIR="lora-trained-tiny"
182
+ export VAE_PATH="madebyollin/sdxl-vae-fp16-fix"
183
+
184
+ accelerate launch train_dreambooth_lora_sdxl.py \
185
+ --pretrained_model_name_or_path=$MODEL_NAME \
186
+ --instance_data_dir=$INSTANCE_DIR \
187
+ --pretrained_vae_model_name_or_path=$VAE_PATH \
188
+ --output_dir=$OUTPUT_DIR \
189
+ --mixed_precision="fp16" \
190
+ --instance_prompt="a photo of sks dog" \
191
+ --resolution=1024 \
192
+ --train_batch_size=1 \
193
+ --gradient_accumulation_steps=4 \
194
+ --learning_rate=1e-5 \
195
+ --report_to="wandb" \
196
+ --lr_scheduler="constant" \
197
+ --lr_warmup_steps=0 \
198
+ --max_train_steps=500 \
199
+ --validation_prompt="A photo of sks dog in a bucket" \
200
+ --validation_epochs=25 \
201
+ --seed="0" \
202
+ --push_to_hub
203
+ ```
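
After a LoRA run finishes, the trained weights can be loaded on top of the base model for inference. The following is a minimal sketch, assuming the Dreambooth LoRA command above wrote its weights to `lora-trained-tiny`. `load_pipeline_with_lora` is a hypothetical helper; `load_lora_weights` is the Diffusers method for attaching LoRA weights to a pipeline.

```python
def load_pipeline_with_lora(
    lora_path="lora-trained-tiny",
    model_id="segmind/SSD-Tiny",
    device="cuda",
):
    """Load SSD-Tiny and attach LoRA weights from a local folder or Hub repo."""
    # Imported lazily so this module can be imported without torch/diffusers.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
    )
    # Attach the LoRA weights produced by the training script.
    pipe.load_lora_weights(lora_path)
    return pipe.to(device)


# Example (requires a CUDA GPU and the trained weights):
# pipe = load_pipeline_with_lora()
# image = pipe(
#     prompt="A photo of sks dog in a bucket",
#     negative_prompt="ugly, blurry, poor quality",
#     guidance_scale=9.0,
# ).images[0]
# image.save("sks_dog.png")
```

The same helper should work for the text-to-image LoRA run if `lora_path` points at its output directory instead.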
204
+
205
+ ### Out-of-Scope Use
206
+
207
+ The SSD-Tiny Model is not suitable for creating factual or accurate representations of people, events, or real-world information. It is not intended for tasks requiring high precision and accuracy.
208
+
209
+ ## Limitations and Bias
210
+
211
+ **Limitations & Bias:**
212
+ SSD-Tiny struggles to achieve full photorealism, especially when depicting people. It may also have difficulty rendering legible text and preserving the fidelity of complex compositions because of its autoencoding approach. Training on a diverse dataset does not eliminate ingrained societal and digital biases, although it is a step toward more equitable technology. Users should keep these limitations in mind when working with the model.