tomerkeren42 committed on
Commit
2f5fed5
1 Parent(s): 484a10e

Upload README.md

Files changed (1)
  1. README.md +164 -0
README.md ADDED
@@ -0,0 +1,164 @@
+ ---
+ pipeline_tag: text-to-image
+ inference: true
+ license: openrail++
+ language:
+ - en
+ tags:
+ - Deci AI
+ - DeciDiffusion
+ ---
+ # DeciDiffusion 1.0
+
+ DeciDiffusion 1.0 is an 820-million-parameter text-to-image latent diffusion model trained on the LAION-v2 dataset and fine-tuned on the LAION-ART dataset. Advanced training techniques were used to speed up training, improve training performance, and achieve better inference quality.
+
+ ## Model Details
+
+ - **Developed by:** Deci
+ - **Model type:** Diffusion-based text-to-image generation model
+ - **Language(s) (NLP):** English
+ - **Code License:** The code in this repository is released under the [Apache 2.0 License](https://huggingface.co/Deci/DeciDiffusion-1.0/blob/main/LICENSE-MODEL)
+ - **Weights License:** The weights are released under the [CreativeML Open RAIL++-M License](https://huggingface.co/Deci/DeciDiffusion-1.0/blob/main/LICENSE-WEIGHTS)
+
+ ### Model Sources
+
+ - **Blog:** [A technical overview and comparison to Stable Diffusion 1.5](link)
+ - **Demo:** [Experience DeciDiffusion in action](link)
+ - **Notebook:** [Learn how to use the model](link)
+
+ ## Model Architecture
+
+ DeciDiffusion 1.0 is a diffusion-based text-to-image generation model. While it keeps foundational architecture elements from Stable Diffusion, such as the Variational Autoencoder (VAE) and CLIP's pre-trained text encoder, DeciDiffusion introduces significant enhancements. The primary innovation is the substitution of U-Net with the Efficient U-Net, a design pioneered by Deci. This component reduces the number of parameters, which improves computational efficiency.
+
+ ## Training Details
+
+ ### Training Procedure
+
+ The model was trained in 4 phases:
+
+ - **Phase 1:** Trained from scratch for 1.28 million steps at resolution 256x256 on a 320 million sample subset of LAION-v2.
+ - **Phase 2:** Trained for 870k steps at resolution 512x512 on the same dataset to learn more fine-detailed information.
+ - **Phase 3:** Trained for 65k steps with EMA, a different learning rate scheduler, and more "qualitative" data.
+ - **Phase 4:** Fine-tuned on a 2M sample subset of LAION-ART.
+
+ ### Training Techniques
+
+ DeciDiffusion 1.0 was trained to be sample efficient, i.e., to produce high-quality results using fewer diffusion timesteps during inference.
+ The following training techniques were used to that end:
+
+ - **[V-prediction](https://arxiv.org/pdf/2202.00512.pdf)**
+ - **[Enforcing zero terminal SNR during training](https://arxiv.org/pdf/2305.08891.pdf)**
+ - **[Employing a cosine variance schedule](https://arxiv.org/pdf/2102.09672.pdf)**
+ - **[Using a Min-SNR loss weighting strategy](https://arxiv.org/abs/2303.09556)**
+ - **[Employing Rescale Classifier-Free Guidance during inference](https://arxiv.org/pdf/2305.08891.pdf)**
+ - **[Sampling from the last timestep](https://arxiv.org/pdf/2305.08891.pdf)**
+ - **[Utilizing LAMB optimizer with large batch](https://arxiv.org/abs/1904.00962)**
+
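Several of the techniques listed above reduce to short formulas. The following is a minimal, dependency-free sketch of how they are commonly written in the cited papers, not Deci's training code. The function names, the offset `s=0.008`, the clamp `gamma=5.0`, the guidance `scale=7.5`, and the blend factor `phi=0.7` are all assumptions taken from typical defaults in the literature; this model card does not state the values Deci used.

```python
import math
from statistics import pstdev


def cosine_alpha_bar(t, s=0.008):
    """Cumulative signal level for t in [0, 1] under the cosine schedule."""
    def f(u):
        return math.cos((u + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0.0)


def snr(t):
    """Signal-to-noise ratio alpha_bar / (1 - alpha_bar) at time t."""
    a = cosine_alpha_bar(t)
    return math.inf if a >= 1.0 else a / (1.0 - a)


def v_target(x0, eps, t):
    """V-prediction regression target for one scalar latent value."""
    a = cosine_alpha_bar(t)
    return math.sqrt(a) * eps - math.sqrt(1.0 - a) * x0


def min_snr_weight_v(snr_value, gamma=5.0):
    """Min-SNR loss weight in its v-prediction form (gamma is assumed)."""
    return min(snr_value, gamma) / (snr_value + 1.0)


def rescale_cfg(cond, uncond, scale=7.5, phi=0.7):
    """Classifier-free guidance followed by the std-matching rescale.

    cond and uncond are flat lists holding one sample's predictions.
    Assumes the guided prediction is not constant (non-zero std).
    """
    cfg = [u + scale * (c - u) for c, u in zip(cond, uncond)]
    ratio = pstdev(cond) / pstdev(cfg)
    # Blend the rescaled prediction with the plain guided one.
    return [phi * (x * ratio) + (1.0 - phi) * x for x in cfg]
```

In this continuous form `snr(1.0)` is numerically zero, so the last timestep carries essentially pure noise; discrete implementations that clip the schedule typically need an explicit correction to reach the same zero-terminal-SNR condition.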
+ The following techniques were used to shorten training time:
+
+ - **Using precomputed VAE and CLIP latents**
+ - **Using EMA only in the last phase of training**
+
+ ### Additional Details
+
+ - **Hardware:** 8xA100 (80 GB), 8xH100 (80 GB)
+ - **Optimizer:** AdamW/LAMB (phase1/phase2-4)
+ - **Batch:** 8192/6144
+ - **Learning rate:** 1e-4/5e-3 (phase1/phase2-4)
+
+ ## Evaluation
+
+ On average, DeciDiffusion’s generated images after 30 iterations achieve comparable Fréchet Inception Distance (FID) scores to those generated by Stable Diffusion 1.5 after 50 iterations.
+ However, many recent articles question the reliability of FID scores, warning that FID results [tend to be fragile](https://huggingface.co/docs/diffusers/conceptual/evaluation), that they are [inconsistent with human judgments on MNIST](https://arxiv.org/pdf/1803.07474.pdf) and [subjective evaluation](https://arxiv.org/pdf/2307.01952.pdf), that they are [statistically biased](https://arxiv.org/pdf/1911.07023.pdf), and that they [give better scores](https://arxiv.org/pdf/2001.03653.pdf) to memorization of the dataset than to generalization beyond it.
+
+ Given this skepticism about FID’s reliability, we chose to assess DeciDiffusion 1.0's sample efficiency by performing a user study against Stable Diffusion 1.5. Our source for image captions was the [PartiPrompts](https://arxiv.org/pdf/2206.10789.pdf) benchmark, which was introduced to compare large text-to-image models on various challenging prompts.
+
+ For our study we chose 10 random prompts and, for each prompt, generated 3 images with Stable Diffusion 1.5 configured to run for 50 iterations and 3 images with DeciDiffusion configured to run for 30 iterations.
+
+ We then presented 30 side-by-side comparisons to 300 random individuals, who voted based on adherence to the prompt and aesthetic value. The results of these votes are illustrated below.
+
+ According to the results, DeciDiffusion at 30 iterations exhibits an edge in aesthetics, but when it comes to prompt alignment, it’s on par with Stable Diffusion at 50 iterations.
+
+ ## Runtime Benchmarks
+
+ The following table provides an image latency comparison between DeciDiffusion 1.0 and Stable Diffusion 1.5.
+
+ DeciDiffusion 1.0 vs. Stable Diffusion 1.5 at FP16 precision
+
+ | Inference Tool + Iterations | DeciDiffusion 1.0 on A10 (seconds/image) | Stable Diffusion 1.5 on A10 (seconds/image) |
+ |:----------|:----------|:----------|
+ | HF 50 Iterations | 2.11 | 2.95 |
+ | Infery 50 Iterations | 1.55 | 2.08 |
+ | HF 35 Iterations | 1.52 | - |
+ | Infery 35 Iterations | 1.07 | - |
+ | HF 30 Iterations | 1.29 | - |
+ | Infery 30 Iterations | 0.98 | - |
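One way to read the table is as speedup ratios against Stable Diffusion 1.5 at 50 iterations, the setting used in the user study. The sketch below only copies the latencies from the table; the dictionary layout and the `speedup` helper are illustrative names and not part of any Deci or Hugging Face tooling.

```python
# Seconds per image on an A10 at FP16, copied from the table above.
deci = {
    ("HF", 50): 2.11, ("Infery", 50): 1.55,
    ("HF", 35): 1.52, ("Infery", 35): 1.07,
    ("HF", 30): 1.29, ("Infery", 30): 0.98,
}
# Stable Diffusion 1.5 was only measured at 50 iterations.
sd15 = {("HF", 50): 2.95, ("Infery", 50): 2.08}


def speedup(tool, deci_iters, sd_iters=50):
    """Stable Diffusion 1.5 latency divided by DeciDiffusion latency."""
    return sd15[(tool, sd_iters)] / deci[(tool, deci_iters)]


for tool in ("HF", "Infery"):
    for iters in (50, 35, 30):
        print(f"{tool:6s} {iters} iterations: {speedup(tool, iters):.2f}x")
```

At equal iteration counts the ratio is about 1.4x with HF and 1.3x with Infery; comparing DeciDiffusion at 30 iterations with Stable Diffusion 1.5 at 50 gives roughly 2.3x and 2.1x respectively.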
+
+ ## How to Use
+
+ ```python
+ # pip install diffusers transformers torch
+
+ from diffusers import StableDiffusionPipeline
+ import torch
+
+ # Use the GPU when available; float16 is generally only practical on CUDA.
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+ dtype = torch.float16 if device == 'cuda' else torch.float32
+
+ # Load the custom pipeline, then swap in the model's own U-Net weights.
+ checkpoint = "Deci/DeciDiffusion-v1-0"
+ pipeline = StableDiffusionPipeline.from_pretrained(checkpoint, custom_pipeline=checkpoint, torch_dtype=dtype)
+ pipeline.unet = pipeline.unet.from_pretrained(checkpoint, subfolder='flexible_unet', torch_dtype=dtype)
+
+ pipeline = pipeline.to(device)
+
+ # The pipeline returns a list of PIL images; take the first one.
+ img = pipeline(prompt=['A photo of an astronaut riding a horse on Mars']).images[0]
+ ```
+
+ ## Uses
+
+ ### Misuse, Malicious Use, and Out-of-Scope Use
+ The model must not be employed to deliberately produce or spread images that foster hostile or unwelcoming settings for individuals. This encompasses generating visuals that might be predictably upsetting, distressing, or inappropriate, as well as content that perpetuates existing or historical biases.
+
+ #### Out-of-Scope Use
+ The model isn't designed to produce accurate or truthful depictions of people or events. Thus, using it for such purposes exceeds its intended capabilities.
+
+ #### Misuse and Malicious Use
+ Misusing the model to produce content that harms or maligns individuals is strictly discouraged. Such misuses include, but aren't limited to:
+
+ - Creating offensive, degrading, or damaging portrayals of individuals, their cultures, religions, or surroundings.
+ - Deliberately endorsing or disseminating prejudiced content or harmful stereotypes.
+ - Posing as someone else without their agreement.
+ - Generating explicit content without the knowledge or agreement of potential viewers.
+ - Distributing copyrighted or licensed content against its usage terms.
+ - Sharing modified versions of copyrighted or licensed content in breach of its usage guidelines.
+
+ ## Limitations and Bias
+
+ ### Limitations
+
+ The model has certain limitations and may not function optimally in the following scenarios:
+
+ - It doesn't produce completely photorealistic images.
+ - Rendering legible text is beyond its capability.
+ - Complex compositions, like visualizing “A green sphere to the left of a blue square”, are challenging for the model.
+ - Generation of faces and human figures may be imprecise.
+ - It is primarily optimized for English captions and might not be as effective with other languages.
+ - The autoencoding component of the model is lossy.
+
+ ### Bias
+ The remarkable abilities of image generation models can unintentionally amplify societal biases. DeciDiffusion was mainly trained on subsets of LAION-v2, focused on English descriptions. Consequently, non-English communities and cultures might be underrepresented, leading to a bias towards white and Western norms. Outputs from non-English prompts are notably less accurate. Given these biases, users should approach DeciDiffusion with discretion, regardless of input.
+
+ ## How to Cite
+
+ Please cite this model using this format.
+
+ ```bibtex
+ @misc{DeciFoundationModels,
+ title = {DeciDiffusion 1.0},
+ author = {DeciAI Research Team},
+ year = {2023},
+ url = {https://huggingface.co/deci/decidiffusion-v1-0},
+ }
+ ```