patrickvonplaten committed on
Commit c71cdb2
1 Parent(s): 03f4f17

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -140,12 +140,12 @@ ability of the model to generate content with non-English prompts is significant
 
 ## Training
 
-**Training Data**
+### Training Data
 The model developers used the following dataset for training the model:
 
 - LAION-2B (en) and subsets thereof (see next section)
 
-**Training Procedure**
+### Training Procedure
 Stable Diffusion v1-4 is a latent diffusion model which combines an autoencoder with a diffusion model that is trained in the latent space of the autoencoder. During training,
 
 - Images are encoded through an encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4
@@ -162,6 +162,8 @@ filtered to images with an original size `>= 512x512`, estimated aesthetics scor
 - [`stable-diffusion-v1-3`](https://huggingface.co/CompVis/stable-diffusion-v1-3): Resumed from `stable-diffusion-v1-2`. 195,000 steps at resolution `512x512` on "laion-improved-aesthetics" and 10 % dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598)
 - [**`stable-diffusion-v1-4`**](https://huggingface.co/CompVis/stable-diffusion-v1-4) *To-fill-here*
 
+### Training details
+
 - **Hardware:** 32 x 8 x A100 GPUs
 - **Optimizer:** AdamW
 - **Gradient Accumulations**: 2
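
The "Training Procedure" section touched by this commit describes the 8x spatial downsampling of the autoencoder. The sketch below, which is not part of the commit, illustrates that shape mapping using the `diffusers` `AutoencoderKL` loaded from the `CompVis/stable-diffusion-v1-4` checkpoint; the dummy 512x512 input is an assumption chosen purely for illustration.

```python
# Minimal sketch (not part of this commit): illustrates the f = 8 downsampling
# described under "Training Procedure". Checkpoint id and dummy input are
# assumptions for illustration only.
import torch
from diffusers import AutoencoderKL

# Load only the autoencoder from the Stable Diffusion v1-4 checkpoint.
vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae")

# Dummy batch of one 512 x 512 RGB image (NCHW layout expected by the model).
image = torch.randn(1, 3, 512, 512)

with torch.no_grad():
    # Encode to the latent space and sample from the posterior distribution.
    latents = vae.encode(image).latent_dist.sample()

# H x W x 3 -> H/8 x W/8 x 4: a 512x512 image becomes a 4 x 64 x 64 latent.
print(latents.shape)  # torch.Size([1, 4, 64, 64])
```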
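
The new "Training details" heading lists AdamW with 2 gradient accumulation steps. As a rough illustration of what that setting means in practice (not the developers' actual training code), here is a hedged sketch with a placeholder model, data, and loss:

```python
# Minimal sketch (not part of this commit): gradient accumulation with AdamW as
# listed under "Training details". Model, batches, and loss are placeholders.
import torch

model = torch.nn.Linear(4, 4)                                # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
accumulation_steps = 2                                       # "Gradient Accumulations: 2"

for step, batch in enumerate(torch.randn(8, 4).split(2)):   # placeholder batches
    loss = model(batch).pow(2).mean()                        # placeholder loss
    (loss / accumulation_steps).backward()                   # scale so accumulated grads average
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                                     # update once per 2 micro-batches
        optimizer.zero_grad()
```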