patrickvonplaten committed on
Commit
cc66b03
1 Parent(s): acd94cc

Update README.md

Files changed (1)
  1. README.md +5 -11
README.md CHANGED
@@ -153,20 +153,14 @@ Stable Diffusion v1-4 is a latent diffusion model which combines an autoencoder
  - The non-pooled output of the text encoder is fed into the UNet backbone of the latent diffusion model via cross-attention.
  - The loss is a reconstruction objective between the noise that was added to the latent and the prediction made by the UNet.
 
- We currently provide four checkpoints,
- - [`stable-diffusion-v1-1`](https://huggingface.co/CompVis/stable-diffusion-v1-1),
- - [`stable-diffusion-v1-2`](https://huggingface.co/CompVis/stable-diffusion-v1-2),
- - [`stable-diffusion-v1-3`](https://huggingface.co/CompVis/stable-diffusion-v1-3), and
- - [`stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4).
-
- The checkpoints were trained as follows:
- - `stable-diffusion-v1-1`: 237,000 steps at resolution `256x256` on [laion2B-en](https://huggingface.co/datasets/laion/laion2B-en).
  194,000 steps at resolution `512x512` on [laion-high-resolution](https://huggingface.co/datasets/laion/laion-high-resolution) (170M examples from LAION-5B with resolution `>= 1024x1024`).
- - `stable-diffusion-v1-2`: Resumed from `stable-diffusion-v1-1`.
  515,000 steps at resolution `512x512` on "laion-improved-aesthetics" (a subset of laion2B-en,
  filtered to images with an original size `>= 512x512`, estimated aesthetics score `> 5.0`, and an estimated watermark probability `< 0.5`. The watermark estimate is from the LAION-5B metadata, the aesthetics score is estimated using an [improved aesthetics estimator](https://github.com/christophschuhmann/improved-aesthetic-predictor)).
- - `stable-diffusion-v1-3`: Resumed from `stable-diffusion-v1-2`. 195,000 steps at resolution `512x512` on "laion-improved-aesthetics" and 10 % dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598)
- - *`stable-diffusion-v1-4`*: ...
 
  - **Hardware:** 32 x 8 x A100 GPUs
  - **Optimizer:** AdamW
 
  - The non-pooled output of the text encoder is fed into the UNet backbone of the latent diffusion model via cross-attention.
  - The loss is a reconstruction objective between the noise that was added to the latent and the prediction made by the UNet.
 
+ We currently provide four checkpoints, which were trained as follows.
+ - [`stable-diffusion-v1-1`](https://huggingface.co/CompVis/stable-diffusion-v1-1): 237,000 steps at resolution `256x256` on [laion2B-en](https://huggingface.co/datasets/laion/laion2B-en).
  194,000 steps at resolution `512x512` on [laion-high-resolution](https://huggingface.co/datasets/laion/laion-high-resolution) (170M examples from LAION-5B with resolution `>= 1024x1024`).
+ - [`stable-diffusion-v1-2`](https://huggingface.co/CompVis/stable-diffusion-v1-2): Resumed from `stable-diffusion-v1-1`.
  515,000 steps at resolution `512x512` on "laion-improved-aesthetics" (a subset of laion2B-en,
  filtered to images with an original size `>= 512x512`, estimated aesthetics score `> 5.0`, and an estimated watermark probability `< 0.5`. The watermark estimate is from the LAION-5B metadata, the aesthetics score is estimated using an [improved aesthetics estimator](https://github.com/christophschuhmann/improved-aesthetic-predictor)).
+ - [`stable-diffusion-v1-3`](https://huggingface.co/CompVis/stable-diffusion-v1-3): Resumed from `stable-diffusion-v1-2`. 195,000 steps at resolution `512x512` on "laion-improved-aesthetics" and 10 % dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598)
+ - [**`stable-diffusion-v1-4`**](https://huggingface.co/CompVis/stable-diffusion-v1-4) *To-fill-here*
 
  - **Hardware:** 32 x 8 x A100 GPUs
  - **Optimizer:** AdamW
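
Reviewer note: the README text in this hunk mentions two training details that may be easier to follow with a small sketch: the "10 % dropping of the text-conditioning" used for `stable-diffusion-v1-3`, and the loss between the added noise and the UNet prediction. The Python below is a minimal illustration only, not the training code behind these checkpoints. The function names `drop_text_conditioning` and `noise_prediction_loss` are hypothetical, replacing a dropped caption with an empty string is an assumption about how the conditioning is removed, and the mean squared error is one common choice for the loss that the README does not itself specify.

```python
import random


def drop_text_conditioning(captions, drop_prob=0.1, rng=None):
    """Return a copy of `captions` where each entry is replaced by an
    empty string with probability `drop_prob`.

    Assumption: an empty caption stands in for "no text conditioning".
    """
    rng = rng or random.Random()
    return ["" if rng.random() < drop_prob else caption for caption in captions]


def noise_prediction_loss(noise, predicted_noise):
    """Mean squared error between the noise added to the latent and the
    noise predicted by the model (both given as flat lists of floats).

    Assumption: MSE is used here as an illustrative reconstruction loss.
    """
    if len(noise) != len(predicted_noise):
        raise ValueError("noise and predicted_noise must have the same length")
    squared_errors = [(n - p) ** 2 for n, p in zip(noise, predicted_noise)]
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    batch = ["a photo of a cat", "a painting of a boat", "a red car"]
    # Roughly one caption in ten is blanked out on average.
    print(drop_text_conditioning(batch, drop_prob=0.1, rng=random.Random(0)))
    print(noise_prediction_loss([0.5, -1.0], [0.25, -0.5]))
```

Training on a mix of captioned and uncaptioned examples is what lets a single model produce both the conditional and unconditional predictions that classifier-free guidance combines at sampling time.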