Update README.md
README.md (CHANGED)
@@ -4,21 +4,21 @@ license: mit
 <img src="https://cdn-uploads.huggingface.co/production/uploads/634cb5eefb80cc6bcaf63c3e/i-DYpDHw8Pwiy7QBKZVR5.jpeg" width=1500>
 
 ## Würstchen - Overview
-Würstchen is diffusion model, whose text-conditional model works in a highly compressed latent space of images. Why is this important? Compressing data can reduce
-computational costs for both training and inference by magnitudes. Training on 1024x1024 images
-use of a relatively small compression, in the range of 4x - 8x spatial compression. Würstchen takes this to an extreme. Through
-compression. This was unseen before
-two-stage compression, what we call Stage A and Stage B. Stage A is a VQGAN and Stage B is a Diffusion Autoencoder (more details can be found in the [paper](https://arxiv.org/abs/2306.00637)).
-A third model, Stage C, is
+Würstchen is a diffusion model, whose text-conditional model works in a highly compressed latent space of images. Why is this important? Compressing data can reduce
+computational costs for both training and inference by magnitudes. Training on 1024x1024 images is way more expensive than training on 32x32. Usually, other works make
+use of a relatively small compression, in the range of 4x - 8x spatial compression. Würstchen takes this to an extreme. Through its novel design, we achieve a 42x spatial
+compression. This was unseen before because common methods fail to faithfully reconstruct detailed images after 16x spatial compression. Würstchen employs a
+two-stage compression, what we call Stage A and Stage B. Stage A is a VQGAN, and Stage B is a Diffusion Autoencoder (more details can be found in the [paper](https://arxiv.org/abs/2306.00637)).
+A third model, Stage C, is learned in that highly compressed latent space. This training requires fractions of the compute used for current top-performing models, allowing
 also cheaper and faster inference.
 
 ## Würstchen - Prior
 The Prior is what we refer to as "Stage C". It is the text-conditional model, operating in the small latent space that Stage A and Stage B encode images into. During
-inference
+inference, its job is to generate the image latents given text. These image latents are then sent to Stages A & B to decode the latents into pixel space.
 
 ### Prior - Model - Base
 This is the base checkpoint for the Prior (Stage C). This means this is only pretrained and generates mostly standard images. We recommend using the [interpolated model](https://huggingface.co/warp-ai/wuerstchen-prior-model-interpolated),
-as this is our best checkpoint for the Prior (Stage C)
+as this is our best checkpoint for the Prior (Stage C) because it was finetuned on a curated dataset. However, we recommend this checkpoint if you want to finetune Würstchen
 on your own large dataset, as the other checkpoints are already biased towards being more artistic. This checkpoint should provide a fairly standard baseline to finetune
 from, as long as your dataset is rather large.
 
@@ -35,7 +35,7 @@ We also observed that the Prior (Stage C) adapts extremely fast to new resolutio
 <img src="https://cdn-uploads.huggingface.co/production/uploads/634cb5eefb80cc6bcaf63c3e/IfVsUDcP15OY-5wyLYKnQ.jpeg" width=1000>
 
 ## How to run
-This pipeline should be run together with https://huggingface.co/warp-
+This pipeline should be run together with https://huggingface.co/warp-ai/wuerstchen:
 
 ```py
 import torch
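
The excerpt shows only the first line of the usage snippet (`import torch`). For orientation, a minimal sketch of running the prior together with https://huggingface.co/warp-ai/wuerstchen via diffusers might look like the following; the `AutoPipelineForText2Image` entry point, the checkpoint resolution, and the parameter values are assumptions about the diffusers Würstchen integration, not text from this commit.

```py
# Sketch only: combined text-to-image run. Assumes the diffusers Würstchen
# integration resolves "warp-ai/wuerstchen" to its combined (prior + decoder)
# pipeline and that a CUDA GPU is available.
import torch
from diffusers import AutoPipelineForText2Image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "warp-ai/wuerstchen", torch_dtype=torch.float16
).to("cuda")

caption = "Anthropomorphic cat dressed as a firefighter"
image = pipeline(
    prompt=caption,
    height=1024,
    width=1024,
    prior_guidance_scale=4.0,    # guidance for Stage C (the prior)
    decoder_guidance_scale=0.0,  # guidance for Stage B (the decoder)
).images[0]
image.save("wuerstchen_sample.png")
```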
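The updated Prior section describes Stage C generating image latents from text, with Stages A & B then decoding those latents into pixel space. A sketch of that explicit two-stage flow, assuming the separate `WuerstchenPriorPipeline` and `WuerstchenDecoderPipeline` classes from diffusers and the `warp-ai/wuerstchen-prior` checkpoint (none of which are named in this excerpt), could look like this:

```py
# Sketch only: explicit Stage C -> Stage B/A flow with diffusers.
# Checkpoint ids, timesteps, and guidance values are assumptions, not taken
# from this model card. Assumes a CUDA GPU.
import torch
from diffusers import WuerstchenDecoderPipeline, WuerstchenPriorPipeline
from diffusers.pipelines.wuerstchen import DEFAULT_STAGE_C_TIMESTEPS

device = "cuda"
dtype = torch.float16

# Stage C: the text-conditional prior working in the highly compressed latent space.
prior = WuerstchenPriorPipeline.from_pretrained(
    "warp-ai/wuerstchen-prior", torch_dtype=dtype
).to(device)

# Stages A & B: decode the compressed latents back into pixel space.
decoder = WuerstchenDecoderPipeline.from_pretrained(
    "warp-ai/wuerstchen", torch_dtype=dtype
).to(device)

caption = "Anthropomorphic cat dressed as a firefighter"

# Stage C turns the text prompt into image latents (embeddings).
prior_output = prior(
    prompt=caption,
    height=1024,
    width=1024,
    timesteps=DEFAULT_STAGE_C_TIMESTEPS,
    guidance_scale=4.0,
)

# Stages B & A turn those latents into a PIL image.
image = decoder(
    image_embeddings=prior_output.image_embeddings,
    prompt=caption,
    guidance_scale=0.0,
    output_type="pil",
).images[0]
image.save("wuerstchen_two_stage.png")
```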
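For a concrete sense of scale for the compression factors mentioned in the Overview hunk above, here is a small illustrative calculation; the 1024x1024 resolution and the 4x/8x/16x/42x factors come from the README text, while the resulting latent sizes are plain arithmetic rather than a claim about the model's exact latent shape.

```py
# Illustrative arithmetic only: how many spatial positions remain after the
# compression factors discussed in the Overview.
for factor in (4, 8, 16, 42):
    side = 1024 // factor  # e.g. 42x compression: 1024 -> ~24 per side
    remaining = (side * side) / (1024 * 1024)
    print(f"{factor}x spatial compression: 1024x1024 -> ~{side}x{side} "
          f"({remaining:.3%} of the spatial positions)")
```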