Commit 55ecd3b committed by kashif (HF staff)
Parent(s): e52e889

Update README.md

Files changed (1): README.md (+9 -9)

README.md CHANGED
@@ -4,21 +4,21 @@ license: mit
 <img src="https://cdn-uploads.huggingface.co/production/uploads/634cb5eefb80cc6bcaf63c3e/i-DYpDHw8Pwiy7QBKZVR5.jpeg" width=1500>

 ## Würstchen - Overview
- Würstchen is diffusion model, whose text-conditional model works in a highly compressed latent space of images. Why is this important? Compressing data can reduce
- computational costs for both training and inference by magnitudes. Training on 1024x1024 images, is way more expensive than training at 32x32. Usually, other works make
- use of a relatively small compression, in the range of 4x - 8x spatial compression. Würstchen takes this to an extreme. Through it's novel design, we achieve a 42x spatial
- compression. This was unseen before, because common methods fail to faithfully reconstruct detailed images after 16x spatial compression already. Würstchen employs a
- two-stage compression, what we call Stage A and Stage B. Stage A is a VQGAN and Stage B is a Diffusion Autoencoder (more details can be found in the [paper](https://arxiv.org/abs/2306.00637)).
- A third model, Stage C, is learnt in that highly compressed latent space. This training requires fractions of the compute used for current top-performing models, allowing
 also cheaper and faster inference.

 ## Würstchen - Prior
 The Prior is what we refer to as "Stage C". It is the text-conditional model, operating in the small latent space that Stage A and Stage B encode images into. During
- inference it's job is to generate the image latents given text. These image latents are then sent to Stage A & B to decode the latents into pixel space.

 ### Prior - Model - Interpolated
 The interpolated model is our current best Prior (Stage C) checkpoint. It is an interpolation between our [base model](https://huggingface.co/warp-ai/wuerstchen-prior-model-base) and the [finetuned model](https://huggingface.co/warp-ai/wuerstchen-prior-model-finetuned).
- We created this interpolation, because the finetuned model became too artistic and often only generates artistic images. The base model however, usually is very photorealistic.
 As a result, we combined both by interpolating their weights by 50%, so the middle between the base and finetuned model (`0.5 * base_weights + 0.5 * finetuned_weights`).
 You can also interpolate the [base model](https://huggingface.co/warp-ai/wuerstchen-prior-model-base) and the [finetuned model](https://huggingface.co/warp-ai/wuerstchen-prior-model-finetuned)
 as you want and maybe find an interpolation that fits your needs better than this checkpoint.

@@ -29,7 +29,7 @@ We also observed that the Prior (Stage C) adapts extremely fast to new resolutio
 <img src="https://cdn-uploads.huggingface.co/production/uploads/634cb5eefb80cc6bcaf63c3e/IfVsUDcP15OY-5wyLYKnQ.jpeg" width=1000>

 ## How to run
- This pipeline should be run together with https://huggingface.co/warp-diffusion/wuerstchen:

```py
import torch
```
 
 <img src="https://cdn-uploads.huggingface.co/production/uploads/634cb5eefb80cc6bcaf63c3e/i-DYpDHw8Pwiy7QBKZVR5.jpeg" width=1500>

 ## Würstchen - Overview
+ Würstchen is a diffusion model, whose text-conditional model works in a highly compressed latent space of images. Why is this important? Compressing data can reduce
+ computational costs for both training and inference by magnitudes. Training on 1024x1024 images is way more expensive than training on 32x32. Usually, other works make
+ use of a relatively small compression, in the range of 4x - 8x spatial compression. Würstchen takes this to an extreme. Through its novel design, we achieve a 42x spatial
+ compression. This was unseen before because common methods fail to faithfully reconstruct detailed images after 16x spatial compression. Würstchen employs a
+ two-stage compression, what we call Stage A and Stage B. Stage A is a VQGAN, and Stage B is a Diffusion Autoencoder (more details can be found in the [paper](https://arxiv.org/abs/2306.00637)).
+ A third model, Stage C, is learned in that highly compressed latent space. This training requires fractions of the compute used for current top-performing models, allowing
 also cheaper and faster inference.

 ## Würstchen - Prior
 The Prior is what we refer to as "Stage C". It is the text-conditional model, operating in the small latent space that Stage A and Stage B encode images into. During
+ inference, its job is to generate the image latents given text. These image latents are then sent to Stages A & B to decode the latents into pixel space.

 ### Prior - Model - Interpolated
 The interpolated model is our current best Prior (Stage C) checkpoint. It is an interpolation between our [base model](https://huggingface.co/warp-ai/wuerstchen-prior-model-base) and the [finetuned model](https://huggingface.co/warp-ai/wuerstchen-prior-model-finetuned).
+ We created this interpolation because the finetuned model became too artistic and often only generates artistic images. The base model, however, usually is very photorealistic.
 As a result, we combined both by interpolating their weights by 50%, so the middle between the base and finetuned model (`0.5 * base_weights + 0.5 * finetuned_weights`).
 You can also interpolate the [base model](https://huggingface.co/warp-ai/wuerstchen-prior-model-base) and the [finetuned model](https://huggingface.co/warp-ai/wuerstchen-prior-model-finetuned)
 as you want and maybe find an interpolation that fits your needs better than this checkpoint.
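
The 50% blend in the paragraph above (`0.5 * base_weights + 0.5 * finetuned_weights`) can be sketched in plain Python. This is a minimal illustration, not the repository's own tooling: `interpolate_weights` and the toy parameter names are hypothetical, and real checkpoints hold tensors rather than single floats.

```python
def interpolate_weights(base_weights, finetuned_weights, alpha=0.5):
    """Blend two checkpoints parameter-wise: alpha * base + (1 - alpha) * finetuned."""
    return {
        name: alpha * base_weights[name] + (1 - alpha) * finetuned_weights[name]
        for name in base_weights
    }

# Toy "checkpoints" with exactly representable values, so the blend is exact.
base = {"layer.weight": 0.0, "layer.bias": 2.0}
finetuned = {"layer.weight": 1.0, "layer.bias": 4.0}

merged = interpolate_weights(base, finetuned, alpha=0.5)
# merged == {"layer.weight": 0.5, "layer.bias": 3.0}
```

Choosing an `alpha` other than 0.5 gives the custom base/finetuned interpolations the paragraph above suggests experimenting with.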
 
 <img src="https://cdn-uploads.huggingface.co/production/uploads/634cb5eefb80cc6bcaf63c3e/IfVsUDcP15OY-5wyLYKnQ.jpeg" width=1000>

 ## How to run
+ This pipeline should be run together with https://huggingface.co/warp-ai/wuerstchen:

```py
import torch
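# NOTE: the original snippet is truncated at this point. What follows is a
# hedged sketch of a typical completion, assuming the diffusers Würstchen
# pipelines (WuerstchenPriorPipeline and WuerstchenDecoderPipeline); the
# caption and parameter values are illustrative, not taken from this README.
from diffusers import WuerstchenDecoderPipeline, WuerstchenPriorPipeline

device = "cuda"
dtype = torch.float16

# Stage C (the Prior in this repository) maps text to image latents.
prior_pipeline = WuerstchenPriorPipeline.from_pretrained(
    "warp-ai/wuerstchen-prior", torch_dtype=dtype
).to(device)

# Stages A & B decode those latents back into pixel space.
decoder_pipeline = WuerstchenDecoderPipeline.from_pretrained(
    "warp-ai/wuerstchen", torch_dtype=dtype
).to(device)

caption = "Anthropomorphic cat dressed as a firefighter"
prior_output = prior_pipeline(
    prompt=caption, height=1024, width=1024, guidance_scale=4.0
)
images = decoder_pipeline(
    image_embeddings=prior_output.image_embeddings,
    prompt=caption,
    output_type="pil",
).images
```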