kashif (HF staff) committed
Commit 9beb89d · 1 Parent(s): 34836ae

Update README.md

Files changed (1)
  1. README.md +9 -9
README.md CHANGED
@@ -4,21 +4,21 @@ license: mit
 <img src="https://cdn-uploads.huggingface.co/production/uploads/634cb5eefb80cc6bcaf63c3e/i-DYpDHw8Pwiy7QBKZVR5.jpeg" width=1500>
 
 ## Würstchen - Overview
-Würstchen is diffusion model, whose text-conditional model works in a highly compressed latent space of images. Why is this important? Compressing data can reduce
-computational costs for both training and inference by magnitudes. Training on 1024x1024 images, is way more expensive than training at 32x32. Usually, other works make
-use of a relatively small compression, in the range of 4x - 8x spatial compression. Würstchen takes this to an extreme. Through it's novel design, we achieve a 42x spatial
-compression. This was unseen before, because common methods fail to faithfully reconstruct detailed images after 16x spatial compression already. Würstchen employs a
-two-stage compression, what we call Stage A and Stage B. Stage A is a VQGAN and Stage B is a Diffusion Autoencoder (more details can be found in the [paper](https://arxiv.org/abs/2306.00637)).
-A third model, Stage C, is learnt in that highly compressed latent space. This training requires fractions of the compute used for current top-performing models, allowing
+Würstchen is a diffusion model, whose text-conditional model works in a highly compressed latent space of images. Why is this important? Compressing data can reduce
+computational costs for both training and inference by magnitudes. Training on 1024x1024 images is way more expensive than training on 32x32. Usually, other works make
+use of a relatively small compression, in the range of 4x - 8x spatial compression. Würstchen takes this to an extreme. Through its novel design, we achieve a 42x spatial
+compression. This was unseen before because common methods fail to faithfully reconstruct detailed images after 16x spatial compression. Würstchen employs a
+two-stage compression, what we call Stage A and Stage B. Stage A is a VQGAN, and Stage B is a Diffusion Autoencoder (more details can be found in the [paper](https://arxiv.org/abs/2306.00637)).
+A third model, Stage C, is learned in that highly compressed latent space. This training requires fractions of the compute used for current top-performing models, allowing
 also cheaper and faster inference.
 
 ## Würstchen - Prior
 The Prior is what we refer to as "Stage C". It is the text-conditional model, operating in the small latent space that Stage A and Stage B encode images into. During
-inference it's job is to generate the image latents given text. These image latents are then sent to Stage A & B to decode the latents into pixel space.
+inference, its job is to generate the image latents given text. These image latents are then sent to Stages A & B to decode the latents into pixel space.
 
 ### Prior - Model - Base
 This is the base checkpoint for the Prior (Stage C). This means this is only pretrained and generates mostly standard images. We recommend using the [interpolated model](https://huggingface.co/warp-ai/wuerstchen-prior-model-interpolated),
-as this is our best checkpoint for the Prior (Stage C), because it was finetuned on a curated dataset. However, we recommend this checkpoint if you want to finetune Würstchen
+as this is our best checkpoint for the Prior (Stage C) because it was finetuned on a curated dataset. However, we recommend this checkpoint if you want to finetune Würstchen
 on your own large dataset, as the other checkpoints are already biased towards being more artistic. This checkpoint should provide a fairly standard baseline to finetune
 from, as long as your dataset is rather large.
 
@@ -35,7 +35,7 @@ We also observed that the Prior (Stage C) adapts extremely fast to new resolutio
 <img src="https://cdn-uploads.huggingface.co/production/uploads/634cb5eefb80cc6bcaf63c3e/IfVsUDcP15OY-5wyLYKnQ.jpeg" width=1000>
 
 ## How to run
-This pipeline should be run together with https://huggingface.co/warp-diffusion/wuerstchen:
+This pipeline should be run together with https://huggingface.co/warp-ai/wuerstchen:
 
 ```py
 import torch