kashif (HF staff) committed
Commit 9beb89d · 1 Parent(s): 34836ae

Update README.md

Files changed (1)
  1. README.md +9 -9
README.md CHANGED
@@ -4,21 +4,21 @@ license: mit
 <img src="https://cdn-uploads.huggingface.co/production/uploads/634cb5eefb80cc6bcaf63c3e/i-DYpDHw8Pwiy7QBKZVR5.jpeg" width=1500>
 
 ## Würstchen - Overview
-Würstchen is diffusion model, whose text-conditional model works in a highly compressed latent space of images. Why is this important? Compressing data can reduce
-computational costs for both training and inference by magnitudes. Training on 1024x1024 images, is way more expensive than training at 32x32. Usually, other works make
-use of a relatively small compression, in the range of 4x - 8x spatial compression. Würstchen takes this to an extreme. Through it's novel design, we achieve a 42x spatial
-compression. This was unseen before, because common methods fail to faithfully reconstruct detailed images after 16x spatial compression already. Würstchen employs a
-two-stage compression, what we call Stage A and Stage B. Stage A is a VQGAN and Stage B is a Diffusion Autoencoder (more details can be found in the [paper](https://arxiv.org/abs/2306.00637)).
-A third model, Stage C, is learnt in that highly compressed latent space. This training requires fractions of the compute used for current top-performing models, allowing
+Würstchen is a diffusion model, whose text-conditional model works in a highly compressed latent space of images. Why is this important? Compressing data can reduce
+computational costs for both training and inference by magnitudes. Training on 1024x1024 images is way more expensive than training on 32x32. Usually, other works make
+use of a relatively small compression, in the range of 4x - 8x spatial compression. Würstchen takes this to an extreme. Through its novel design, we achieve a 42x spatial
+compression. This was unseen before because common methods fail to faithfully reconstruct detailed images after 16x spatial compression. Würstchen employs a
+two-stage compression, what we call Stage A and Stage B. Stage A is a VQGAN, and Stage B is a Diffusion Autoencoder (more details can be found in the [paper](https://arxiv.org/abs/2306.00637)).
+A third model, Stage C, is learned in that highly compressed latent space. This training requires fractions of the compute used for current top-performing models, allowing
 also cheaper and faster inference.
 
 ## Würstchen - Prior
 The Prior is what we refer to as "Stage C". It is the text-conditional model, operating in the small latent space that Stage A and Stage B encode images into. During
-inference it's job is to generate the image latents given text. These image latents are then sent to Stage A & B to decode the latents into pixel space.
+inference, its job is to generate the image latents given text. These image latents are then sent to Stages A & B to decode the latents into pixel space.
 
 ### Prior - Model - Base
 This is the base checkpoint for the Prior (Stage C). This means this is only pretrained and generates mostly standard images. We recommend using the [interpolated model](https://huggingface.co/warp-ai/wuerstchen-prior-model-interpolated),
-as this is our best checkpoint for the Prior (Stage C), because it was finetuned on a curated dataset. However, we recommend this checkpoint if you want to finetune Würstchen
+as this is our best checkpoint for the Prior (Stage C) because it was finetuned on a curated dataset. However, we recommend this checkpoint if you want to finetune Würstchen
 on your own large dataset, as the other checkpoints are already biased towards being more artistic. This checkpoint should provide a fairly standard baseline to finetune
 from, as long as your dataset is rather large.
 
@@ -35,7 +35,7 @@ We also observed that the Prior (Stage C) adapts extremely fast to new resolutio
 <img src="https://cdn-uploads.huggingface.co/production/uploads/634cb5eefb80cc6bcaf63c3e/IfVsUDcP15OY-5wyLYKnQ.jpeg" width=1000>
 
 ## How to run
-This pipeline should be run together with https://huggingface.co/warp-diffusion/wuerstchen:
+This pipeline should be run together with https://huggingface.co/warp-ai/wuerstchen:
 
 ```py
 import torch