baofff committed on
Commit
1e22060
1 Parent(s): 973aa86

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -10,11 +10,12 @@ tags:
  - generative model
  ---
 
- UniDiffuser is a multi-modal diffusion model with a transformer-based backbone ([U-ViT](https://github.com/baofff/U-ViT)). UniDiffuser is able to perform image, text, text-to-image, image-to-text, and image-text pair generation by setting proper timesteps without additional overhead.
+ UniDiffuser is a unified diffusion framework to fit all distributions relevant to a set of multi-modal data in one transformer.
+ UniDiffuser is able to perform image, text, text-to-image, image-to-text, and image-text pair generation by setting proper timesteps without additional overhead.
 
 
 
- The main component of UniDiffuser is [U-ViT](https://github.com/baofff/U-ViT), which parameterizes the joint noise prediction network. Other components perform as encoders and decoders of different modalities, including a pretrained image autoencoder from [Stable Diffusion](https://github.com/CompVis/stable-diffusion), a pretrained [image ViT-B/32 CLIP encoder](https://github.com/openai/CLIP), a pretrained [text ViT-L CLIP encoder](https://huggingface.co/openai/clip-vit-large-patch14), and a [GPT-2](https://github.com/openai/gpt-2) text decoder finetuned by ourselves.
+ Specifically, UniDiffuser employs a variation of transformer, called [U-ViT](https://github.com/baofff/U-ViT), which parameterizes the joint noise prediction network. Other components perform as encoders and decoders of different modalities, including a pretrained image autoencoder from [Stable Diffusion](https://github.com/CompVis/stable-diffusion), a pretrained [image ViT-B/32 CLIP encoder](https://github.com/openai/CLIP), a pretrained [text ViT-L CLIP encoder](https://huggingface.co/openai/clip-vit-large-patch14), and a [GPT-2](https://github.com/openai/gpt-2) text decoder finetuned by ourselves.
 
 
  We provide two versions of UniDiffuser: