Text-to-Image
PyTorch
huggan
diffusion
Commit 6e45c2b by gigant (1 parent: 24a7893)

Update README.md

Files changed (1)
  1. README.md +12 -2
README.md CHANGED
@@ -27,11 +27,21 @@ model_student.ckpt: The latent diffusion model checkpoint
 
 #### How to use
 
-TODO
+You need dependencies from several repositories, all linked from the [CLOOB latent diffusion](https://github.com/JD-P/cloob-latent-diffusion) repository:
+
+* [CLIP](https://github.com/openai/CLIP/tree/40f5484c1c74edd83cb9cf687c6ab92b28d8b656)
+* [CLOOB](https://github.com/crowsonkb/cloob-training/tree/136ca7dd69a03eeb6ad525da991d5d7083e44055): the model that encodes images and texts in a unified latent space, used for conditioning the latent diffusion.
+* [Latent Diffusion](https://github.com/CompVis/latent-diffusion/tree/f13bf9bf463d95b5a16aeadd2b02abde31f769f8): the latent diffusion model definition.
+* [Taming transformers](https://github.com/CompVis/taming-transformers/tree/24268930bf1dce879235a7fddd0b2355b84d7ea6): a pretrained convolutional VQGAN, used as an autoencoder to map from image space to the latent space in which the diffusion is performed.
+* [v-diffusion](https://github.com/crowsonkb/v-diffusion-pytorch/tree/ffabbb1a897541fa2a3d034f397c224489d97b39): functions for sampling from a diffusion model with text and/or image prompts.
+
+Example code for sampling images from a text prompt is available in a [Colab Notebook](https://colab.research.google.com/drive/1XGHdO8IAGajnpb-x4aOb-OMYfZf0WDTi?usp=sharing), or directly in the [app source code](https://huggingface.co/spaces/huggan/wikiart-diffusion-mini/blob/main/app.py) for the Gradio demo on [this Space](https://huggingface.co/spaces/huggan/wikiart-diffusion-mini).
 
 #### Limitations and bias
 
-TODO
+The student latent diffusion model was trained only on images from the WikiArt dataset. However, the VQGAN autoencoder, the CLOOB model and the teacher latent diffusion model all come from pretrained checkpoints that were trained on images and texts from the internet.
+
+According to the [Latent Diffusion paper](https://arxiv.org/abs/2112.10752): “Deep learning modules tend to reproduce or exacerbate biases that are already present in the data”.
 
 ## Training data
 
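The "How to use" section pins each dependency to a specific commit through its GitHub `tree/<hash>` link. One way to reproduce that setup is to clone each repository and check out the listed commit. The Python sketch below does this under stated assumptions: `git` is on the PATH, network access is available, and each repository is cloned into a folder named after it under a root directory. `PINNED_REPOS`, `repo_dir`, `clone_commands` and `fetch_all` are hypothetical helpers, not part of this model repository. Appending the clone folders to `sys.path` is also an assumption about how the packages are imported. The Colab notebook and Space app may wire things up differently.

```python
from __future__ import annotations

import shlex
import subprocess
import sys
from pathlib import Path

# Hypothetical table: repository URL -> commit hash taken from the pinned README links.
PINNED_REPOS = {
    "https://github.com/openai/CLIP": "40f5484c1c74edd83cb9cf687c6ab92b28d8b656",
    "https://github.com/crowsonkb/cloob-training": "136ca7dd69a03eeb6ad525da991d5d7083e44055",
    "https://github.com/CompVis/latent-diffusion": "f13bf9bf463d95b5a16aeadd2b02abde31f769f8",
    "https://github.com/CompVis/taming-transformers": "24268930bf1dce879235a7fddd0b2355b84d7ea6",
    "https://github.com/crowsonkb/v-diffusion-pytorch": "ffabbb1a897541fa2a3d034f397c224489d97b39",
}


def repo_dir(url: str, root: Path) -> Path:
    """Local folder for a repository: root / <last URL segment>, e.g. deps/CLIP."""
    return root / url.rstrip("/").rsplit("/", 1)[-1]


def clone_commands(pinned: dict[str, str], root: Path) -> list[list[str]]:
    """Build (without running) a clone + checkout command pair for every pinned repo."""
    commands: list[list[str]] = []
    for url, commit in pinned.items():
        target = repo_dir(url, root)
        commands.append(["git", "clone", url, str(target)])
        # Checking out a bare hash leaves the clone in a detached-HEAD state.
        commands.append(["git", "-C", str(target), "checkout", commit])
    return commands


def fetch_all(pinned: dict[str, str], root: Path) -> None:
    """Run the commands (needs git and network), then expose the clones for import."""
    for cmd in clone_commands(pinned, root):
        subprocess.run(cmd, check=True)  # raise if any git step fails
    # Assumption: each package (clip, cloob_training, ldm, taming, diffusion)
    # sits at the root of its clone, so adding the folder to sys.path suffices.
    for url in pinned:
        sys.path.append(str(repo_dir(url, root)))


if __name__ == "__main__":
    # Dry run: print the commands as a shell script instead of executing them.
    for cmd in clone_commands(PINNED_REPOS, Path("deps")):
        print(shlex.join(cmd))
```

The sketch builds the commands separately from running them. That way the pinned set can be reviewed, or printed as a shell script, before anything touches the network. Call `fetch_all(PINNED_REPOS, Path("deps"))` only once the commands look right.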