osanseviero (HF staff) committed
Commit 9b554f2
Parent: 7cc474b

Update README.md

Files changed (1)
  1. README.md +2 -37
README.md CHANGED
@@ -5,41 +5,6 @@ pipeline_tag: text-to-image
 inference: false
 ---
 
- ## DALL·E mini - Generate images from text
- 
- <img style="text-align:center; display:block;" src="https://raw.githubusercontent.com/borisdayma/dalle-mini/main/img/logo.png" width="200">
- 
- * [Technical Report](https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini--Vmlldzo4NjIxODA)
- * [Demo](https://huggingface.co/spaces/flax-community/dalle-mini)
- 
- ### Model Description
- 
- This is an attempt to replicate OpenAI's [DALL·E](https://openai.com/blog/dall-e/), a model capable of generating arbitrary images from a text prompt that describes the desired result.
- 
- ![DALL·E mini demo screenshot](img/demo_screenshot.png)
- 
- This model's architecture is a simplification of the original and leverages earlier open-source efforts and available pre-trained models. Results are lower quality than OpenAI's, but the model can be trained and used on far less demanding hardware; our training run took a few days on a single TPU v3-8.
- 
- ### Components of the Architecture
- 
- The system is built on Flax/JAX, which is ideal for TPU training. TPUs are not required; both Flax and JAX also run efficiently on GPU backends.
- 
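- For instance, you can confirm which backend JAX picked up (a minimal sanity check, not part of the model code itself):
- 
- ```python
- import jax
- 
- # List the accelerator devices JAX detected: TPU cores, GPUs, or the CPU fallback.
- print(jax.devices())
- print(jax.default_backend())  # e.g. "tpu", "gpu", or "cpu"
- ```
- 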
- The main components of the architecture are:
- 
- * An encoder, based on [BART](https://arxiv.org/abs/1910.13461). The encoder transforms the sequence of input text tokens, produced from the prompt by the model's tokenizer, into a fixed-length sequence of image tokens; each image token is an index into a pre-trained VQGAN codebook.
- 
- * A decoder, which converts the image tokens to image pixels. The decoder is based on a [VQGAN model](https://compvis.github.io/taming-transformers/).
- 
- The model definition we use for the encoder can be downloaded from our [GitHub repo](https://github.com/borisdayma/dalle-mini). The encoder is represented by the class `CustomFlaxBartForConditionalGeneration`.
- 
- To use the decoder, follow the instructions for our accompanying VQGAN model on the Hub, [flax-community/vqgan_f16_16384](https://huggingface.co/flax-community/vqgan_f16_16384). A rough end-to-end sketch follows below.
- 
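- Putting the pieces together, here is a minimal sketch of the two-stage pipeline. The module paths, generation arguments, and output ranges below are assumptions that may differ across versions of the code; the inference notebook linked in the next section is the authoritative reference.
- 
- ```python
- import jax
- from transformers import BartTokenizer
- 
- # Assumed module paths: both the dalle-mini and vqgan-jax repos must be
- # installed or on PYTHONPATH, and the paths may vary between versions.
- from dalle_mini.model import CustomFlaxBartForConditionalGeneration
- from vqgan_jax.modeling_flax_vqgan import VQModel
- 
- tokenizer = BartTokenizer.from_pretrained("flax-community/dalle-mini")
- model = CustomFlaxBartForConditionalGeneration.from_pretrained("flax-community/dalle-mini")
- vqgan = VQModel.from_pretrained("flax-community/vqgan_f16_16384")
- 
- # Stage 1: text prompt -> fixed-length sequence of image tokens
- # (indices into the VQGAN codebook).
- inputs = tokenizer("a cat playing chess", return_tensors="jax")
- image_tokens = model.generate(
-     **inputs, do_sample=True, prng_key=jax.random.PRNGKey(0)
- ).sequences[:, 1:]  # drop the decoder-start token
- 
- # Stage 2: image tokens -> pixels, via the VQGAN decoder
- # (decode_code is assumed here; check the notebook for the exact call).
- images = vqgan.decode_code(image_tokens)  # batch of HxWx3 arrays
- ```
- 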
- ### How to Use
- 
- The easiest way to get familiar with the code and the models is to follow the inference notebook we provide in our [GitHub repo](https://github.com/borisdayma/dalle-mini/blob/main/dev/inference/inference_pipeline.ipynb). For your convenience, you can open it in Google Colaboratory: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/borisdayma/dalle-mini/blob/main/dev/inference/inference_pipeline.ipynb)
- 
- If you just want to test the trained model and see what it comes up with, please visit [our demo](https://huggingface.co/spaces/flax-community/dalle-mini), available on 🤗 Spaces.
- 
- ### Additional Details
- 
- Our [report](https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini--Vmlldzo4NjIxODA) contains more details about how the model was trained and shows many examples that demonstrate its capabilities.
+ ## Fork of DALL·E mini - Generate images from text
+ 
+ For the original repo, head to https://huggingface.co/flax-community/dalle-mini