boris committed on
Commit 182f15a
2 Parent(s): bdaeeba bcac695

Merge pull request #3 from khalidsaifullaah/patch-1

Files changed (1)
  1. README.md +17 -0
README.md CHANGED
@@ -1,5 +1,22 @@
  ## DALL-E Mini - Generate image from text
 
+ ## Tentative Strategy of training (proposed by Luke and Suraj)
+
+ ### Data:
+ * [Conceptual 12M](https://github.com/google-research-datasets/conceptual-12m) Dataset (already loaded and preprocessed in TPU VM by Luke).
+ * [YFCC100M Subset](https://github.com/openai/CLIP/blob/main/data/yfcc100m.md)
+ * [Conceptual Captions 3M](https://github.com/google-research-datasets/conceptual-captions)
+
+ ### Architecture:
+ * Use the Taming Transformers VQ-GAN (with 16384 tokens)
+ * Use a seq2seq (language encoder --> image decoder) model with a pretrained non-autoregressive encoder (e.g. BERT) and an autoregressive decoder (like GPT).
+
+ ### Remaining Architecture Questions:
+ * Whether to freeze the text encoder?
+ * Whether to finetune the VQ-GAN?
+ * Which text encoder to use (e.g. BERT, RoBERTa, etc.)?
+ * Hyperparameter choices for the decoder (e.g. positional embedding, initialization, etc.)
+
  ## TODO
 
  * experiment with flax/jax and setup of the TPU instance that we should get shortly
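
The Architecture bullets added in this commit pair a non-autoregressive text encoder with an autoregressive image decoder over VQ-GAN tokens. A minimal pure-Python sketch of what that distinction might mean for attention masks and decoder inputs follows. Only the codebook size of 16384 comes from the README. The extra start-of-sequence id, the helper names, and the example token values are assumptions made for illustration, not the project's actual code.

```python
CODEBOOK_SIZE = 16384      # from the README: VQ-GAN with 16384 tokens
BOS_ID = CODEBOOK_SIZE     # assumption: one extra id marks the start of the image sequence


def bidirectional_mask(n):
    """Encoder-style mask: every text position may attend to every position."""
    return [[True] * n for _ in range(n)]


def causal_mask(n):
    """Decoder-style mask: position i may attend only to positions 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def shift_right(image_tokens):
    """Teacher-forcing inputs: BOS followed by all targets except the last."""
    return [BOS_ID] + list(image_tokens[:-1])


# Hypothetical VQ-GAN code indices for a tiny "image" of four tokens.
image_tokens = [5, 16383, 42, 7]
decoder_inputs = shift_right(image_tokens)
```

With these inputs the decoder predicts `image_tokens[i]` from `decoder_inputs[:i + 1]` plus the encoded text. In a Flax implementation the masks would more likely be built with library helpers than by hand.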