gchhablani committed on
Commit
ce75b4a
1 Parent(s): fc3079d
Files changed (2)
  1. sections/logs.md +0 -0
  2. sections/pretraining.md +1 -1
sections/logs.md DELETED
File without changes
sections/pretraining.md CHANGED
@@ -7,4 +7,4 @@ The dataset we use for pre-training is a cleaned version of [Conceptual 12M](htt
 
 **Model**
 
- The model is shown in the image below. The `Dummy MLM Head` is actually combined with the MLM head but it never contributes to the MLM loss, hence the name (the predictions on these tokens are ignored). We create a custom model in Flax which integrates the CLIP Vision model inside the BERT embeddings. We also use custom configs and modules to accommodate these changes and to allow loading from BERT and CLIP Vision checkpoints. The image is fed to the CLIP Vision encoder and the text is fed to the word-embedding layers of the BERT model. We use the `bert-base-multilingual-uncased` and `openai/clip-vit-base-patch32` checkpoints for the BERT and CLIP Vision models, respectively. All our code and hyperparameters are available on [GitHub](https://github.com/gchhablani/multilingual-vqa).
+ The model is shown in the figure below. The `Dummy MLM Head` is actually combined with the MLM head but it never contributes to the MLM loss, hence the name (the predictions on these tokens are ignored). We create a custom model in Flax which integrates the CLIP Vision model inside the BERT embeddings. We also use custom configs and modules to accommodate these changes and to allow loading from BERT and CLIP Vision checkpoints. The image is fed to the CLIP Vision encoder and the text is fed to the word-embedding layers of the BERT model. We use the `bert-base-multilingual-uncased` and `openai/clip-vit-base-patch32` checkpoints for the BERT and CLIP Vision models, respectively. All our code and hyperparameters are available on [GitHub](https://github.com/gchhablani/multilingual-vqa).
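
As a rough illustration of the fusion described in the changed paragraph, the sketch below projects CLIP-ViT patch features into the BERT embedding space and prepends them to the word embeddings before the encoder runs. This is a minimal, hypothetical stand-in, not the `multilingual-vqa` classes themselves: the module name, the mBERT vocabulary size, and the ViT-B/32 feature shape (49 patches + CLS, dimension 768) are assumptions for the example.

```python
# Hypothetical sketch of CLIP-Vision features fused into BERT embeddings.
# In the real model, the concatenated sequence goes through the BERT encoder
# and an MLM head whose predictions at the visual positions are ignored
# (the "Dummy MLM Head" mentioned above).
import jax
import jax.numpy as jnp
import flax.linen as nn


class CLIPVisionBertEmbeddings(nn.Module):
    hidden_size: int = 768      # BERT hidden size (assumed)
    vocab_size: int = 105879    # bert-base-multilingual-uncased vocab (assumed)

    @nn.compact
    def __call__(self, input_ids, visual_features):
        # Standard word embeddings for the text tokens.
        word_emb = nn.Embed(self.vocab_size, self.hidden_size)(input_ids)
        # Project CLIP patch features into the BERT embedding space.
        visual_emb = nn.Dense(self.hidden_size)(visual_features)
        # Prepend the visual "tokens" to the text token embeddings.
        return jnp.concatenate([visual_emb, word_emb], axis=1)


if __name__ == "__main__":
    rng = jax.random.PRNGKey(0)
    input_ids = jnp.ones((2, 16), dtype=jnp.int32)   # toy text batch
    visual_features = jnp.ones((2, 50, 768))         # 49 patches + CLS from ViT-B/32 (assumed)
    module = CLIPVisionBertEmbeddings()
    params = module.init(rng, input_ids, visual_features)
    fused = module.apply(params, input_ids, visual_features)
    print(fused.shape)  # (2, 66, 768): 50 visual + 16 text positions
```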