gchhablani committed on
Commit 53ddc87 • 1 Parent(s): 919efff

Fix markdown paths

apps/article.py CHANGED
@@ -2,7 +2,7 @@ import streamlit as st
 from apps.utils import read_markdown
 from streamlit_tensorboard import st_tensorboard
 from .utils import Toc
-def app(state):
+def app(state=None):
     toc = Toc()
     st.info("Welcome to our Multilingual-VQA demo. Please use the navigation sidebar to move to our demo, or scroll below to read all about our project. 🤗")
 
@@ -47,19 +47,19 @@ def app(state):
 
     toc.header("Conclusion, Future Work, and Social Impact")
     toc.subheader("Conclusion")
-    st.write(read_markdown("conclusion.md"))
+    st.write(read_markdown("conclusion_future_work/conclusion.md"))
     toc.subheader("Future Work")
-    st.write(read_markdown("future_work.md"))
+    st.write(read_markdown("conclusion_future_work/future_work.md"))
     toc.subheader("Social Impact")
-    st.write(read_markdown("social_impact.md"))
+    st.write(read_markdown("conclusion_future_work/social_impact.md"))
 
     toc.header("References")
     st.write(read_markdown("references.md"))
 
     toc.header("Checkpoints")
-    st.write(read_markdown("checkpoints.md"))
+    st.write(read_markdown("checkpoints/checkpoints.md"))
     toc.subheader("Other Checkpoints")
-    st.write(read_markdown("other_checkpoints.md"))
+    st.write(read_markdown("checkpoints/other_checkpoints.md"))
 
     toc.header("Acknowledgements")
     st.write(read_markdown("acknowledgements.md"))
sections/finetuning/data.md CHANGED
@@ -1,3 +1 @@
-**Dataset**
-
 For fine-tuning, we use the [VQA 2.0](https://visualqa.org/) dataset - particularly, the `train` and `validation` sets. We translate all the questions into the four languages specified above using language-specific MarianMT models. The MarianMT models produce better translations and are faster, and are hence better suited for preparing the fine-tuning data. This gives us 4x the number of examples in each subset.
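For illustration, here is a minimal sketch of what a per-language translation step could look like with a MarianMT checkpoint from Hugging Face Transformers. The `Helsinki-NLP/opus-mt-en-de` checkpoint name and the tiny in-memory batch are assumptions for the English-to-German direction, not the repository's actual translation script.

```python
# Minimal sketch: translating VQA questions with a language-specific MarianMT model.
# The checkpoint name and batching below are illustrative assumptions.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-de"  # hypothetical choice for English -> German
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

questions = ["What color is the cat?", "How many people are in the photo?"]
batch = tokenizer(questions, return_tensors="pt", padding=True)
translated = model.generate(**batch)
print(tokenizer.batch_decode(translated, skip_special_tokens=True))
```

In practice, the same pattern would be repeated with one checkpoint per target language over the full set of questions.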
sections/finetuning/model.md CHANGED
@@ -1,3 +1 @@
-**Model**
-
 We use the `SequenceClassification` model as a reference to create our own sequence classification model, in which a classification layer is attached on top of the pre-trained BERT model to perform multi-class classification. 3129 answer labels are chosen, following the convention for the English VQA task; the answer mapping can be found [here](https://github.com/gchhablani/multilingual-vqa/blob/main/answer_mapping.json). These are the same labels used when fine-tuning the VisualBERT models. The outputs shown here have been translated using the [`mtranslate`](https://github.com/mouuff/mtranslate) Google Translate API library. We then take various pre-trained checkpoints and train the sequence classification model for varying numbers of steps.
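As a rough illustration of the classification setup described above, here is a minimal Flax sketch of a head that maps a pooled encoder output to 3129 answer logits. The class name, hidden size, and dropout rate are assumptions; this is not the repository's actual `SequenceClassification`-style class.

```python
# Minimal sketch (not the repository's actual class): a classification head over a
# pooled multimodal encoder output, in the spirit of *ForSequenceClassification models.
import jax
import jax.numpy as jnp
import flax.linen as nn

class VQAClassificationHead(nn.Module):
    num_labels: int = 3129  # standard English VQA answer vocabulary size
    hidden_size: int = 768  # assumed BERT-base hidden size

    @nn.compact
    def __call__(self, pooled_output, deterministic: bool = True):
        # pooled_output: (batch, hidden_size) from the multimodal encoder
        x = nn.Dropout(rate=0.1, deterministic=deterministic)(pooled_output)
        logits = nn.Dense(self.num_labels)(x)  # one logit per answer label
        return logits

# Quick shape check with dummy pooled outputs.
head = VQAClassificationHead()
params = head.init(jax.random.PRNGKey(0), jnp.ones((2, 768)))
logits = head.apply(params, jnp.ones((2, 768)))
print(logits.shape)  # (2, 3129)
```

In the actual model, such a head would sit on top of the pooled output of the multimodal encoder, and training would minimize a classification loss over the 3129 labels.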
sections/pretraining/data.md CHANGED
@@ -1,3 +1 @@
-**Dataset**
-
 The dataset we use for pre-training is a cleaned version of [Conceptual 12M](https://github.com/google-research-datasets/conceptual-12m). The dataset is downloaded and broken images are removed, which leaves us with about 10M images. We then use the MBart50 `mbart-large-50-one-to-many-mmt` checkpoint to translate the dataset into four different languages - English, French, German, and Spanish - keeping 2.5 million examples of each language.
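Below is a minimal sketch of the one-to-many translation step with the `facebook/mbart-large-50-one-to-many-mmt` checkpoint, following the standard Transformers usage. The example caption and the German target language code are illustrative; the real pipeline would batch millions of captions.

```python
# Minimal sketch: translating an English caption into German with the MBart50
# one-to-many checkpoint mentioned above. Caption and language code are illustrative.
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-one-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained(
    "facebook/mbart-large-50-one-to-many-mmt", src_lang="en_XX"
)

caption = "A dog jumping over a fence."
inputs = tokenizer(caption, return_tensors="pt")
generated = model.generate(
    **inputs, forced_bos_token_id=tokenizer.lang_code_to_id["de_DE"]
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```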
sections/pretraining/model.md CHANGED
@@ -1,3 +1 @@
-**Model**
-
 The model is shown in the figure below. The `Dummy MLM Head` is actually combined with the MLM head, but it never contributes to the MLM loss, hence the name (the predictions on these tokens are ignored). We create a custom model in Flax which integrates the CLIP Vision model inside the BERT embeddings. We also use custom configs and modules to accommodate these changes and to allow loading from BERT and CLIP Vision checkpoints. The image is fed to the CLIP Vision encoder and the text is fed to the word-embedding layer of the BERT model. We use the `bert-base-multilingual-uncased` and `openai/clip-vit-base-patch32` checkpoints for the BERT and CLIP Vision models, respectively. All our code and hyperparameters are available on [GitHub](https://github.com/gchhablani/multilingual-vqa).
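To make the architecture concrete, here is a conceptual sketch (not the repository's model code) that loads the two named checkpoints, extracts per-patch hidden states from the CLIP Vision encoder and token-level hidden states from mBERT, and concatenates them along the sequence axis. In the actual model, the visual embeddings are injected at BERT's embedding layer rather than concatenated after both encoders; the image path is a placeholder.

```python
# Conceptual sketch only: fuse CLIP-ViT patch states with BERT text states by
# concatenation along the sequence axis. Not the repository's custom Flax module.
import jax.numpy as jnp
from PIL import Image
from transformers import (
    BertTokenizerFast,
    CLIPImageProcessor,
    FlaxBertModel,
    FlaxCLIPVisionModel,
)

# (Pass from_pt=True if only PyTorch weights are available for a checkpoint.)
bert = FlaxBertModel.from_pretrained("bert-base-multilingual-uncased")
clip_vision = FlaxCLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-uncased")
image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")

text_inputs = tokenizer("Ein Hund springt über einen Zaun.", return_tensors="np")
image = Image.open("example.jpg")  # placeholder path, replace with a real image
pixel_values = image_processor(images=image, return_tensors="np").pixel_values

visual_states = clip_vision(pixel_values).last_hidden_state  # (1, 50, 768): CLS + 49 patches
text_states = bert(**text_inputs).last_hidden_state          # (1, seq_len, 768)

fused = jnp.concatenate([text_states, visual_states], axis=1)  # (1, seq_len + 50, 768)
print(fused.shape)
```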