gchhablani committed on
Commit 53ddc87 • 1 Parent(s): 919efff

Fix markdown paths

apps/article.py CHANGED
@@ -2,7 +2,7 @@ import streamlit as st
 from apps.utils import read_markdown
 from streamlit_tensorboard import st_tensorboard
 from .utils import Toc
-def app(state):
+def app(state=None):
     toc = Toc()
     st.info("Welcome to our Multilingual-VQA demo. Please use the navigation sidebar to move to our demo, or scroll below to read all about our project. 🤗")
 
@@ -47,19 +47,19 @@ def app(state):
 
     toc.header("Conclusion, Future Work, and Social Impact")
     toc.subheader("Conclusion")
-    st.write(read_markdown("conclusion.md"))
+    st.write(read_markdown("conclusion_future_work/conclusion.md"))
     toc.subheader("Future Work")
-    st.write(read_markdown("future_work.md"))
+    st.write(read_markdown("conclusion_future_work/future_work.md"))
     toc.subheader("Social Impact")
-    st.write(read_markdown("social_impact.md"))
+    st.write(read_markdown("conclusion_future_work/social_impact.md"))
 
     toc.header("References")
     st.write(read_markdown("references.md"))
 
     toc.header("Checkpoints")
-    st.write(read_markdown("checkpoints.md"))
+    st.write(read_markdown("checkpoints/checkpoints.md"))
     toc.subheader("Other Checkpoints")
-    st.write(read_markdown("other_checkpoints.md"))
+    st.write(read_markdown("checkpoints/other_checkpoints.md"))
 
     toc.header("Acknowledgements")
     st.write(read_markdown("acknowledgements.md"))
sections/finetuning/data.md CHANGED
@@ -1,3 +1 @@
-**Dataset**
-
 For fine-tuning, we use the [VQA 2.0](https://visualqa.org/) dataset - particularly, the `train` and `validation` sets. We translate all the questions into the four languages specified above using language-specific MarianMT models. The MarianMT models produce better translations and are faster, and are hence better suited for preparing the fine-tuning data. This gives us 4x the number of examples in each subset.
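For illustration, here is a minimal sketch of what a per-language translation step could look like with a MarianMT checkpoint from Hugging Face Transformers. The `Helsinki-NLP/opus-mt-en-de` checkpoint name and the tiny in-memory batch are assumptions for the English-to-German direction, not the repository's actual translation script.

```python
# Minimal sketch: translating VQA questions with a language-specific MarianMT model.
# The checkpoint name and batching below are illustrative assumptions.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-de"  # hypothetical choice for English -> German
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

questions = ["What color is the cat?", "How many people are in the photo?"]
batch = tokenizer(questions, return_tensors="pt", padding=True)
translated = model.generate(**batch)
print(tokenizer.batch_decode(translated, skip_special_tokens=True))
```

In practice, the same pattern would be repeated with one checkpoint per target language over the full set of questions.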
sections/finetuning/model.md CHANGED
@@ -1,3 +1 @@
-**Model**
-
 We use the `SequenceClassification` model as a reference to create our own sequence classification model, in which a classification layer is attached on top of the pre-trained BERT model to perform multi-class classification. 3129 answer labels are chosen, following the convention for the English VQA task; the answer mapping can be found [here](https://github.com/gchhablani/multilingual-vqa/blob/main/answer_mapping.json). These are the same labels used when fine-tuning the VisualBERT models. The outputs shown here have been translated using the [`mtranslate`](https://github.com/mouuff/mtranslate) Google Translate API library. We then take various pre-trained checkpoints and train the sequence classification model for varying numbers of steps.
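As a rough illustration of the classification setup described above, here is a minimal Flax sketch of a head that maps a pooled encoder output to 3129 answer logits. The class name, hidden size, and dropout rate are assumptions; this is not the repository's actual `SequenceClassification`-style class.

```python
# Minimal sketch (not the repository's actual class): a classification head over a
# pooled multimodal encoder output, in the spirit of *ForSequenceClassification models.
import jax
import jax.numpy as jnp
import flax.linen as nn

class VQAClassificationHead(nn.Module):
    num_labels: int = 3129  # standard English VQA answer vocabulary size
    hidden_size: int = 768  # assumed BERT-base hidden size

    @nn.compact
    def __call__(self, pooled_output, deterministic: bool = True):
        # pooled_output: (batch, hidden_size) from the multimodal encoder
        x = nn.Dropout(rate=0.1, deterministic=deterministic)(pooled_output)
        logits = nn.Dense(self.num_labels)(x)  # one logit per answer label
        return logits

# Quick shape check with dummy pooled outputs.
head = VQAClassificationHead()
params = head.init(jax.random.PRNGKey(0), jnp.ones((2, 768)))
logits = head.apply(params, jnp.ones((2, 768)))
print(logits.shape)  # (2, 3129)
```

In the actual model, such a head would sit on top of the pooled output of the multimodal encoder, and training would minimize a classification loss over the 3129 labels.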
sections/pretraining/data.md CHANGED
@@ -1,3 +1 @@
-**Dataset**
-
 The dataset we use for pre-training is a cleaned version of [Conceptual 12M](https://github.com/google-research-datasets/conceptual-12m). The dataset is downloaded and broken images are removed, which leaves us with about 10M images. We then use the MBart50 `mbart-large-50-one-to-many-mmt` checkpoint to translate the dataset into four different languages - English, French, German, and Spanish - keeping 2.5 million examples of each language.
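Below is a minimal sketch of the one-to-many translation step with the `facebook/mbart-large-50-one-to-many-mmt` checkpoint, following the standard Transformers usage. The example caption and the German target language code are illustrative; the real pipeline would batch millions of captions.

```python
# Minimal sketch: translating an English caption into German with the MBart50
# one-to-many checkpoint mentioned above. Caption and language code are illustrative.
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-one-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained(
    "facebook/mbart-large-50-one-to-many-mmt", src_lang="en_XX"
)

caption = "A dog jumping over a fence."
inputs = tokenizer(caption, return_tensors="pt")
generated = model.generate(
    **inputs, forced_bos_token_id=tokenizer.lang_code_to_id["de_DE"]
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```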
sections/pretraining/model.md CHANGED
@@ -1,3 +1 @@
-**Model**
-
 The model is shown in the figure below. The `Dummy MLM Head` is actually combined with the MLM head, but it never contributes to the MLM loss, hence the name (the predictions on these tokens are ignored). We create a custom model in Flax which integrates the CLIP Vision model inside the BERT embeddings. We also use custom configs and modules to accommodate these changes and to allow loading from BERT and CLIP Vision checkpoints. The image is fed to the CLIP Vision encoder and the text is fed to the word-embedding layer of the BERT model. We use the `bert-base-multilingual-uncased` and `openai/clip-vit-base-patch32` checkpoints for the BERT and CLIP Vision models, respectively. All our code and hyperparameters are available on [GitHub](https://github.com/gchhablani/multilingual-vqa).
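To make the architecture concrete, here is a conceptual sketch (not the repository's model code) that loads the two named checkpoints, extracts per-patch hidden states from the CLIP Vision encoder and token-level hidden states from mBERT, and concatenates them along the sequence axis. In the actual model, the visual embeddings are injected at BERT's embedding layer rather than concatenated after both encoders; the image path is a placeholder.

```python
# Conceptual sketch only: fuse CLIP-ViT patch states with BERT text states by
# concatenation along the sequence axis. Not the repository's custom Flax module.
import jax.numpy as jnp
from PIL import Image
from transformers import (
    BertTokenizerFast,
    CLIPImageProcessor,
    FlaxBertModel,
    FlaxCLIPVisionModel,
)

# (Pass from_pt=True if only PyTorch weights are available for a checkpoint.)
bert = FlaxBertModel.from_pretrained("bert-base-multilingual-uncased")
clip_vision = FlaxCLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-uncased")
image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")

text_inputs = tokenizer("Ein Hund springt über einen Zaun.", return_tensors="np")
image = Image.open("example.jpg")  # placeholder path, replace with a real image
pixel_values = image_processor(images=image, return_tensors="np").pixel_values

visual_states = clip_vision(pixel_values).last_hidden_state  # (1, 50, 768): CLS + 49 patches
text_states = bert(**text_inputs).last_hidden_state          # (1, seq_len, 768)

fused = jnp.concatenate([text_states, visual_states], axis=1)  # (1, seq_len + 50, 768)
print(fused.shape)
```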