gchhablani committed
Commit 627e34d • 1 Parent(s): 8b842e0

Update application
app.py CHANGED
@@ -57,11 +57,6 @@ st.write("[Gunjan Chhablani](https://huggingface.co/gchhablani), [Bhavitvya Mali
 with st.beta_expander("Usage"):
     st.markdown(read_markdown("usage.md"))
 
-with st.beta_expander("Method"):
-    st.image("./misc/Multilingual-VQA.png")
-    st.markdown(read_markdown("pretraining.md"))
-    st.markdown(read_markdown("finetuning.md"))
-
 first_index = 20
 # Init Session State
 if state.image_file is None:
@@ -122,8 +117,12 @@ fig = plotly_express_horizontal_bar_plot(values, translated_labels)
 st.plotly_chart(fig, use_container_width = True)
 
 
-st.write(read_markdown("about.md"))
+st.write(read_markdown("abstract.md"))
 st.write(read_markdown("caveats.md"))
+st.write("# Methodology")
+st.image("./misc/Multilingual-VQA.png", caption="Masked LM model for Image-text Pretraining.")
+st.markdown(read_markdown("pretraining.md"))
+st.markdown(read_markdown("finetuning.md"))
 st.write(read_markdown("challenges.md"))
 st.write(read_markdown("social_impact.md"))
 st.write(read_markdown("references.md"))
sections/{about.md → abstract.md} RENAMED
@@ -1,2 +1,2 @@
-# About
+# Abstract
 This project is focused on Multilingual Visual Question Answering. Most of the existing datasets and models for this task work with English-only image-text pairs. Our intention here is to provide a proof of concept with our simple ViT+BERT model, which can be trained from multilingual text checkpoints and pre-trained image encoders and made to perform well enough. Due to the lack of good-quality multilingual data, we translate subsets of the Conceptual 12M dataset into English (already in English), French, German and Spanish using the Marian models. We achieved 0.49 accuracy on the multilingual validation set we created. With better captions and hyperparameter tuning, we expect to see higher performance.
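The abstract mentions translating Conceptual 12M captions with Marian models. A minimal sketch of that kind of translation using the `transformers` MarianMT checkpoints; the model name, captions, and batching here are illustrative assumptions, not the project's actual translation pipeline:

```python
from transformers import MarianMTModel, MarianTokenizer

# Illustrative English -> French translation; German and Spanish would use
# the analogous Helsinki-NLP/opus-mt-en-de and opus-mt-en-es checkpoints.
model_name = "Helsinki-NLP/opus-mt-en-fr"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

captions = ["a dog jumping over a fence", "a plate of pasta on a wooden table"]
batch = tokenizer(captions, return_tensors="pt", padding=True)
outputs = model.generate(**batch)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```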
sections/checkpoints.md CHANGED
@@ -1,4 +1,4 @@
-**Checkpoints**:
+# Checkpoints
 - Pre-trained checkpoint: [multilingual-vqa](https://huggingface.co/flax-community/multilingual-vqa)
 - Fine-tuned on 45k pretrained checkpoint: [multilingual-vqa-pt-45k-ft](https://huggingface.co/flax-community/multilingual-vqa-pt-45k-ft)
 - Fine-tuned on 45k pretrained checkpoint with AdaFactor (others use AdamW): [multilingual-vqa-pt-45k-ft-adf](https://huggingface.co/flax-community/multilingual-vqa-pt-45k-ft-adf)
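These are custom Flax checkpoints, so loading them likely requires the project's own model class rather than a stock `transformers` Auto class. Fetching the checkpoint files themselves, however, only needs `huggingface_hub`; a sketch, with the repo id taken from the list above:

```python
from huggingface_hub import snapshot_download

# Download all files of the fine-tuned checkpoint into the local cache
# and print the resulting local directory path.
local_dir = snapshot_download(repo_id="flax-community/multilingual-vqa-pt-45k-ft")
print(local_dir)
```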
sections/social_impact.md CHANGED
@@ -1,2 +1,2 @@
 # Social Impact
-Multilingual Visual Question Answering has not received a lot of attention. There are very few multilingual VQA datasets, and that is what we wanted to address here. Our initial plan was to include 4 high-resource and 4 low-resource languages in our training data. However, the existing translations do not perform as well and we would have received poor labels, along with longer training time needed. We hope to improve this in the future by using better translators (for e.g. Google Translate API) to get more multilingual data, especially in low-resource languages. Regardless, our aim with this project was to provide with a pipeline approach to deal with Multilingual visuo-linguistic pretraining and perform Multilingual Visual Question Answering.
+Multilingual Visual Question Answering has not received a lot of attention. There are very few multilingual VQA datasets, and that is what we wanted to address here. Our initial plan was to include 4 high-resource and 4 low-resource languages in our training data. However, the existing translations do not perform as well and we would have received poor labels, not to mention a longer training time. We hope to improve this in the future by using better translators (e.g. the Google Translate API) to get more multilingual data, especially in low-resource languages. Regardless, our aim with this project was to provide a pipeline approach for multilingual visuo-linguistic pretraining and multilingual Visual Question Answering.
sections/usage.md CHANGED
@@ -10,4 +10,6 @@
 
 - The top-5 predictions are displayed below and their respective confidence scores are shown in the form of a bar plot.
 
+For more info, scroll to the end of this app.
+
 
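usage.md and app.py both refer to a horizontal bar plot of the top-5 confidence scores (`plotly_express_horizontal_bar_plot`). The helper's real implementation lives in the repository; a minimal sketch of what such a function could look like with Plotly Express, under that assumption:

```python
import plotly.express as px

def plotly_express_horizontal_bar_plot(values, labels):
    # Hypothetical sketch: horizontal bar chart of the top-5 answer
    # confidences, with the highest-confidence answer at the top.
    fig = px.bar(x=values, y=labels, orientation="h",
                 labels={"x": "Confidence", "y": "Answer"})
    fig.update_layout(yaxis={"categoryorder": "total ascending"})
    return fig
```

In the app the resulting figure is then rendered with `st.plotly_chart(fig, use_container_width=True)`, as shown in the app.py diff above.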