gchhablani committed
Commit 627e34d • 1 Parent(s): 8b842e0

Update application
app.py CHANGED
@@ -57,11 +57,6 @@ st.write("[Gunjan Chhablani](https://huggingface.co/gchhablani), [Bhavitvya Mali
 with st.beta_expander("Usage"):
     st.markdown(read_markdown("usage.md"))
 
-with st.beta_expander("Method"):
-    st.image("./misc/Multilingual-VQA.png")
-    st.markdown(read_markdown("pretraining.md"))
-    st.markdown(read_markdown("finetuning.md"))
-
 first_index = 20
 # Init Session State
 if state.image_file is None:
@@ -122,8 +117,12 @@ fig = plotly_express_horizontal_bar_plot(values, translated_labels)
 st.plotly_chart(fig, use_container_width = True)
 
 
-st.write(read_markdown("about.md"))
+st.write(read_markdown("abstract.md"))
 st.write(read_markdown("caveats.md"))
+st.write("# Methodology")
+st.image("./misc/Multilingual-VQA.png", caption="Masked LM model for Image-text Pretraining.")
+st.markdown(read_markdown("pretraining.md"))
+st.markdown(read_markdown("finetuning.md"))
 st.write(read_markdown("challenges.md"))
 st.write(read_markdown("social_impact.md"))
 st.write(read_markdown("references.md"))
sections/{about.md → abstract.md} RENAMED
@@ -1,2 +1,2 @@
-# About
+# Abstract
 This project is focused on Multilingual Visual Question Answering. Most of the existing datasets and models for this task work with English-only image-text pairs. Our intention here is to provide a proof of concept with our simple ViT+BERT model, which can be trained from multilingual text checkpoints and pre-trained image encoders and made to perform well enough. Due to the lack of good-quality multilingual data, we translate subsets of the Conceptual 12M dataset into English (already in English), French, German and Spanish using the Marian models. We achieved 0.49 accuracy on the multilingual validation set we created. With better captions and hyperparameter tuning, we expect to see higher performance.
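The abstract mentions translating Conceptual 12M captions with Marian models. A minimal sketch of that kind of translation using the `transformers` MarianMT checkpoints; the model name, captions, and batching here are illustrative assumptions, not the project's actual translation pipeline:

```python
from transformers import MarianMTModel, MarianTokenizer

# Illustrative English -> French translation; German and Spanish would use
# the analogous Helsinki-NLP/opus-mt-en-de and opus-mt-en-es checkpoints.
model_name = "Helsinki-NLP/opus-mt-en-fr"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

captions = ["a dog jumping over a fence", "a plate of pasta on a wooden table"]
batch = tokenizer(captions, return_tensors="pt", padding=True)
outputs = model.generate(**batch)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```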
sections/checkpoints.md CHANGED
@@ -1,4 +1,4 @@
-**Checkpoints**:
+# Checkpoints
 - Pre-trained checkpoint: [multilingual-vqa](https://huggingface.co/flax-community/multilingual-vqa)
 - Fine-tuned on 45k pretrained checkpoint: [multilingual-vqa-pt-45k-ft](https://huggingface.co/flax-community/multilingual-vqa-pt-45k-ft)
 - Fine-tuned on 45k pretrained checkpoint with AdaFactor (others use AdamW): [multilingual-vqa-pt-45k-ft-adf](https://huggingface.co/flax-community/multilingual-vqa-pt-45k-ft-adf)
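These are custom Flax checkpoints, so loading them likely requires the project's own model class rather than a stock `transformers` Auto class. Fetching the checkpoint files themselves, however, only needs `huggingface_hub`; a sketch, with the repo id taken from the list above:

```python
from huggingface_hub import snapshot_download

# Download all files of the fine-tuned checkpoint into the local cache
# and print the resulting local directory path.
local_dir = snapshot_download(repo_id="flax-community/multilingual-vqa-pt-45k-ft")
print(local_dir)
```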
sections/social_impact.md CHANGED
@@ -1,2 +1,2 @@
 # Social Impact
-Multilingual Visual Question Answering has not received a lot of attention. There are very few multilingual VQA datasets, and that is what we wanted to address here. Our initial plan was to include 4 high-resource and 4 low-resource languages in our training data. However, the existing translations do not perform as well and we would have received poor labels, along with longer training time needed. We hope to improve this in the future by using better translators (for e.g. Google Translate API) to get more multilingual data, especially in low-resource languages. Regardless, our aim with this project was to provide with a pipeline approach to deal with Multilingual visuo-linguistic pretraining and perform Multilingual Visual Question Answering.
+Multilingual Visual Question Answering has not received a lot of attention. There are very few multilingual VQA datasets, and that is what we wanted to address here. Our initial plan was to include 4 high-resource and 4 low-resource languages in our training data. However, the existing translations do not perform as well and we would have received poor labels, not to mention a longer training time. We hope to improve this in the future by using better translators (e.g. the Google Translate API) to get more multilingual data, especially in low-resource languages. Regardless, our aim with this project was to provide a pipeline approach for multilingual visuo-linguistic pretraining and multilingual Visual Question Answering.
sections/usage.md CHANGED
@@ -10,4 +10,6 @@
 
 - The top-5 predictions are displayed below and their respective confidence scores are shown in the form of a bar plot.
 
+For more info, scroll to the end of this app.
+
 
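usage.md and app.py both refer to a horizontal bar plot of the top-5 confidence scores (`plotly_express_horizontal_bar_plot`). The helper's real implementation lives in the repository; a minimal sketch of what such a function could look like with Plotly Express, under that assumption:

```python
import plotly.express as px

def plotly_express_horizontal_bar_plot(values, labels):
    # Hypothetical sketch: horizontal bar chart of the top-5 answer
    # confidences, with the highest-confidence answer at the top.
    fig = px.bar(x=values, y=labels, orientation="h",
                 labels={"x": "Confidence", "y": "Answer"})
    fig.update_layout(yaxis={"categoryorder": "total ascending"})
    return fig
```

In the app the resulting figure is then rendered with `st.plotly_chart(fig, use_container_width=True)`, as shown in the app.py diff above.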