bhavitvyamalik committed on
Commit
da4fdce
1 Parent(s): db42db5
Files changed (2)
  1. apps/article.py +5 -6
  2. sections/intro/intro.md +1 -1
apps/article.py CHANGED
@@ -30,6 +30,11 @@ def app(state=None):
     # toc.subsubheader("MLM Training Logs")
     # st.info("In case the TensorBoard logs are not displayed, please visit this link: https://huggingface.co/flax-community/multilingual-vqa-pt-ckpts/tensorboard")
     # st_tensorboard(logdir='./logs/pretrain_logs', port=6006)
+
+    toc.header("Challenges and Technical Difficulties")
+    st.write(read_markdown("challenges.md"))
+
+    toc.header("Limitations and Biases")
     st.write(read_markdown("bias.md"))
 
     _, col2, col3, _ = st.beta_columns([0.5,2.5,2.5,0.5])
@@ -50,12 +55,6 @@ def app(state=None):
     with col3:
         st.image("./misc/examples/female_biker_resized.jpg", width=350, caption = 'German Caption: <PERSON> auf dem Motorrad von <PERSON>.', use_column_width='always')
 
-    toc.header("Challenges and Technical Difficulties")
-    st.write(read_markdown("challenges.md"))
-
-    toc.header("Limitations and Biases")
-    st.write(read_markdown("bias.md"))
-
     toc.header("Conclusion, Future Work, and Social Impact")
     toc.subheader("Conclusion")
     st.write(read_markdown("conclusion_future_work/conclusion.md"))
sections/intro/intro.md CHANGED
@@ -1,3 +1,3 @@
-This project is focused on Multilingual Image Captioning which has attracted an increasing amount of attention in the last decade due to its potential applications. Most of the existing datasets and models on this task work with English-only image-text pairs. It is a challenging task to generate captions with proper linguistic properties in different languages, as it requires an advanced level of image understanding. Our intention here is to provide a Proof-of-Concept with our CLIP Vision + mBART-50 model baseline, which leverages a multilingual checkpoint with pre-trained image encoders. Our model currently supports four languages - **English, French, German, and Spanish**.
+This project is focused on Multilingual Image Captioning, which has attracted an increasing amount of attention in the last decade due to its potential applications. Most of the existing datasets and models on this task work with English-only image-text pairs. It is a challenging task to generate captions with proper linguistic properties in different languages, as it requires an advanced level of image understanding. Our intention here is to provide a Proof-of-Concept with our CLIP Vision + mBART-50 model baseline, which leverages a multilingual checkpoint with pre-trained image encoders. Our model currently supports four languages - **English, French, German, and Spanish**.
 
 Due to the lack of good-quality multilingual data, we translate subsets of the Conceptual 12M dataset into English (no translation needed), French, German, and Spanish using the MarianMT model for the respective language. With better-translated captions and hyperparameter tuning, we expect to see higher performance.