bhavitvyamalik committed
Commit 099abfc
1 Parent(s): 47dc27b

add contributions

Files changed (2)
  1. app.py +1 -1
  2. sections/intro/contributions.md +5 -0
app.py CHANGED
@@ -177,7 +177,7 @@ def main():
     st.set_page_config(
         page_title="Multilingual Image Captioning",
         layout="wide",
-        initial_sidebar_state="collapsed",
+        initial_sidebar_state="auto",
         page_icon="./misc/mic-logo.png",
     )
sections/intro/contributions.md CHANGED
@@ -0,0 +1,5 @@
+ Our novel contributions include:
+ - A [multilingual variant of the Conceptual-12M dataset (mBART50)](https://huggingface.co/datasets/flax-community/conceptual-12m-mbart-50-multilingual) containing 2.5M image-text pairs in each of four languages (English, French, German, and Spanish), translated using the mBART-50 model (see the translation sketch after this diff).
+ - A [multilingual variant of the Conceptual-12M dataset (MarianMT)](https://huggingface.co/datasets/flax-community/conceptual-12m-multilingual-marian) containing 2.5M image-text pairs in each of four languages (English, French, German, and Spanish), translated using the MarianMT model.
+ - [A fusion of the CLIP Vision Transformer and the mBART-50 model](https://github.com/gchhablani/multilingual-vqa/tree/main/models/flax_clip_vision_bert). It takes visual embeddings from the CLIP-Vision transformer and feeds them into the `encoder_hidden_states` of an mBART-50 decoder, enabling deep cross-modal interaction via cross-attention between the two models (see the fusion sketch after this diff).
+ - A [pre-trained checkpoint](https://huggingface.co/flax-community/clip-vit-base-patch32_mbart-large-50) on our multilingual Conceptual-12M variant.
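
For context on the first two dataset contributions, here is a minimal sketch of translating an English Conceptual-12M caption into French, German, and Spanish with mBART-50 via Hugging Face Transformers. The checkpoint name (`facebook/mbart-large-50-one-to-many-mmt`), the language codes, and the `translate_caption` helper are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch (not the authors' pipeline): translate one English caption
# into French, German, and Spanish with an mBART-50 translation checkpoint.
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Assumed checkpoint; the dataset may have been built with a different one.
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-one-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-one-to-many-mmt")
tokenizer.src_lang = "en_XX"  # source captions are English

def translate_caption(caption, target_lang):
    # Hypothetical helper: encode the English caption, force the decoder to
    # start with the target-language code, and decode the generated tokens.
    inputs = tokenizer(caption, return_tensors="pt")
    generated = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.lang_code_to_id[target_lang],
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

caption = "A dog catching a frisbee in the park."
for lang in ["fr_XX", "de_DE", "es_XX"]:
    print(lang, "->", translate_caption(caption, lang))
```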
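And a minimal sketch of the fusion described in the third contribution, assuming the Flax classes in Transformers: CLIP-ViT patch embeddings are projected to the mBART-50 decoder width and passed where a text encoder's output would normally go, so the decoder's cross-attention attends over image features. The random projection stands in for a learned layer; this is an illustration of the idea, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): feed CLIP-ViT patch embeddings to an
# mBART-50 decoder as encoder_hidden_states so it cross-attends over the image.
import jax
import jax.numpy as jnp
from transformers import FlaxCLIPVisionModel, FlaxMBartForConditionalGeneration
from transformers.modeling_flax_outputs import FlaxBaseModelOutput

vision = FlaxCLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
mbart = FlaxMBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")

# Dummy image batch: (batch, channels, height, width).
pixel_values = jnp.zeros((1, 3, 224, 224))
patch_embeds = vision(pixel_values).last_hidden_state  # (1, 50, 768) for ViT-B/32

# Placeholder random projection 768 -> 1024; in the real model this mapping is learned.
rng = jax.random.PRNGKey(0)
proj = 0.02 * jax.random.normal(rng, (vision.config.hidden_size, mbart.config.d_model))
visual_states = patch_embeds @ proj  # (1, 50, 1024)

# Decode one step, with the visual states standing in for text-encoder output.
decoder_input_ids = jnp.array([[mbart.config.decoder_start_token_id]])
outputs = mbart.decode(
    decoder_input_ids=decoder_input_ids,
    encoder_outputs=FlaxBaseModelOutput(last_hidden_state=visual_states),
)
print(outputs.logits.shape)  # (1, 1, vocab_size)
```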