gchhablani committed
Commit abbfb41
1 Parent(s): 97b0cf1

Update sections

sections/abstract.md CHANGED
@@ -0,0 +1,4 @@
+ # Social Impact
+ Being able to automatically describe the content of an image using properly formed sentences in any language is a challenging task, but it could have a great impact by helping visually impaired people better understand their surroundings.
+
+ Our initial plan was to work with a low-resource language, Marathi. However, the existing translation models for Marathi do not perform as well, which would have given us poor-quality labels, so we did not pursue this further.
sections/acknowledgements.md CHANGED
@@ -0,0 +1,6 @@
+ # Acknowledgements
+ We'd like to thank [Abheesht Sharma](https://huggingface.co/abheesht) for helping with the discussions during the initial phases. [Luke Melas](https://github.com/lukemelas) helped us get the cleaned CC-12M data onto our TPU-VMs, and we are very grateful to him.
+
+ This project would not have been possible without the help of [Patrick](https://huggingface.co/patrickvonplaten) and [Suraj](https://huggingface.co/valhalla), who met with us, helped us review our approach, and guided us throughout the project. We especially thank Patrick for going out of his way to allow us extra TPU time so that we could work on this project.
+
+ Last but not least, we thank the Google team for answering our queries on the Slack channel and for providing us with TPU-VMs.
sections/challenges.md CHANGED
@@ -0,0 +1 @@
+ # Challenges and Technical Difficulties
sections/intro.md CHANGED
@@ -0,0 +1,5 @@
+
+ This demo uses the [CLIP-Vision-Marian model checkpoint](https://huggingface.co/flax-community/spanish-image-captioninh/) to predict a caption in Spanish for a given image. Training was done with an image encoder and a text decoder on approximately 2.5 million image-text pairs taken from the [Conceptual 12M dataset](https://github.com/google-research-datasets/conceptual-12m), with the captions translated using [Marian](https://huggingface.co/transformers/model_doc/marian.html) (a rough translation sketch is shown below).
+
+
+ For more details, click on `Usage` or `Article` 🤗 below.
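
The training captions are machine-translated from the original English CC-12M captions. As a rough illustration only (a hypothetical sketch, not the project's actual translation pipeline; the `Helsinki-NLP/opus-mt-en-es` checkpoint and the batching are assumptions), translating a batch of captions with a Marian model from 🤗 Transformers could look like this:

```python
# Hypothetical sketch: translate English CC-12M captions to Spanish with Marian.
# The checkpoint name and batching are assumptions, not the project's exact setup.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-es"  # assumed public English->Spanish Marian checkpoint
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

captions = [
    "a dog running along the beach at sunset",
    "a bowl of fresh fruit on a wooden table",
]

# Tokenize the English captions, generate translations, and decode back to text.
batch = tokenizer(captions, return_tensors="pt", padding=True, truncation=True)
outputs = model.generate(**batch)
spanish_captions = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(spanish_captions)
```

In practice, translating roughly 2.5 million captions would presumably be done in much larger batches on accelerators rather than one small batch as above.
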
sections/references.md CHANGED
@@ -0,0 +1,12 @@
+ # References
+ - [Conceptual 12M Dataset](https://github.com/google-research-datasets/conceptual-12m)
+
+ - [Hybrid CLIP Example](https://github.com/huggingface/transformers/blob/master/src/transformers/models/clip/modeling_flax_clip.py)
+
+ - [Marian Modeling File](https://github.com/huggingface/transformers/blob/master/src/transformers/models/marian/modeling_flax_marian.py)
+
+ - [CLIP Modeling File](https://github.com/huggingface/transformers/blob/master/src/transformers/models/clip/modeling_flax_clip.py)
+
+ - [Hybrid CLIP Training Script](https://github.com/huggingface/transformers/blob/master/examples/research_projects/jax-projects/hybrid_clip/run_hybrid_clip.py)
+
+ - [Summarization Training Script](https://github.com/huggingface/transformers/blob/master/examples/flax/summarization/run_summarization_flax.py)
sections/social_impact.md CHANGED
@@ -0,0 +1,4 @@
+ # Social Impact
+ Being able to automatically describe the content of an image using properly formed sentences in any language is a challenging task, but it could have a great impact by helping visually impaired people better understand their surroundings.
+
+ Our initial plan was to work with a low-resource language, Marathi. However, the existing translation models for Marathi do not perform as well, which would have given us poor-quality labels, so we did not pursue this further.