gchhablani committed on
Commit
58582da
1 Parent(s): 7cce2d4

Update acknowledgement and intro.

sections/acknowledgements.md CHANGED
@@ -1,6 +1,6 @@
 # Acknowledgements
 We'd like to thank [Abheesht Sharma](https://huggingface.co/abheesht) for helping in the discussions in the initial phases. [Luke Melas](https://github.com/lukemelas) helped us get the cleaned CC-12M data on our TPU-VMs and we are very grateful to him.
 
-This project would not be possible without the help of [Patrick](https://huggingface.co/patrickvonplaten) and [Suraj](https://huggingface.co/valhalla) who met with us and helped us review our approach and guided us throughout the project.
+This project would not be possible without the help of [Patrick](https://huggingface.co/patrickvonplaten) and [Suraj](https://huggingface.co/valhalla) who met with us and helped us review our approach and guided us throughout the project. We especially thank Patrick for going out of the way and allowing us extra TPU time so that we could work on this project.
 
 Last but not the least, we thank the Google Team for helping answer our queries on the Slack channel, and for providing us TPU-VMs.
sections/intro.md CHANGED
@@ -1,5 +1,5 @@
 
-This demo uses [CLIP-mBART50 model checkpoint](https://huggingface.co/flax-community/multilingual-image-captioning-5M/) to predict caption for a given image in 4 languages (English, French, German, Spanish). Training was done using image encoder and text decoder with approximately 5 million image-text pairs taken from the [Conceptual 12M dataset](https://github.com/google-research-datasets/conceptual-12m) translated using [MBart50](https://huggingface.co/transformers/model_doc/mbart50.html).
+This demo uses [CLIP-mBART50 model checkpoint](https://huggingface.co/flax-community/multilingual-image-captioning-5M/) to predict caption for a given image in 4 languages (English, French, German, Spanish). Training was done using image encoder (CLIP-ViT) and text decoder (mBART50) with approximately 5 million image-text pairs taken from the [Conceptual 12M dataset](https://github.com/google-research-datasets/conceptual-12m) translated using [MBart50](https://huggingface.co/transformers/model_doc/mbart50.html).
 
 New demo coming soon 🤗
 