bhavitvyamalik committed on
Commit c62e9c5 • 1 Parent(s): 8678313

update sections

sections/challenges.md CHANGED
@@ -5,6 +5,4 @@ We faced challenges at every step of the way, despite having some example script
 
  - The translations with deep learning models aren't as "perfect" as translation APIs like Google and Yandex. This could lead to poor performance.
 
- - We prepared the model and config classes for our model from scratch, basing it on `CLIP Vision` and `mBART` implementations in Flax. The ViT embeddings should be used inside the BERT embeddings class, which was the major challenge here.
-
- - We were only able to get around 1.5 days of training time on TPUs due to above mentioned challenges. We were unable to perform hyperparameter tuning. Our [loss curves on the pre-training model](https://huggingface.co/flax-community/spanish-image-captioning/tensorboard) show that the training hasn't converged, and we could see further improvement in the BLEU scores.
+ - We prepared the model and config classes for our model from scratch, basing it on `CLIP Vision` and `Marian` implementations in Flax; a sketch of the two pretrained halves follows this diff.
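
The added bullet describes composing the `CLIP Vision` and `Marian` Flax implementations into a single encoder-decoder. Below is a minimal sketch of instantiating the two pretrained halves with the standard `transformers` Flax classes; the repository's actual `FlaxCLIPVisionMarianMT` config/model classes (including how CLIP hidden states are routed into Marian's cross-attention) are custom code, and the checkpoint names here are assumptions.

```python
# Sketch only: the two pretrained halves a CLIP-Vision + Marian encoder-decoder
# would wrap. The repo's FlaxCLIPVisionMarianMT class itself is custom code;
# the checkpoint names below are assumptions.
from transformers import FlaxCLIPVisionModel, FlaxMarianMTModel

# Vision encoder: turns an image into a sequence of patch hidden states.
vision_encoder = FlaxCLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")

# Marian seq2seq model: its decoder generates the Spanish caption. A combined
# model feeds the CLIP hidden states to the decoder's cross-attention in place
# of Marian's own text-encoder outputs.
# (Add from_pt=True if the checkpoint only ships PyTorch weights.)
marian = FlaxMarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-es")
```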
 
 
sections/intro.md CHANGED
@@ -1,4 +1,4 @@
- This demo uses [CLIP-Vision-Marian model checkpoint](https://huggingface.co/flax-community/spanish-image-captioning/) to predict caption for a given image in Spanish. Training was done using image encoder and text decoder with approximately 2.5 million image-text pairs taken from the [Conceptual 12M dataset](https://github.com/google-research-datasets/conceptual-12m) with captions translated using [Marian](https://huggingface.co/transformers/model_doc/marian.html).
+ This demo uses the [CLIP-Vision-Marian model checkpoint](https://huggingface.co/flax-community/clip-vit-base-patch32_marian-es) to predict a caption for a given image in Spanish. Training was done using an image encoder and a text decoder with approximately 2.5 million image-text pairs taken from the [Conceptual 12M dataset](https://github.com/google-research-datasets/conceptual-12m), with captions translated using [MarianMT English to Spanish](https://huggingface.co/transformers/model_doc/marian.html).
 
 
  For more details, click on `Usage` or `Article` 🤗 below.
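
The caption-translation step described in the added line can be reproduced in outline with the standard `transformers` MarianMT API; the exact checkpoint and the batching used for the ~2.5 million captions are assumptions here.

```python
# Hedged sketch of EN->ES caption translation with MarianMT. The checkpoint
# name is an assumption (a standard Helsinki-NLP English-to-Spanish model).
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-es"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

captions = ["A dog playing in the park."]  # hypothetical English caption
inputs = tokenizer(captions, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
# e.g. ['Un perro jugando en el parque.']
```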
sections/social_impact.md CHANGED
@@ -1,4 +1,4 @@
 ## Social Impact
 Being able to automatically describe the content of an image using properly formed sentences in any language is a challenging task, but it could have great impact by helping visually impaired people better understand their surroundings.
 
- Our initial plan was to work with a low-resource language - Marathi. However, the existing translations do not perform as well and we would have received poor labels and hence we did not pursue this further.
+ Our initial plan was to work only with a low-resource language. However, existing translation models do not perform as well on low-resource languages and would have given us poor labels, so we did not pursue this further.
sections/usage.md CHANGED
@@ -1,4 +1,4 @@
- - This demo loads the `FlaxCLIPVisionMarianMT` present in the `model` directory of this repository. The checkpoint is loaded from `ckpt/ckpt-23999` which is pre-trained checkpoint with 24kk steps. 100 random validation set examples are present in the `references.tsv` with respective images in the `images` directory.
+ - This demo loads the `FlaxCLIPVisionMarianMT` model present in the `model` directory of this repository. The checkpoint is loaded from `ckpt/ckpt-23999`, which was pre-trained for 24k steps. 100 random validation-set examples are listed in `references.tsv`, with the corresponding images in the `images` directory.
 
  - We provide `English Translation` of the generated caption and reference captions for users who are not well-acquainted with Spanish. This is done using `mtranslate` to keep things flexible enough and needs internet connection as it uses the Google Translate API. We will also add the original captions soon.
 
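
The `mtranslate`-based English back-translation mentioned in the context line amounts to a single call. A minimal sketch follows; it needs an internet connection, since `mtranslate` calls the Google Translate web endpoint, and the caption string is hypothetical.

```python
# Sketch of the demo's English back-translation via mtranslate.
# Argument order: translate(text, to_language, from_language).
from mtranslate import translate

spanish_caption = "Un perro jugando en el parque."  # hypothetical generated caption
english_caption = translate(spanish_caption, "en", "es")
print(english_caption)  # e.g. "A dog playing in the park."
```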