Spaces: flax-community/multilingual-image-captioning
bhavitvyamalik committed on Jul 27, 2021

Commit 13976c3

1 Parent(s): 211aef0

updated limitations

Files changed (1)

sections/bias.md

@@ -1,5 +1,5 @@
 ### Limitations
-- Our model has a major limitation in that the training data provided was limited to a sequence length of 64 tokens. Hence, it doesn not perform very well with longer sequence lengths. Sometimes, it yields up empty captions. We are working on it as of this writing by doubling the maximum sequence length of translation and training.
+- Our model has a major limitation in that the training data provided was limited to a sequence length of 64 tokens. Hence, it does not perform very well with longer sequence lengths. Sometimes, it yields up empty captions. We are working on it as of this writing by doubling the maximum sequence length of translation and training.
 - The dataset has all `Person` type named entites masked as `<PERSON>`. While that is good for biases as we explain below, the dataset contains too many `<PERSON>` tags and the model results in `<PERSON><PERSON><PERSON>` sometimes for Person-related images.
 - Our captions are sometimes generic. Stating what is present in the image instead of generation well-formed and convoluted captions. Despite the training, the BLEU scores we achieve are not very great, which could be a reason for this. With higher BLEU scores, we can expect less-generic models.
 - English captions are sometimes better than other languages. This can be due to the fact that we limit sequence length of other languages to 64 (and now 128) while English text works fine. This could also be due to poor-quality translations which we wish to address in our next attempt.