
Future scope of work

We hope to improve this project in the future in the following ways:

Better translation options: Translation quality has a large impact on how the final model performs. Better translators (e.g. the Google Translate API) and language-specific seq2seq translation models can generate better data for both high-resource and low-resource languages.
More training time: We found that training an image captioning model for even one epoch takes a lot of compute time, and if we want to replicate the same, the training time goes up manyfold for the same number of samples.
Accessibility: Make the model deployable on hand-held devices. Currently, our model is too large to fit on mobile/edge devices, so not many people will be able to access it. Our final goal, however, is to ensure that everyone can access it without any computational barriers. We learned that JAX has an experimental converter, `jax2tf`, which converts JAX functions to TensorFlow. Hopefully we will be able to support TFLite for our model in the future.