## Future scope of work

We hope to improve this project in the future by using:

- Better translation options: Translation has a significant impact on how the end model performs. Better translators (e.g., the Google Translate API) and language-specific seq2seq translation models can generate better data, for both high-resource and low-resource languages.
- More training time: We found that training the image-captioning model for even a single epoch takes a lot of compute time, and replicating this setup multiplies the training time manyfold for the same number of samples.
- Accessibility: Make the model deployable on hand-held devices to make it more accessible. Currently, our model is too large to fit on mobile/edge devices, so many people will not be able to access it. However, our final goal is to ensure everyone can access it without any computation barriers. We learned that JAX has an experimental converter, `jax2tf`, for converting JAX functions to TensorFlow, so we hope to support TFLite for our model in the future as well (see the conversion sketch after this list).
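To make that last point concrete, here is a minimal sketch of the `jax2tf` → TFLite path. The model function `apply_fn`, the dummy `params`, and the 224×224 RGB input shape are hypothetical placeholders for illustration, not our actual captioning model; `jax2tf.convert` and the TFLite converter calls are real APIs, and jax2tf-generated graphs often need TF select ops enabled, as shown.

```python
# A minimal sketch of converting a JAX model to TFLite via jax2tf.
# `apply_fn` and `params` are hypothetical stand-ins for a real model;
# the input shape is an assumed 224x224 RGB image batch of size 1.
import jax.numpy as jnp
import tensorflow as tf
from jax.experimental import jax2tf

def apply_fn(params, images):
    # Placeholder forward pass; a real model would run its layers here.
    return jnp.dot(images.reshape(images.shape[0], -1), params)

params = jnp.zeros((224 * 224 * 3, 10))  # dummy weights for the sketch

# Convert the JAX function to a TF function, closing over the params.
tf_fn = tf.function(
    jax2tf.convert(lambda images: apply_fn(params, images)),
    autograph=False,
    input_signature=[tf.TensorSpec([1, 224, 224, 3], tf.float32)],
)

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [tf_fn.get_concrete_function()]
)
# jax2tf-generated graphs often use ops outside the TFLite builtin set,
# so allow fallback to TF select ops where needed.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting `model.tflite` file could then be bundled into a mobile app and run with the TFLite interpreter, though a model our size would likely also need quantization or distillation before it fits comfortably on edge devices.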