WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning • Paper • 2103.01913 • Published Mar 2, 2021