microsoft's Collections

GIT

GIT (Generative Image-to-text Transformer) is a vision-language model for tasks such as image and video captioning and visual question answering.
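As a rough illustration of the captioning use case, the sketch below loads a GIT checkpoint through the Hugging Face `transformers` library and generates a caption for one image. The checkpoint name `microsoft/git-base-coco`, the helper name `caption_image`, and the `max_length` value are assumptions chosen for this example and are not stated on this page; it also assumes `transformers`, `torch`, and `Pillow` are installed.

```python
import sys

# Assumed checkpoint for illustration; other GIT checkpoints may fit better.
DEFAULT_CHECKPOINT = "microsoft/git-base-coco"


def caption_image(image, checkpoint=DEFAULT_CHECKPOINT, max_length=50):
    """Return a generated caption for a PIL image (hypothetical helper)."""
    # Imported lazily so the module can be loaded without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)

    # Convert the image into the pixel tensor the model expects.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # Generate caption token ids conditioned on the image.
    generated_ids = model.generate(pixel_values=pixel_values, max_length=max_length)

    # Decode token ids back into text, dropping special tokens.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]


if __name__ == "__main__" and len(sys.argv) > 1:
    from PIL import Image

    # Usage: python caption.py path/to/image.jpg
    print(caption_image(Image.open(sys.argv[1]).convert("RGB")))
```

The first call downloads the model weights, so it can take a while. For question answering or video input, the inputs are prepared differently, and a checkpoint fine-tuned for that task would be the more likely choice.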