GIT Collection GIT (Generative Image-to-text Transformer) is a model useful for vision-language tasks such as image/video captioning and question answering. • 18 items • Updated Jul 11 • 10