
# Unified Contrastive Learning in Image-Text-Label Space

"Unifiled Contrastive Learning in Image-Text-Label Space. CVPR 2022" by Jianwei Yang*, Chunyuan Li*, Pengchuan Zhang*, Bin Xiao*, Ce Liu, Lu Yuan and Jianfeng Gao.

## Motivation

In this paper, we introduce a new perspective on the commonly used image-label and image-text data by placing them in a shared image-text-label space. In this space, we propose a new learning paradigm, Unified Contrastive Learning (UniCL), with a single learning objective that seamlessly exploits the synergy between the two data types. We demonstrate that UniCL is an effective way of learning semantically rich yet discriminative representations, universally for image recognition in zero-shot, linear-probe, full-finetuning and transfer-learning scenarios. When scaled up to billion-scale data, UniCL alone can learn a powerful visual-semantic representation supporting dozens of downstream tasks, as shown in Florence.
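Concretely, the single objective is a bidirectional contrastive loss over the image-text similarity matrix, in which every (image, text) entry sharing a label counts as a positive; image-text pairs without a class annotation each receive a unique label, so on that data the loss reduces to the standard CLIP-style objective. The following is a minimal PyTorch sketch of such a label-aware contrastive loss, assuming pre-computed L2-normalized features; the function name and signature are illustrative, not the repository's API:

```python
import torch
import torch.nn.functional as F

def unicl_loss(image_feats, text_feats, labels, temperature=0.07):
    """Bidirectional label-aware contrastive loss (illustrative sketch).

    image_feats: (B, D) L2-normalized image embeddings.
    text_feats:  (B, D) L2-normalized text embeddings.
    labels:      (B,) integer class ids; web image-text pairs get unique
                 ids so that their only positive is their own caption.
    """
    logits = image_feats @ text_feats.t() / temperature              # (B, B)
    # Entry (i, j) is a positive when image i and text j share a label.
    positives = (labels.unsqueeze(1) == labels.unsqueeze(0)).float()

    # Image-to-text: softmax over texts (rows), log-prob averaged over positives.
    log_p_i2t = F.log_softmax(logits, dim=1)
    loss_i2t = -(positives * log_p_i2t).sum(1) / positives.sum(1)

    # Text-to-image: softmax over images (columns).
    log_p_t2i = F.log_softmax(logits, dim=0)
    loss_t2i = -(positives * log_p_t2i).sum(0) / positives.sum(0)

    return loss_i2t.mean() + loss_t2i.mean()
```

With all labels distinct this is the usual two-way InfoNCE loss; with repeated labels it behaves like supervised contrastive learning across the two modalities, which is what lets image-label and image-text data share one objective.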

## Benchmarking

### Image-label training augmented by image-text pairs

| Model | Training Set | Top-1 on IN-1K | Zero-shot on 14 datasets | Download |
|--------|--------------|----------------|--------------------------|----------|
| Swin-T | IN-1K | 79.9 | 30.2 | ckpt/config |
| Swin-T | IN-1K + GCC-3M | 80.2 | 39.0 | ckpt/config |
| Swin-T | IN-1K + GYFCC-14M | 81.1 | 40.0 | ckpt/config |
| Swin-T | IN-1K + GCC-15M | 81.8 | 45.1 | ckpt/config |

Note that all the above models are trained without strong data augmentations like mixup and cutmix.

### Image-text learning augmented by image-label data

| Model | Training Set | Zero-shot on IN-1K | Zero-shot on 14 datasets | Download |
|--------|--------------|--------------------|--------------------------|----------|
| Swin-T | YFCC-14M | 30.1 | 36.3 | ckpt/config |
| Swin-T | IN-21K | 28.5 | 37.8 | ckpt/config |
| Swin-T | IN-21K (half) + YFCC-14M (half) | 36.4 | 45.5 | ckpt/config |
| Swin-T | IN-21K + YFCC-14M | 40.5 | 49.1 | ckpt/config |
| Swin-B | YFCC-14M | 37.8 | - | ckpt/config |
| Swin-B | IN-21K | 29.9 | 42.4 | ckpt/config |
| Swin-B | IN-21K (half) + YFCC-14M (half) | 41.1 | 48.5 | ckpt/config |
| Swin-B | IN-21K + YFCC-14M | 44.3 | 52.2 | ckpt/config |
| Swin-B | IN-21K + YFCC-14M + GCC-15M | 57.9 | - | ckpt/config |
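For reference, the zero-shot numbers above follow the standard protocol: class names are wrapped in text prompts, embedded with the text encoder, and each image is assigned to the class with the most similar embedding. A minimal sketch, again assuming L2-normalized features and illustrative names:

```python
import torch

@torch.no_grad()
def zero_shot_predict(image_feats, class_text_feats):
    """Assign each image to the class whose prompt embedding
    (e.g. for "a photo of a {class}.") is most similar.

    image_feats:      (N, D) L2-normalized image embeddings.
    class_text_feats: (C, D) L2-normalized class-prompt embeddings.
    Returns: (N,) predicted class indices.
    """
    similarity = image_feats @ class_text_feats.t()  # (N, C) cosine similarities
    return similarity.argmax(dim=1)
```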