
Samples from the COCO Caption dataset. (Image credit: https://arxiv.org/pdf/1504.00325.pdf)

Microsoft COCO Dataset (Captioning)

Description

The Microsoft COCO Captions dataset contains over one and a half million captions describing over 330,000 images. For the training and validation images, five independent human-generated captions are provided for each image.
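The COCO Captions annotations are distributed as JSON files whose top-level `images` and `annotations` arrays link the per-image captions by `image_id`. A minimal sketch of reading that format with a tiny inline sample (the file name and field values below are illustrative; real annotation files such as `captions_train2017.json` are much larger):

```python
import json
from collections import defaultdict

# Tiny inline sample in the COCO Captions annotation layout.
sample = json.loads("""
{
  "images": [{"id": 9, "file_name": "000000000009.jpg"}],
  "annotations": [
    {"id": 1, "image_id": 9, "caption": "A plate of food on a table."},
    {"id": 2, "image_id": 9, "caption": "Closeup of bins of food."}
  ]
}
""")

# Group the (roughly five per image) captions by image id.
captions_by_image = defaultdict(list)
for ann in sample["annotations"]:
    captions_by_image[ann["image_id"]].append(ann["caption"])

print(captions_by_image[9])
```

The same grouping works unchanged on the full downloaded annotation files, where each image id typically maps to five captions.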

Task

(from https://paperswithcode.com/task/image-captioning)

Image captioning is the task of describing the content of an image in words. This task lies at the intersection of computer vision and natural language processing. Most image captioning systems use an encoder-decoder framework, where an input image is encoded into an intermediate representation of the information in the image, and then decoded into a descriptive text sequence.

Metrics

Models are typically evaluated according to the BLEU or CIDEr metric.
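To make the BLEU numbers in the leaderboard below concrete, here is a minimal pure-Python sketch of sentence-level BLEU (clipped n-gram precisions up to 4-grams, combined by a geometric mean with a brevity penalty). It is a simplified illustration, not the official COCO evaluation server code, which also applies its own tokenization and corpus-level aggregation:

```python
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """BLEU for one tokenized candidate against multiple references:
    geometric mean of clipped n-gram precisions times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        if not cand_counts:
            return 0.0
        # Clip each n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        precisions.append(clipped / sum(cand_counts.values()))
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty against the closest reference length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

cand = "a man riding a horse on the beach".split()
refs = ["a man riding a horse on the beach".split(),
        "a person rides a horse along the shore".split()]
print(round(bleu(cand, refs), 3))  # → 1.0 (candidate matches a reference exactly)
```

CIDEr differs by weighting n-grams with TF-IDF computed over the whole reference corpus, which rewards captions that use content words humans actually use for that image.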

Leaderboard

(Ranked by CIDEr)

| Rank | Model   | BLEU-4 | CIDEr | METEOR | SPICE | Resources         |
|------|---------|--------|-------|--------|-------|-------------------|
| 1    | OFA     | 44.9   | 154.9 | 32.5   | 26.6  | paper, code       |
| 2    | LEMON   | 42.6   | 145.5 | 31.4   | 25.5  | paper             |
| 3    | CoCa    | 40.9   | 143.6 | 33.9   | 24.7  | paper             |
| 4    | SimVLM  | 40.6   | 143.3 | 33.7   | 25.4  | paper             |
| 5    | VinVL   | 41.0   | 140.9 | 31.1   | 25.2  | paper, code       |
| 6    | OSCAR   | 40.7   | 140.0 | 30.6   | 24.5  | paper, code       |
| 7    | BLIP    | 40.4   | 136.7 | 31.4   | 24.3  | paper, code, demo |
| 8    | M^2     | 39.1   | 131.2 | 29.2   | 22.6  | paper, code       |
| 9    | BUTD    | 36.5   | 113.5 | 27.0   | 20.3  | paper, code       |
| 10   | ClipCap | 32.2   | 108.4 | 27.1   | 20.1  | paper, code       |

Auto-Downloading

```shell
cd lavis/datasets/download_scripts && python download_coco.py
```

References

Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollar, C. Lawrence Zitnick. "Microsoft COCO Captions: Data Collection and Evaluation Server". https://arxiv.org/abs/1504.00325