Benchmark
#########

We provide scripts for evaluating and training models on task datasets. The following benchmark results are included for reference.
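
The retrieval tables on this page report R1, R5 and R10. These are presumably the usual Recall@K scores: the percentage of queries for which at least one correct match appears among the top K ranked candidates. The sketch below illustrates that definition under this assumption. The function name ``recall_at_k`` and the toy similarity scores are hypothetical and are not taken from the LAVIS evaluation code.

```python
from typing import Dict, Sequence, Set


def recall_at_k(
    scores: Sequence[Sequence[float]],
    positives: Sequence[Set[int]],
    ks: Sequence[int] = (1, 5, 10),
) -> Dict[int, float]:
    """Percentage of queries with at least one correct candidate in the top k.

    scores[q][c] is the similarity between query q and candidate c.
    positives[q] holds the indices of the correct candidates for query q.
    """
    hits = {k: 0 for k in ks}
    for row, correct in zip(scores, positives):
        # Candidate indices sorted from most to least similar.
        ranking = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        # 0-based rank of the best-placed correct candidate.
        best_rank = min(ranking.index(c) for c in correct)
        for k in ks:
            if best_rank < k:
                hits[k] += 1
    return {k: 100.0 * hits[k] / len(scores) for k in ks}


# Toy example: 3 text queries scored against 4 images (hypothetical numbers).
toy_scores = [
    [0.9, 0.1, 0.3, 0.2],  # correct image 0 is ranked first
    [0.2, 0.4, 0.8, 0.1],  # correct image 1 is ranked second
    [0.5, 0.6, 0.7, 0.1],  # correct image 3 is ranked last
]
toy_positives = [{0}, {1}, {3}]
result = recall_at_k(toy_scores, toy_positives, ks=(1, 2, 4))
print(result)
```

In the tables, TR and IR rows would correspond to running such a computation with text queries over image candidates and with image queries over text candidates, respectively, if the scores follow this convention.
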

ALBEF
*******

.. list-table::
   :widths: 30 80 20

   * - **Pretraining**
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/pretrain.sh>`__
   * -
     - Visual Genome (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_vg.py>`__)
     -
   * -
     - SBU (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_sbu.py>`__)
     -
   * -
     - CC3M (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/DownloadConceptualCaptions/download_data_cc3m.py>`__)
     -
   * -
     - CC12M (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/DownloadConceptualCaptions/download_data_cc12m.py>`__)
     -

.. list-table::
   :widths: 30 40 20 20 20 30 30
   :header-rows: 1

   * - **Tasks**
     - **Retrieval**
     - **R1**
     - **R5**
     - **R10**
     - **Training**
     - **Evaluation**
   * - TR
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 77.6
     - 94.1
     - 97.2
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_coco_retrieval_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_coco_retrieval.sh>`__
   * - IR
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 61.0
     - 84.5
     - 90.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_coco_retrieval_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_coco_retrieval.sh>`__
   * - TR
     - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
     - 77.6
     - 94.1
     - 97.2
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_flickr30k_retrieval_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_flickr30k_retrieval.sh>`__
   * - IR
     - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
     - 61.0
     - 84.5
     - 90.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_flickr30k_retrieval_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_flickr30k_retrieval.sh>`__

.. list-table::
   :widths: 20 20 20 20 20
   :header-rows: 1

   * - **VQA**
     - **test-dev**
     - **test-std/test**
     - **Training**
     - **Evaluation**
   * - VQAv2 (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 76.35
     - 76.54
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_vqa_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/test_albef_vqa.sh>`__
   * - OKVQA (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - NA
     - 54.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_okvqa_albef.sh>`__
     - NA
   * - AOKVQA (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 54.5
     - NA
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_aokvqa_albef.sh>`__
     - NA

.. list-table::
   :widths: 20 20 20 20 20
   :header-rows: 1

   * - **Multimodal Classification**
     - **val**
     - **test**
     - **Training**
     - **Evaluation**
   * - SNLI-VE (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 80.60
     - 81.04
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_ve_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_albef_ve.sh>`__
   * - NLVR2 (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 82.47
     - 82.91
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_nlvr_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_albef_nlvr.sh>`__

BLIP
*******

.. list-table::
   :widths: 30 80 20

   * - **Pretraining (14M)**
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/pretrain.sh>`__
   * -
     - Visual Genome (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_vg.py>`__)
     -
   * -
     - SBU (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_sbu.py>`__)
     -
   * -
     - CC3M (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/DownloadConceptualCaptions/download_data_cc3m.py>`__)
     -
   * -
     - CC12M (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/DownloadConceptualCaptions/download_data_cc12m.py>`__)
     -

.. list-table::
   :widths: 30 40 20 20 20 30 30
   :header-rows: 1

   * - **Tasks**
     - **Retrieval**
     - **R1**
     - **R5**
     - **R10**
     - **Training**
     - **Evaluation**
   * - TR
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 82.0
     - 95.8
     - 98.1
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_retrieval_coco.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_ret_coco.sh>`__
   * - IR
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 64.5
     - 86.0
     - 91.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_retrieval_coco.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_ret_coco.sh>`__
   * - TR
     - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
     - 96.9
     - 99.9
     - 100.0
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_retrieval_flickr.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_ret_flickr.sh>`__
   * - IR
     - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
     - 87.5
     - 97.6
     - 98.9
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_retrieval_flickr.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_ret_flickr.sh>`__

.. list-table::
   :widths: 20 20 20 20 20
   :header-rows: 1

   * - **VQA**
     - **test-dev**
     - **test-std/test**
     - **Training**
     - **Evaluation**
   * - VQAv2 (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 78.23
     - 78.29
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_vqa_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/test_albef_vqa.sh>`__
   * - OKVQA (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - NA
     - 55.4
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_okvqa.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_okvqa.sh>`__
   * - AOKVQA (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 56.2
     - 50.1
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_aokvqa.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_aokvqa.sh>`__

.. list-table::
   :widths: 20 20 20 20 20 20
   :header-rows: 1

   * - **Image Captioning**
     - **BLEU@4**
     - **CIDEr**
     - **SPICE**
     - **Training**
     - **Evaluation**
   * - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 39.9
     - 133.5
     - 23.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_caption_coco.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_coco_cap.sh>`__
   * - NoCaps (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_nocaps.py>`__)
     - 31.9
     - 109.1
     - 14.7
     - NA
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_nocaps.sh>`__

.. list-table::
   :widths: 20 20 20 20 20
   :header-rows: 1

   * - **Multimodal Classification**
     - **val**
     - **test**
     - **Training**
     - **Evaluation**
   * - NLVR2 (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 82.48
     - 83.25
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_nlvr.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_nlvr.sh>`__

CLIP
*******

.. list-table::
   :widths: 30 40 20 20 20 30
   :header-rows: 1

   * - **Tasks**
     - **Retrieval (Zero-shot)**
     - **R1**
     - **R5**
     - **R10**
     - **Evaluation**
   * - TR
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 57.2
     - 80.5
     - 87.8
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/clip/eval/eval_clip_ret_coco.sh>`__
   * - IR
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 36.5
     - 60.8
     - 71.0
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/clip/eval/eval_clip_ret_coco.sh>`__
   * - TR
     - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
     - 86.5
     - 98.0
     - 99.1
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/clip/eval/eval_clip_ret_flickr.sh>`__
   * - IR
     - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
     - 67.0
     - 88.9
     - 93.3
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/clip/eval/eval_clip_ret_flickr.sh>`__

.. list-table::
   :widths: 20 20 20
   :header-rows: 1

   * - **Multimodal Classification**
     - **val**
     - **Evaluation**
   * - ImageNet
     - 76.5
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/clip/eval/eval_clip_zs_imnet.sh>`__

ALPRO
*******

.. list-table::
   :widths: 30 40 20 20 20 20 30
   :header-rows: 1

   * - **Tasks**
     - **Retrieval**
     - **R1**
     - **R5**
     - **R10**
     - **Training**
     - **Evaluation**
   * - TR
     - MSRVTT (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_msrvtt.py>`__)
     - 33.2
     - 60.5
     - 71.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_msrvtt_ret.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_msrvtt_ret.sh>`__
   * - VR
     - MSRVTT (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_msrvtt.py>`__)
     - 33.8
     - 61.4
     - 72.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_msrvtt_ret.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_msrvtt_ret.sh>`__
   * - TR
     - DiDeMo (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_didemo.py>`__)
     - 38.8
     - 66.4
     - 76.8
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_didemo_ret.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_didemo_ret.sh>`__
   * - VR
     - DiDeMo (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_didemo.py>`__)
     - 36.6
     - 67.5
     - 77.9
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_didemo_ret.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_didemo_ret.sh>`__

.. list-table::
   :widths: 20 20 20 20
   :header-rows: 1

   * - **Video QA**
     - **test**
     - **Training**
     - **Evaluation**
   * - MSRVTT
     - 42.1
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_msrvtt_qa.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_msrvtt_qa.sh>`__
   * - MSVD
     - 46.0
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_msvd_qa.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_msvd_qa.sh>`__