How to evaluate on downstream tasks?
====================================
In our paper, we evaluate our pretrained VirTex models on seven different
downstream tasks. Our codebase supports all of these evaluations. Throughout
this documentation, we use a single pretrained VirTex model in the example
command snippets so that file paths stay consistent. The paths can be trivially
adjusted for any other VirTex model; evaluating the baselines (MoCo,
ImageNet-supervised, Random Init) requires additional changes to the commands,
explained in the last sub-section.
As an example, consider a pretraining job for our best performing VirTex model
(``width_ablations/bicaptioning_R_50_L1_H2048.yaml``). The serialization
directory might look something like this:
.. code-block:: text

    /tmp/bicaptioning_R_50_L1_H2048
        pretrain_config.yaml
        log-rank0.txt            # stdout/stderr per GPU process
        log-rank1.txt
        ...
        log-rank7.txt
        checkpoint_2000.pth
        checkpoint_4000.pth
        ...
        checkpoint_498000.pth
        checkpoint_500000.pth    # serialized checkpoints
        train_captioning_forward/
            events.out.* ...     # tensorboard logs
        ...
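A quick way to see which checkpoints are available (for example, to pick the
most recent ones for evaluation) is a plain listing; the path below is the
example serialization directory from above:

.. code-block:: shell

    ls -v /tmp/bicaptioning_R_50_L1_H2048/checkpoint_*.pth | tail -n 5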
We evaluate all checkpoints on **PASCAL VOC 2007 Linear Classification**, and
then evaluate the best checkpoint (here, it was iteration 500000) on all other
downstream tasks.
PASCAL VOC 2007 Linear Classification
-------------------------------------
Evaluate a single VirTex pretrained checkpoint on the VOC 2007 ``trainval`` split:
.. code-block:: shell
python scripts/clf_voc07.py \
--config /tmp/bicaptioning_R_50_L1_H2048/pretrain_config.yaml \
--down-config configs/downstream/voc07_clf.yaml \
--checkpoint-path /tmp/bicaptioning_R_50_L1_H2048/checkpoint_500000.pth \
--weight-init virtex \
--num-gpus-per-machine 1 \
--cpu-workers 4 \
--serialization-dir /tmp/bicaptioning_R_50_L1_H2048
To evaluate the most recent 100 checkpoints in the serialization directory, this
command can be looped over as follows:
.. code-block:: shell
for ((iter = 300000; iter <= 500000; iter+=2000)); do
# add command with `checkpoint_$iter.pth`
done
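For concreteness, here is one way the loop body might look, reusing the flags
from the single-checkpoint command above (a sketch; adjust the iteration range
and paths to your run):

.. code-block:: shell

    for ((iter = 300000; iter <= 500000; iter += 2000)); do
        python scripts/clf_voc07.py \
            --config /tmp/bicaptioning_R_50_L1_H2048/pretrain_config.yaml \
            --down-config configs/downstream/voc07_clf.yaml \
            --checkpoint-path /tmp/bicaptioning_R_50_L1_H2048/checkpoint_$iter.pth \
            --weight-init virtex \
            --num-gpus-per-machine 1 \
            --cpu-workers 4 \
            --serialization-dir /tmp/bicaptioning_R_50_L1_H2048
    done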
This script writes metrics to tensorboard logs in the same pretraining directory,
so all VOC07 mAP curves appear together with the pretraining loss curves.
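To view these curves, point tensorboard at the pretraining directory (assuming
tensorboard is installed in your environment):

.. code-block:: shell

    tensorboard --logdir /tmp/bicaptioning_R_50_L1_H2048 --port 6006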
-------------------------------------------------------------------------------
ImageNet Linear Classification
------------------------------
We train a linear classifier on 2048-dimensional global average pooled features
extracted from a frozen visual backbone. Evaluate a checkpoint (for example,
iteration 500000) on this task as:
.. code-block:: shell
python scripts/clf_linear.py \
--config /tmp/bicaptioning_R_50_L1_H2048/pretrain_config.yaml \
--down-config configs/downstream/imagenet_clf.yaml \
--checkpoint-path /tmp/bicaptioning_R_50_L1_H2048/checkpoint_500000.pth \
--weight-init virtex \
--num-gpus-per-machine 8 \
--cpu-workers 4 \
--serialization-dir /tmp/bicaptioning_R_50_L1_H2048/imagenet_500000 \
--checkpoint-every 5005 # 1 epoch of ImageNet
-------------------------------------------------------------------------------
Instance Segmentation (and Object Detection) on COCO
----------------------------------------------------
Train a Mask R-CNN with FPN backbone for COCO Instance Segmentation (and Object
Detection, because it also has a box head) by initializing the backbone from
VirTex pretrained weights:
.. code-block:: shell
python scripts/eval_detectron2.py \
--config /tmp/bicaptioning_R_50_L1_H2048/pretrain_config.yaml \
--d2-config configs/detectron2/coco_segm_default_init_2x.yaml \
--checkpoint-path /tmp/bicaptioning_R_50_L1_H2048/checkpoint_500000.pth \
--weight-init virtex \
--num-gpus-per-machine 8 \
--cpu-workers 2 \
--serialization-dir /tmp/bicaptioning_R_50_L1_H2048/coco_segm_500000 \
--checkpoint-every 5000
.. note::

    1. This script periodically serializes checkpoints, but skips the validation
       step during training to save time. To evaluate a serialized checkpoint
       and write results to tensorboard, provide it as ``--checkpoint-path``
       along with the additional flags ``--resume --eval-only`` (see the
       example after this note).
    2. Note that ``--d2-config`` here is in Detectron2 format, and not our
       package's :class:`~virtex.config.Config`.

    These points apply to all tasks described below.
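For example, to evaluate a checkpoint serialized by the fine-tuning run above,
the command might look like the following. This is a sketch that reuses the
flags from the training command; the checkpoint filename is a placeholder, so
substitute one actually written to ``coco_segm_500000`` by your run:

.. code-block:: shell

    # NOTE: the checkpoint filename below is a placeholder; use one that your
    # fine-tuning run actually serialized inside coco_segm_500000/.
    python scripts/eval_detectron2.py \
        --config /tmp/bicaptioning_R_50_L1_H2048/pretrain_config.yaml \
        --d2-config configs/detectron2/coco_segm_default_init_2x.yaml \
        --checkpoint-path /tmp/bicaptioning_R_50_L1_H2048/coco_segm_500000/checkpoint_<iteration>.pth \
        --weight-init virtex \
        --num-gpus-per-machine 8 \
        --cpu-workers 2 \
        --serialization-dir /tmp/bicaptioning_R_50_L1_H2048/coco_segm_500000 \
        --resume --eval-only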
-------------------------------------------------------------------------------
Instance Segmentation on LVIS
-----------------------------
Train a Mask R-CNN with FPN backbone for LVIS Instance Segmentation by
initializing the backbone from VirTex pretrained weights:
.. code-block:: shell
python scripts/eval_detectron2.py \
--config /tmp/bicaptioning_R_50_L1_H2048/pretrain_config.yaml \
--d2-config configs/detectron2/lvis_segm_default_init_2x.yaml \
--checkpoint-path /tmp/bicaptioning_R_50_L1_H2048/checkpoint_500000.pth \
--weight-init virtex \
--num-gpus-per-machine 8 \
--cpu-workers 2 \
--serialization-dir /tmp/bicaptioning_R_50_L1_H2048/lvis_segm_500000 \
--checkpoint-every 5000
-------------------------------------------------------------------------------
Object Detection on PASCAL VOC 2007+12
--------------------------------------
Train a Faster R-CNN with C4 backbone for PASCAL VOC 2007+12 Object Detection
by initializing the backbone from VirTex pretrained weights:
.. code-block:: shell
python scripts/eval_detectron2.py \
--config /tmp/bicaptioning_R_50_L1_H2048/pretrain_config.yaml \
--d2-config configs/detectron2/voc_det_default_init_24k.yaml \
--checkpoint-path /tmp/bicaptioning_R_50_L1_H2048/checkpoint_500000.pth \
--weight-init virtex \
--num-gpus-per-machine 8 \
--cpu-workers 2 \
--serialization-dir /tmp/bicaptioning_R_50_L1_H2048/voc_det_500000 \
--checkpoint-every 2500
-------------------------------------------------------------------------------
iNaturalist 2018 Fine-Grained Classification
--------------------------------------------
Fine-tune the VirTex pretrained visual backbone end-to-end on the iNaturalist 2018
dataset:
.. code-block:: shell
python scripts/clf_linear.py \
--config /tmp/bicaptioning_R_50_L1_H2048/pretrain_config.yaml \
--down-config configs/downstream/inaturalist_clf.yaml \
--checkpoint-path /tmp/bicaptioning_R_50_L1_H2048/checkpoint_500000.pth \
--weight-init virtex \
--num-gpus-per-machine 8 \
--cpu-workers 4 \
--serialization-dir /tmp/bicaptioning_R_50_L1_H2048/inaturalist_500000 \
--checkpoint-every 1710 # 1 epoch of iNaturalist
-------------------------------------------------------------------------------
Image Captioning on COCO Captions val2017
-----------------------------------------
Evaluate a pretrained VirTex model on image captioning for the COCO Captions
val2017 split (reporting CIDEr and SPICE metrics):
.. code-block:: shell
python scripts/eval_captioning.py \
--config /tmp/bicaptioning_R_50_L1_H2048/pretrain_config.yaml \
--checkpoint-path /tmp/bicaptioning_R_50_L1_H2048/checkpoint_500000.pth \
--calc-metrics \
--num-gpus-per-machine 1 \
--cpu-workers 4
-------------------------------------------------------------------------------
Running Image Captioning Inference on Arbitrary Images
------------------------------------------------------
The above script can also be used to generate captions for arbitrary images in a
directory. Adjust the arguments as follows:
.. code-block:: shell
python scripts/eval_captioning.py \
--config /tmp/bicaptioning_R_50_L1_H2048/pretrain_config.yaml \
--checkpoint-path /tmp/bicaptioning_R_50_L1_H2048/checkpoint_500000.pth \
--data-root /path/to/images_dir \
--output /path/to/save/predictions.json \
--num-gpus-per-machine 1 \
--cpu-workers 4
This script will save predictions in JSON format. Since improving image
captioning is not our goal, these models may not generate the best captions.
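To take a quick look at the generated captions, the saved file can be
pretty-printed (using the output path from the command above):

.. code-block:: shell

    python -m json.tool /path/to/save/predictions.json | head -n 20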