Zero-Shot Referring Expression Comprehension on RefCOCO

Preparing Data

1. Download the images for RefCOCO, RefCOCO+ and RefCOCOg, and put the downloaded dataset (train2014) in eval/rec_zs_test/data/.

2. cd into eval/rec_zs_test, download the preprocessed data files with gsutil cp gs://reclip-sanjays/reclip_data.tar.gz . and then extract them with tar -xvzf reclip_data.tar.gz.
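Once the archive is extracted, it can help to look at one record before running the evaluation. The following is a minimal sketch, assuming only that the .jsonl files hold one JSON object per line. read_jsonl is a hypothetical helper, not part of this repository, and it makes no assumption about the field names inside each record.

```python
import json


def read_jsonl(path, limit=None):
    """Read up to `limit` records from a JSON Lines file (one JSON object per line)."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            records.append(json.loads(line))
            if limit is not None and len(records) >= limit:
                break
    return records
```

For example, read_jsonl("reclip_data/refcoco_val.jsonl", limit=1) run from eval/rec_zs_test should return the first record, whose keys show what each sample contains.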

Preparing Model

3. Download the SAM (ViT-H) and Alpha-CLIP checkpoints and put them in ./eval/rec_zs_test/ckpt, so that the directory looks like this:

├── eval
│   ├── rec_zs_test
│   │   ├── data
│   │       └── train2014
│   │   ├── reclip_data
│   │       └── refcoco_val.jsonl
│   │       └── refcoco_dets_dict.json
│   │           ...
│   │   ├── ckpt
│   │       └── sam_vit_h_4b8939.pth
│   │       └── grit1m
│   │           └── clip_b16_grit+mim_fultune_4xe.pth
│   │           └── clip_l14_grit+mim_fultune_6xe.pth
│   │   ├── methods
│   │   ├── cache
│   │   ├── output
│   │   ├── main.py
│   │   ├── executor.py
│   │   ├── run.sh
│   │   ├── ...
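Before launching the evaluation, the layout above can be checked with a few lines of Python. This is a minimal sketch, not a script shipped with the repository. EXPECTED_PATHS and find_missing are hypothetical names, and the list only covers the entries spelled out in the tree (the elided "..." entries are not checked).

```python
from pathlib import Path

# Paths copied from the directory tree above, relative to the repository root.
EXPECTED_PATHS = [
    "eval/rec_zs_test/data/train2014",
    "eval/rec_zs_test/reclip_data/refcoco_val.jsonl",
    "eval/rec_zs_test/reclip_data/refcoco_dets_dict.json",
    "eval/rec_zs_test/ckpt/sam_vit_h_4b8939.pth",
    "eval/rec_zs_test/ckpt/grit1m/clip_b16_grit+mim_fultune_4xe.pth",
    "eval/rec_zs_test/ckpt/grit1m/clip_l14_grit+mim_fultune_6xe.pth",
]


def find_missing(repo_root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    missing = find_missing(".")
    if missing:
        print("Missing files or directories:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected files are in place.")
```

Run it from the repository root. An empty result means the data and checkpoints listed in the tree are where the commands in the next step expect them.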

4. Run the test script.

cd eval/rec_zs_test
bash run.sh

or

python main.py --input_file reclip_data/refcoco_val.jsonl --image_root ./data/train2014 --method parse --gradcam_alpha 0.5 0.5 --box_representation_method full,blur --box_method_aggregator sum --clip_model ViT-B/16,ViT-L/14 --detector_file reclip_data/refcoco+_dets_dict.json --cache_path ./cache

(We recommend setting cache_path so that the SAM masks for an image are generated once and reused, rather than regenerated every time the image is processed.)
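To illustrate why a cache saves time, here is a minimal sketch of a per-image disk cache. It is not the caching code used by main.py. cached_masks and the compute_masks callback are hypothetical names, and keying the cache on the image file name alone is an assumption made for brevity.

```python
import pickle
from pathlib import Path


def cached_masks(image_path, cache_dir, compute_masks):
    """Load masks for image_path from cache_dir, computing and saving them on a miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / (Path(image_path).stem + ".pkl")
    if cache_file.exists():
        # Cache hit: skip the expensive mask generation entirely.
        with cache_file.open("rb") as f:
            return pickle.load(f)
    masks = compute_masks(image_path)  # the expensive step, e.g. running SAM
    with cache_file.open("wb") as f:
        pickle.dump(masks, f)
    return masks
```

Because many referring expressions point at the same image, the expensive call runs once per image and every later expression for that image reads the stored result.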

For multi-GPU testing, try:

bash run_multi_gpus.sh
python cal_acc.py refcoco_val
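For readers unfamiliar with the metric, the sketch below shows how referring expression comprehension accuracy is commonly computed. It assumes the usual convention that a prediction counts as correct when its IoU with the ground-truth box is at least 0.5; we have not reproduced cal_acc.py here, so its exact logic may differ. box_iou and rec_accuracy are hypothetical helpers, and boxes are assumed to be (x1, y1, x2, y2) corner coordinates, so COCO-style (x, y, w, h) boxes would need converting first.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rec_accuracy(predicted, ground_truth, threshold=0.5):
    """Fraction of predictions whose IoU with the ground-truth box reaches threshold."""
    if not predicted:
        return 0.0
    hits = sum(box_iou(p, g) >= threshold for p, g in zip(predicted, ground_truth))
    return hits / len(predicted)


# Toy example: the first prediction matches exactly, the second misses entirely.
preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
print(rec_accuracy(preds, gts))  # 0.5
```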

Acknowledgement

Our evaluation builds on the wonderful work ReCLIP. We simply replace CLIP with Alpha-CLIP and skip the image-cropping operation.

Experiment results

Method            RefCOCO                 RefCOCO+                RefCOCOg
                  Val    TestA  TestB     Val    TestA  TestB     Val    Test
CPT [67]          32.2   36.1   30.3      31.9   35.2   28.8      36.7   36.5
ReCLIP [54]       45.8   46.1   47.1      47.9   50.1   45.1      59.3   59.0
Red Circle [52]   49.8   58.6   39.9      55.3   63.9   45.4      59.4   58.9
Alpha-CLIP        55.7   61.1   50.3      55.6   62.7   46.4      61.2   62.0