Zero-Shot Referring Expression Comprehension on RefCOCO
Preparing Data
1. Download the images for RefCOCO/g/+. Put the downloaded dataset (train2014) in `eval/rec_zs_test/data/`.
2. Download the preprocessed data files via `gsutil cp gs://reclip-sanjays/reclip_data.tar.gz`, `cd rec_zs_test`, and then extract the data using `tar -xvzf reclip_data.tar.gz`.
Preparing model
3. Download the SAM (ViT-H) and Alpha-CLIP models, and put them in `./eval/rec_zs_test/ckpt`.
├── eval
│   ├── rec_zs_test
│   │   ├── data
│   │   │   ├── train2014
│   │   ├── reclip_data
│   │   │   ├── refcoco_val.jsonl
│   │   │   ├── refcoco_dets_dict.json
│   │   │   ...
│   │   ├── ckpt
│   │   │   ├── sam_vit_h_4b8939.pth
│   │   │   ├── grit1m
│   │   │   │   ├── clip_b16_grit+mim_fultune_4xe.pth
│   │   │   │   ├── clip_l14_grit+mim_fultune_6xe.pth
│   │   ├── methods
│   │   ├── cache
│   │   ├── output
│   │   ├── main.py
│   │   ├── executor.py
│   │   ├── run.sh
│   │   ├── ...
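Before launching the evaluation, it can help to confirm that the files from the listing above are in place. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` and the `EXPECTED` list are hypothetical, and the paths are assumed to be relative to `eval/rec_zs_test`.

```python
from pathlib import Path

# Hypothetical checklist, taken from the steps and listing above.
# All paths are assumed to be relative to eval/rec_zs_test.
EXPECTED = [
    "data/train2014",
    "reclip_data/refcoco_val.jsonl",
    "reclip_data/refcoco_dets_dict.json",
    "ckpt/sam_vit_h_4b8939.pth",
    "main.py",
    "run.sh",
]


def missing_paths(root, expected=EXPECTED):
    """Return the entries of `expected` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Calling `missing_paths("eval/rec_zs_test")` from the repository root returns an empty list when everything in the checklist is present.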
4. Run the test script.
cd eval/rec_zs_test
bash run.sh
or
python main.py --input_file reclip_data/refcoco_val.jsonl --image_root ./data/train2014 --method parse --gradcam_alpha 0.5 0.5 --box_representation_method full,blur --box_method_aggregator sum --clip_model ViT-B/16,ViT-L/14 --detector_file reclip_data/refcoco+_dets_dict.json --cache_path ./cache
(We recommend using `cache_path` so that SAM does not have to regenerate the masks for the same image repeatedly.)
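To illustrate why a cache directory saves time, here is a minimal sketch of an on-disk cache keyed by image name. It is not the repository's implementation: the function `cached`, the pickle file layout, and the `compute` callback are all assumptions made for illustration.

```python
import hashlib
import pickle
from pathlib import Path


def cached(cache_dir, key, compute):
    """Return the value stored for `key`; on a miss, call `compute()` and store the result."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the key so that any string (e.g. an image file name) is a safe file name.
    path = cache_dir / (hashlib.sha1(key.encode("utf-8")).hexdigest() + ".pkl")
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    value = compute()  # the expensive step, e.g. running a mask generator
    with path.open("wb") as f:
        pickle.dump(value, f)
    return value
```

With a helper like this, the expensive mask generation runs once per image; later expressions that refer to the same image load the stored result instead.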
For multi-GPU testing, try:
bash run_multi_gpus.sh
python cal_acc.py refcoco_val
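This README does not describe how `cal_acc.py` scores predictions. On the RefCOCO benchmarks, the usual convention is to count a prediction as correct when its box overlaps the ground-truth box with an IoU of at least 0.5. The sketch below shows that convention only; the function names, the `(x1, y1, x2, y2)` box format, and the threshold default are assumptions, not the script's actual code.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy(preds, gts, threshold=0.5):
    """Percentage of predicted boxes whose IoU with the ground truth reaches `threshold`."""
    if not preds:
        return 0.0
    hits = sum(box_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return 100.0 * hits / len(preds)
```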
Acknowledgement
We test our model based on the wonderful work ReCLIP. We simply replace CLIP with Alpha-CLIP and skip the image-cropping operation.
Experiment results
Method | RefCOCO Val | RefCOCO TestA | RefCOCO TestB | RefCOCO+ Val | RefCOCO+ TestA | RefCOCO+ TestB | RefCOCOg Val | RefCOCOg Test |
---|---|---|---|---|---|---|---|---|
CPT [67] | 32.2 | 36.1 | 30.3 | 31.9 | 35.2 | 28.8 | 36.7 | 36.5 |
ReCLIP [54] | 45.8 | 46.1 | 47.1 | 47.9 | 50.1 | 45.1 | 59.3 | 59.0 |
Red Circle [52] | 49.8 | 58.6 | 39.9 | 55.3 | 63.9 | 45.4 | 59.4 | 58.9 |
Alpha-CLIP | 55.7 | 61.1 | 50.3 | 55.6 | 62.7 | 46.4 | 61.2 | 62.0 |