Referring Image Segmentation
Getting Started
- Install the required packages.
pip install -r requirements.txt
- Prepare RefCOCO datasets following LAVT.
Download COCO 2014 Train Images [83K/13GB] from COCO, and extract train2014.zip to ./refer/data/images/mscoco/images
Follow the instructions in ./refer to download and extract refclef.zip, refcoco.zip, refcoco+.zip, refcocog.zip to ./refer/data
Your dataset directory should be:
refer/
├── data/
│   ├── images/mscoco/images/
│   ├── refclef
│   ├── refcoco
│   ├── refcoco+
│   └── refcocog
├── evaluation/
└── ...
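Before training, it can help to confirm that the layout matches the tree above. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` and the `./refer` root are assumptions, and it checks only that the directories exist, not their contents.

```python
from pathlib import Path

# Sub-directories listed in the tree above, relative to the refer/ root.
EXPECTED_DIRS = [
    "data/images/mscoco/images",
    "data/refclef",
    "data/refcoco",
    "data/refcoco+",
    "data/refcocog",
    "evaluation",
]


def missing_dirs(refer_root):
    """Return the expected sub-directories that do not exist under refer_root."""
    root = Path(refer_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # "./refer" is assumed to be the dataset root used in the steps above.
    missing = missing_dirs("./refer")
    if missing:
        print("Missing directories:")
        for d in missing:
            print("  " + d)
    else:
        print("Dataset layout looks complete.")
```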
Results and Fine-tuned Models of EVP
EVP achieves 76.35 overall IoU and 77.61 mean IoU on the validation set of RefCOCO.
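The two metrics differ in how they aggregate over samples. The sketch below follows the convention commonly used for referring image segmentation: overall IoU divides the summed intersection by the summed union over all samples, while mean IoU averages the per-sample IoU. It assumes flat binary masks given as Python lists and treats an empty union as IoU 0; the function names are illustrative and the actual evaluation code in this repository may differ.

```python
def intersection_and_union(pred, target):
    """Count intersecting and union pixels of two flat binary masks (0/1 lists)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter, union


def overall_and_mean_iou(pairs):
    """Return (overall IoU, mean IoU) for an iterable of (pred, target) mask pairs."""
    total_inter = 0
    total_union = 0
    per_sample = []
    for pred, target in pairs:
        inter, union = intersection_and_union(pred, target)
        total_inter += inter
        total_union += union
        # Assumption: a sample with an empty union contributes an IoU of 0.
        per_sample.append(inter / union if union else 0.0)
    overall = total_inter / total_union if total_union else 0.0
    mean = sum(per_sample) / len(per_sample) if per_sample else 0.0
    return overall, mean


if __name__ == "__main__":
    pairs = [
        ([1, 1, 0, 0], [1, 0, 0, 0]),  # intersection 1, union 2
        ([1, 1, 1, 1], [1, 1, 1, 1]),  # intersection 4, union 4
    ]
    overall, mean = overall_and_mean_iou(pairs)
    print(f"overall IoU = {overall:.4f}, mean IoU = {mean:.4f}")
```

Because overall IoU pools pixels, large objects weigh more heavily in it than in mean IoU, which is why the two numbers generally differ.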
Training
We count the maximum length of the referring sentences and set the token length of the language model accordingly. The best-epoch checkpoint is saved under ./checkpoints/.
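As a rough illustration of that counting step, the sketch below finds the longest sentence in a list and adds room for special tokens. It is only a stand-in: whitespace splitting replaces the language model's real tokenizer (which would typically produce more tokens per sentence), and `num_special_tokens=2` is a hypothetical allowance for start and end tokens.

```python
def max_sentence_length(sentences, tokenize=str.split):
    """Return the largest token count over all sentences (0 if there are none)."""
    return max((len(tokenize(s)) for s in sentences), default=0)


def suggest_token_length(sentences, num_special_tokens=2, tokenize=str.split):
    """Longest sentence plus an assumed allowance for special tokens."""
    return max_sentence_length(sentences, tokenize) + num_special_tokens


if __name__ == "__main__":
    # Example sentences are made up for illustration.
    sentences = ["green plant", "the man in the blue shirt on the left"]
    print(suggest_token_length(sentences))
```

To apply this to a real dataset, pass the model's own tokenizer function as `tokenize` so the count reflects the tokens the model actually sees.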
- Train on RefCOCO
bash train.sh refcoco /path/to/logdir <NUM_GPUS> --token_length 40
- Train on RefCOCO+
bash train.sh refcoco+ /path/to/logdir <NUM_GPUS> --token_length 40
- Train on RefCOCOg
bash train.sh refcocog /path/to/logdir <NUM_GPUS> --token_length 77 --splitBy umd
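The three training commands differ only in the dataset name and its flags, so they can be generated from a small table. This is a convenience sketch, not part of the repository: `build_train_command` is a hypothetical helper, and the GPU count of 1 in the demo is a placeholder for <NUM_GPUS>. It uses only the arguments shown in the commands above.

```python
import shlex

# Per-dataset flags copied from the training commands above.
TRAIN_FLAGS = {
    "refcoco": ["--token_length", "40"],
    "refcoco+": ["--token_length", "40"],
    "refcocog": ["--token_length", "77", "--splitBy", "umd"],
}


def build_train_command(dataset, logdir, num_gpus):
    """Return the argument list for `bash train.sh ...` for the given dataset."""
    if dataset not in TRAIN_FLAGS:
        raise ValueError("unknown dataset: " + dataset)
    return ["bash", "train.sh", dataset, logdir, str(num_gpus)] + TRAIN_FLAGS[dataset]


if __name__ == "__main__":
    for name in TRAIN_FLAGS:
        # Print only; pass the list to subprocess.run(...) to launch training.
        print(shlex.join(build_train_command(name, "/path/to/logdir", 1)))
```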
Evaluation
- Evaluate on RefCOCO
bash test.sh refcoco /path/to/evp_ris_refcoco.pth --token_length 40
- Evaluate on RefCOCO+
bash test.sh refcoco+ /path/to/evp_ris_refcoco+.pth --token_length 40
- Evaluate on RefCOCOg
bash test.sh refcocog /path/to/evp_ris_gref.pth --token_length 77 --splitBy umd
Custom inference
PYTHONPATH="../":$PYTHONPATH python inference.py --img_path test_img.jpg --resume refcoco.pth --token_length 40 --prompt 'green plant'