Spaces:
Running
on
A10G
Running
on
A10G
# Referring Image Segmentation | |
## Getting Started | |
1. Install the required packages. | |
``` | |
pip install -r requirements.txt | |
``` | |
2. Prepare RefCOCO datasets following [LAVT](https://github.com/yz93/LAVT-RIS). | |
* Download COCO 2014 Train Images [83K/13GB] from [COCO](https://cocodataset.org/#download), and extract `train2014.zip` to `./refer/data/images/mscoco/images` | |
* Follow the instructions in `./refer` to download and extract `refclef.zip, refcoco.zip, refcoco+.zip, refcocog.zip` to `./refer/data` | |
Your dataset directory should be: | |
``` | |
refer/ | |
βββdata/ | |
β βββ images/mscoco/images/ | |
β βββ refclef | |
β βββ refcoco | |
β βββ refcoco+ | |
β βββ refcocog | |
βββevaluation/ | |
βββ... | |
``` | |
## Results and Fine-tuned Models of EVP | |
EVP achieves 76.35 overall IoU and 77.61 mean IoU on the validation set of RefCOCO. | |
## Training | |
We count the max length of referring sentences and set the token length of lenguage model accrodingly. The checkpoint of the best epoch would be saved at `./checkpoints/`. | |
* Train on RefCOCO | |
``` | |
bash train.sh refcoco /path/to/logdir <NUM_GPUS> --token_length 40 | |
``` | |
* Train on RefCOCO+ | |
``` | |
bash train.sh refcoco+ /path/to/logdir <NUM_GPUS> --token_length 40 | |
``` | |
* Train on RefCOCOg | |
``` | |
bash train.sh refcocog /path/to/logdir <NUM_GPUS> --token_length 77 --splitBy umd | |
``` | |
## Evaluation | |
* Evaluate on RefCOCO | |
``` | |
bash test.sh refcoco /path/to/evp_ris_refcoco.pth --token_length 40 | |
``` | |
* Evaluate on RefCOCO+ | |
``` | |
bash test.sh refcoco+ /path/to/evp_ris_refcoco+.pth --token_length 40 | |
``` | |
* Evaluate on RefCOCOg | |
``` | |
bash test.sh refcocog /path/to/evp_ris_gref.pth --token_length 77 --splitBy umd | |
``` | |
## Custom inference | |
``` | |
PYTHONPATH="../":$PYTHONPATH python inference.py --img_path test_img.jpg --resume refcoco.pth --token_length 40 --prompt 'green plant' | |
``` | |