# Referring Image Segmentation
## Getting Started

1. Install the required packages.

```
pip install -r requirements.txt
```

2. Prepare RefCOCO datasets following [LAVT](https://github.com/yz93/LAVT-RIS).

* Download COCO 2014 Train Images [83K/13GB] from [COCO](https://cocodataset.org/#download), and extract `train2014.zip` to `./refer/data/images/mscoco/images`

* Follow the instructions in `./refer` to download and extract `refclef.zip`, `refcoco.zip`, `refcoco+.zip`, and `refcocog.zip` to `./refer/data`

Your dataset directory should be:

```
refer/
├── data/
│   ├── images/mscoco/images/
│   ├── refclef
│   ├── refcoco
│   ├── refcoco+
│   ├── refcocog
├── evaluation/
├── ...
```
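
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: `missing_dataset_dirs` and `EXPECTED` are hypothetical names, and the check assumes each entry in the tree is a directory under `./refer`.

```python
from pathlib import Path

# Sub-paths copied from the directory tree above, relative to ./refer.
# Assumption: every entry is a directory once the archives are extracted.
EXPECTED = [
    "data/images/mscoco/images",
    "data/refclef",
    "data/refcoco",
    "data/refcoco+",
    "data/refcocog",
    "evaluation",
]


def missing_dataset_dirs(refer_root="refer"):
    """Return the expected sub-directories that do not exist under refer_root."""
    root = Path(refer_root)
    return [p for p in EXPECTED if not (root / p).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

An empty result only means the folders exist; it does not verify that the archives were extracted completely.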

## Results and Fine-tuned Models of EVP
EVP achieves 76.35 overall IoU and 77.61 mean IoU on the validation set of RefCOCO.
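
The two metrics differ in how they aggregate over samples. As these terms are commonly used in referring segmentation, overall IoU divides the summed intersection by the summed union across all samples, while mean IoU averages the per-sample IoU. The sketch below illustrates that distinction on toy masks; `iou_metrics` is a hypothetical helper, and the repository's evaluation code may handle details such as empty masks differently.

```python
def iou_metrics(predictions, targets):
    """Compute (overall IoU, mean IoU) for paired flat binary masks.

    predictions, targets: lists of equal-length sequences of 0/1, one per sample.
    """
    total_inter = 0
    total_union = 0
    per_sample = []
    for pred, gt in zip(predictions, targets):
        inter = sum(1 for p, g in zip(pred, gt) if p and g)
        union = sum(1 for p, g in zip(pred, gt) if p or g)
        total_inter += inter
        total_union += union
        # Assumed convention: a sample with an empty union scores 0.
        per_sample.append(inter / union if union else 0.0)
    overall_iou = total_inter / total_union if total_union else 0.0
    mean_iou = sum(per_sample) / len(per_sample) if per_sample else 0.0
    return overall_iou, mean_iou


# Toy example: one large object segmented poorly, one small object segmented perfectly.
preds = [[1, 1, 0, 0], [1, 0, 0, 0]]
gts = [[1, 1, 1, 1], [1, 0, 0, 0]]
overall, mean = iou_metrics(preds, gts)
print(overall, mean)  # 0.6 0.75
```

Because overall IoU pools pixels, large objects weigh more heavily in it, which is why the two numbers can differ for the same predictions.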

## Training

We count the maximum length of the referring sentences and set the token length of the language model accordingly (the `--token_length` argument in the commands below). The checkpoint from the best epoch is saved to `./checkpoints/`.
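
As a rough illustration of the counting step, the sketch below finds the longest sentence in a list. It is an assumption-laden stand-in: `max_sentence_length` is a hypothetical helper, and whitespace splitting replaces the model's real tokenizer, which typically splits words into subwords and adds special tokens, so the true count will usually be higher.

```python
def max_sentence_length(sentences, tokenize=str.split):
    """Return the largest token count over all sentences (0 for an empty list).

    tokenize: callable mapping a string to a list of tokens. The default
    (whitespace split) is only a placeholder for the model's tokenizer.
    """
    return max((len(tokenize(s)) for s in sentences), default=0)


# Illustrative sentences, not taken from the datasets.
examples = ["green plant", "the man in the red shirt on the left"]
print(max_sentence_length(examples))  # 9
```

Passing the model's own tokenizer as `tokenize` would give a count that can be compared directly with `--token_length`.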

* Train on RefCOCO

```
bash train.sh refcoco /path/to/logdir <NUM_GPUS> --token_length 40
```

* Train on RefCOCO+

```
bash train.sh refcoco+ /path/to/logdir <NUM_GPUS> --token_length 40
```

* Train on RefCOCOg

```
bash train.sh refcocog /path/to/logdir <NUM_GPUS> --token_length 77 --splitBy umd
```

## Evaluation

* Evaluate on RefCOCO

```
bash test.sh refcoco /path/to/evp_ris_refcoco.pth --token_length 40
```

* Evaluate on RefCOCO+

```
bash test.sh refcoco+ /path/to/evp_ris_refcoco+.pth --token_length 40
```

* Evaluate on RefCOCOg

```
bash test.sh refcocog /path/to/evp_ris_gref.pth --token_length 77 --splitBy umd
```

## Custom Inference

```
PYTHONPATH="../":$PYTHONPATH python inference.py --img_path test_img.jpg --resume refcoco.pth --token_length 40 --prompt 'green plant'
```