jiang committed • Commit 3c6babc • Parent(s): 650c5f6
update

README.md CHANGED
@@ -1,185 +1,13 @@
## :notes: Introduction

![github_figure](pipeline.gif)

PolyFormer is a unified model for referring image segmentation (predicting a polygon vertex sequence) and referring expression comprehension (predicting bounding box corner points). The predicted polygons are converted to segmentation masks at the end (see the sketch below).

**Contributions:**

* State-of-the-art results on referring image segmentation and referring expression comprehension on 6 datasets;
* A unified framework for referring image segmentation (RIS) and referring expression comprehension (REC) that formulates both tasks as a sequence-to-sequence (seq2seq) prediction problem;
* A regression-based decoder for accurate coordinate prediction, which outputs continuous 2D coordinates directly without quantization error.
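The conversion from a predicted vertex sequence to a mask is standard polygon rasterization. The repository's own conversion utilities are not shown in this README; the following is only a minimal sketch of the idea using PIL, with an illustrative function name and coordinates:

```python
# Minimal sketch: rasterize a predicted polygon (a sequence of (x, y) vertices)
# into a binary segmentation mask. Not the repository's own conversion code.
import numpy as np
from PIL import Image, ImageDraw

def polygon_to_mask(vertices, height, width):
    """Turn a list of (x, y) vertices into a {0, 1} mask of shape (H, W)."""
    mask = Image.new("L", (width, height), 0)
    ImageDraw.Draw(mask).polygon([(float(x), float(y)) for x, y in vertices], fill=1)
    return np.array(mask, dtype=np.uint8)

# Example with illustrative continuous coordinates on a 480x640 image.
mask = polygon_to_mask([(100.5, 50.2), (300.0, 80.7), (180.3, 250.9)], height=480, width=640)
print(mask.shape, int(mask.sum()))  # (480, 640), number of foreground pixels
```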
## Getting Started
### Installation
```bash
conda create -n polyformer python=3.7.4
conda activate polyformer
python -m pip install -r requirements.txt
```
Note: if you are getting import errors from `fairseq`, try the following:
```bash
python -m pip install pip==21.2.4
pip uninstall fairseq
pip install -r requirements.txt
```
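If the environment is set up correctly, a quick import check should run without errors (a small sanity check, not part of the repository):

```python
# Confirm the core dependencies import and report their versions.
import torch
import fairseq

print("torch:", torch.__version__)
print("fairseq:", fairseq.__version__)
print("CUDA available:", torch.cuda.is_available())
```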
## Datasets
### Prepare Pretraining Data
1. Create the dataset folders
```bash
mkdir datasets
mkdir datasets/images
mkdir datasets/annotations
```
2. Download the *2014 Train images [83K/13GB]* from [COCO](https://cocodataset.org/#download), the original [Flickr30K images](http://shannon.cs.illinois.edu/DenotationGraph/), the [ReferItGame images](https://drive.google.com/file/d/1R6Tm7tQTHCil6A_eOhjudK3rgaBxkD2t/view?usp=sharing), and the [Visual Genome images](http://visualgenome.org/api/v0/api_home.html), and extract them to `datasets/images`.
3. Download the annotation file for the pretraining datasets, [instances.json](https://drive.google.com/drive/folders/1O4hzL8_s3aUsnj_JZnM3CwANd7TejcJO), provided by [SeqTR](https://github.com/sean-zhuh/SeqTR), and store it in `datasets/annotations`.
The workspace directory should be organized like this:
```
PolyFormer/
├── datasets/
│   ├── images
│   │   ├── flickr30k/*.jpg
│   │   ├── mscoco/
│   │   │   └── train2014/*.jpg
│   │   ├── saiaprtc12/*.jpg
│   │   └── visual-genome/*.jpg
│   └── annotations
│       └── instances.json
└── ...
```
4. Generate the tsv files for pretraining
```bash
python data/create_pretraining_data.py
```
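Before generating the tsv files, it can help to confirm that the folders from the layout above are actually in place. A minimal check, with the paths taken from the layout shown above:

```python
# Verify the expected dataset folders and annotation file exist before
# running data/create_pretraining_data.py.
from pathlib import Path

expected = [
    "datasets/images/flickr30k",
    "datasets/images/mscoco/train2014",
    "datasets/images/saiaprtc12",
    "datasets/images/visual-genome",
    "datasets/annotations/instances.json",
]
missing = [p for p in expected if not Path(p).exists()]
print("all dataset paths found" if not missing else f"missing: {missing}")
```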
### Prepare Finetuning Data
1. Follow the instructions in the `./refer` directory to set up subdirectories and download annotations. This directory is based on the [refer](https://github.com/lichengunc/refer) API.

2. Generate the tsv files for finetuning
```bash
python data/create_finetuning_data.py
```
## Pretraining
1. Create the checkpoints folder
```bash
mkdir weights
```
2. Download the pretrained weights of [Swin-base](https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth), [Swin-large](https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth), and [BERT-base](https://cdn.huggingface.co/bert-base-uncased-pytorch_model.bin), and put the weight files in `./pretrained_weights`. These weights are needed to initialize the model for training (a small download sketch follows the steps below).
3. Run the pretraining scripts to pretrain the model on the referring expression comprehension task:
```bash
cd run_scripts/pretrain
bash pretrain_polyformer_b.sh  # pretrain the PolyFormer-B model
bash pretrain_polyformer_l.sh  # pretrain the PolyFormer-L model
```
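The weight files can be fetched with any downloader (wget or a browser works just as well). A sketch in Python, using the URLs listed in step 2; the local file names below simply follow the URL basenames, so check the training scripts for the exact names they expect:

```python
# Download the backbone / text-encoder weights into ./pretrained_weights.
import urllib.request
from pathlib import Path

WEIGHT_URLS = {
    "swin_base_patch4_window12_384_22k.pth":
        "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth",
    "swin_large_patch4_window12_384_22k.pth":
        "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth",
    "bert-base-uncased-pytorch_model.bin":
        "https://cdn.huggingface.co/bert-base-uncased-pytorch_model.bin",
}

out_dir = Path("pretrained_weights")
out_dir.mkdir(exist_ok=True)
for name, url in WEIGHT_URLS.items():
    target = out_dir / name
    if not target.exists():
        print(f"downloading {name} ...")
        urllib.request.urlretrieve(url, str(target))
```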
## Finetuning
Run the finetuning scripts to finetune the model on the referring image segmentation and referring expression comprehension tasks:
```bash
cd run_scripts/finetune
bash train_polyformer_b.sh  # finetune the PolyFormer-B model
bash train_polyformer_l.sh  # finetune the PolyFormer-L model
```
Please make sure the pretrained weight paths (Line 20) in the finetuning scripts point to your best pretraining checkpoints.
## Evaluation
Run the evaluation scripts to evaluate on the referring image segmentation and referring expression comprehension tasks:
```bash
cd run_scripts/evaluation

# evaluate the PolyFormer-B model
bash evaluate_polyformer_b_refcoco.sh
bash evaluate_polyformer_b_refcoco+.sh
bash evaluate_polyformer_b_refcocog.sh

# evaluate the PolyFormer-L model
bash evaluate_polyformer_l_refcoco.sh
bash evaluate_polyformer_l_refcoco+.sh
bash evaluate_polyformer_l_refcocog.sh
```
## Model Zoo
Download the model weights to `./weights` if you want to use our trained models for finetuning and evaluation.

|       | RefCOCO val |  |  | RefCOCO testA |  |  | RefCOCO testB |  |  |
|-------|------|------|----------|------|------|----------|------|------|----------|
| Model | oIoU | mIoU | Prec@0.5 | oIoU | mIoU | Prec@0.5 | oIoU | mIoU | Prec@0.5 |
| [PolyFormer-B](https://drive.google.com/file/d/1K0y-WBO6cL7gBzNnJaHAeNu3pgq4DbJ9/view?usp=share_link) | 74.82 | 75.96 | 89.73 | 76.64 | 77.09 | 91.73 | 71.06 | 73.22 | 86.03 |
| [PolyFormer-L](https://drive.google.com/file/d/15P6m5RI6HAQE2QXQXMAjw_oBsaPii7b3/view?usp=share_link) | 75.96 | 76.94 | 90.38 | 78.29 | 78.49 | 92.89 | 73.25 | 74.83 | 87.16 |
|       | RefCOCO+ val |  |  | RefCOCO+ testA |  |  | RefCOCO+ testB |  |  |
|-------|------|------|----------|------|------|----------|------|------|----------|
| Model | oIoU | mIoU | Prec@0.5 | oIoU | mIoU | Prec@0.5 | oIoU | mIoU | Prec@0.5 |
| [PolyFormer-B](https://drive.google.com/file/d/12_ylFhsbqGySxDqgeEByn8nKoJtT2n2w/view?usp=share_link) | 67.64 | 70.65 | 83.73 | 72.89 | 74.51 | 88.60 | 59.33 | 64.64 | 76.38 |
| [PolyFormer-L](https://drive.google.com/file/d/1lUCv7dUPctEz4vEpPr7aI8A8ZmfYCB8y/view?usp=share_link) | 69.33 | 72.15 | 84.98 | 74.56 | 75.71 | 89.77 | 61.87 | 66.73 | 77.97 |
|       | RefCOCOg val |  |  | RefCOCOg test |  |  |
|-------|------|------|----------|------|------|----------|
| Model | oIoU | mIoU | Prec@0.5 | oIoU | mIoU | Prec@0.5 |
| [PolyFormer-B](https://drive.google.com/file/d/12_ylFhsbqGySxDqgeEByn8nKoJtT2n2w/view?usp=share_link) | 67.76 | 69.36 | 84.46 | 69.05 | 69.88 | 84.96 |
| [PolyFormer-L](https://drive.google.com/file/d/1lUCv7dUPctEz4vEpPr7aI8A8ZmfYCB8y/view?usp=share_link) | 69.20 | 71.15 | 85.83 | 70.19 | 71.17 | 85.91 |

* Pretrained weights:
  * [PolyFormer-B](https://drive.google.com/file/d/1sAzfChYDdHdaeatB2K14lrJjG4uiXAol/view?usp=share_link)
  * [PolyFormer-L](https://drive.google.com/file/d/1knRxgM1lmEkuZZ-cOm_fmwKP1H0bJGU9/view?usp=share_link)
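A quick way to confirm a downloaded checkpoint is intact is to load it on CPU and look at its top-level keys. This assumes the file is a standard `torch.save` checkpoint; the file name below is illustrative, so substitute whatever name your downloaded weight file has:

```python
# Integrity check for a downloaded checkpoint: load on CPU and list top-level keys.
import torch

ckpt = torch.load("weights/polyformer_b_checkpoint.pt", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
```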
# Acknowledgement
This codebase is developed based on [OFA](https://github.com/OFA-Sys/OFA).
Other related codebases include:
* [Fairseq](https://github.com/pytorch/fairseq)
* [refer](https://github.com/lichengunc/refer)
* [LAVT-RIS](https://github.com/yz93/LAVT-RIS/)
* [SeqTR](https://github.com/sean-zhuh/SeqTR)
# Citation
Please cite our paper if you find this codebase helpful :)

```
@inproceedings{liu2023polyformer,
  title={PolyFormer: Referring Image Segmentation as Sequential Polygon Generation},
  author={Liu, Jiang and Ding, Hui and Cai, Zhaowei and Zhang, Yuting and Satzoda, Ravi Kumar and Mahadevan, Vijay and Manmatha, R},
  booktitle={CVPR},
  year={2023}
}
```

## Security

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

## License

This project is licensed under the Apache-2.0 License.
---
title: PolyFormer
emoji: 🖌️🎨
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 3.14.0
app_file: app.py
pinned: false
license: afl-3.0
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
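The front matter above points the Space at `app.py` using the Gradio SDK. The actual demo app is not part of this file; purely as an illustration of what such an entry point looks like, here is a hypothetical minimal `app.py` skeleton (the `segment` function is a stand-in, not the PolyFormer inference code):

```python
# Hypothetical skeleton of a Gradio entry point matching the front matter above
# (sdk: gradio, app_file: app.py). The real app.py runs PolyFormer inference;
# this stub only echoes the input image to show the expected structure.
import gradio as gr

def segment(image, expression):
    # Placeholder: the real app would run PolyFormer on (image, expression)
    # and return the image with the predicted mask / box drawn on it.
    return image

demo = gr.Interface(
    fn=segment,
    inputs=[gr.Image(type="pil"), gr.Textbox(label="Referring expression")],
    outputs=gr.Image(type="pil"),
    title="PolyFormer",
)

demo.launch()
```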