# Prepare datasets for Detic

The basic training of our model uses [LVIS](https://www.lvisdataset.org/) (which uses [COCO](https://cocodataset.org/) images) and [ImageNet-21K](https://www.image-net.org/download.php). 
Some models are trained on [Conceptual Captions (CC3M)](https://ai.google.com/research/ConceptualCaptions/).
Optionally, we use [Objects365](https://www.objects365.org/) and [OpenImages (Challenge 2019 version)](https://storage.googleapis.com/openimages/web/challenge2019.html) for cross-dataset evaluation. 
Before processing, please download the (selected) datasets from their official websites and place or symlink them under `$Detic_ROOT/datasets/`. 

```
$Detic_ROOT/datasets/
    metadata/
    lvis/
    coco/
    imagenet/
    cc3m/
    objects365/
    oid/
```
`metadata/` is our preprocessed meta-data (included in the repo). See the [Metadata](#metadata) section below for details.
Please follow the instructions below to pre-process the individual datasets.

### COCO and LVIS

First, download the COCO and LVIS data and place them as follows:

```
lvis/
    lvis_v1_train.json
    lvis_v1_val.json
coco/
    train2017/
    val2017/
    annotations/
        captions_train2017.json
        instances_train2017.json 
        instances_val2017.json
```

Next, prepare the open-vocabulary LVIS training set using 

```
python tools/remove_lvis_rare.py --ann datasets/lvis/lvis_v1_train.json
```

This will generate `datasets/lvis/lvis_v1_train_norare.json`.
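
For reference, the filtering boils down to dropping the rare-category annotations. A minimal sketch, assuming the standard LVIS v1 format in which every category carries a `frequency` field (`'f'`/`'c'`/`'r'`); the actual `tools/remove_lvis_rare.py` may differ in details:

```python
import json

# Load the full LVIS v1 training annotations.
data = json.load(open('datasets/lvis/lvis_v1_train.json'))

# LVIS v1 tags every category as frequent ('f'), common ('c'), or rare ('r').
rare_ids = {c['id'] for c in data['categories'] if c['frequency'] == 'r'}

# Drop instance annotations of rare categories; the category list is kept.
data['annotations'] = [a for a in data['annotations']
                       if a['category_id'] not in rare_ids]

json.dump(data, open('datasets/lvis/lvis_v1_train_norare.json', 'w'))
```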

### ImageNet-21K

The ImageNet-21K folder should look like:
```
imagenet/
    ImageNet-21K/
        n01593028.tar
        n01593282.tar
        ...
```

We first extract the classes that overlap with LVIS (we work directly with the .tar files for the remaining classes) and convert them into the LVIS annotation format.

~~~
mkdir imagenet/annotations
python tools/unzip_imagenet_lvis.py --dst_path datasets/imagenet/ImageNet-LVIS
python tools/create_imagenetlvis_json.py --imagenet_path datasets/imagenet/ImageNet-LVIS --out_path datasets/imagenet/annotations/imagenet_lvis_image_info.json
~~~
This creates `datasets/imagenet/annotations/imagenet_lvis_image_info.json`.
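
Conceptually, the unzip step only extracts the synsets listed in `metadata/imagenet_lvis_wnid.txt` into per-class folders; a rough sketch of that idea (the real `tools/unzip_imagenet_lvis.py` may handle paths and edge cases differently):

```python
import os
import tarfile

src = 'datasets/imagenet/ImageNet-21K'   # per-synset .tar files
dst = 'datasets/imagenet/ImageNet-LVIS'  # extracted LVIS-overlapping classes only

# WNIDs of the ImageNet-21K synsets that match LVIS classes.
wnids = [line.strip() for line in open('datasets/metadata/imagenet_lvis_wnid.txt')]

for wnid in wnids:
    out_dir = os.path.join(dst, wnid)
    os.makedirs(out_dir, exist_ok=True)
    # Each synset ships as a single tar of JPEGs, e.g. n01593028.tar.
    with tarfile.open(os.path.join(src, wnid + '.tar')) as tar:
        tar.extractall(out_dir)
```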

[Optional] To train with all the 21K classes, run

~~~
python tools/get_imagenet_21k_full_tar_json.py
python tools/create_lvis_21k.py
~~~
This creates `datasets/imagenet/annotations/imagenet-21k_image_info_lvis-21k.json` and `datasets/lvis/lvis_v1_train_lvis-21k.json` (combined LVIS and ImageNet-21K classes in `categories`).

[Optional] To train on combined LVIS and COCO, run

~~~
python tools/merge_lvis_coco.py
~~~
This creates `datasets/lvis/lvis_v1_train+coco_mask.json`.

### Conceptual Captions (CC3M)

Download the dataset from [this](https://ai.google.com/research/ConceptualCaptions/download) page and place it as:
```
cc3m/
    GCC-training.tsv
```

Run the following commands to download the images and convert the annotations to LVIS format (note: downloading the images takes a long time).

~~~
python tools/download_cc.py --ann datasets/cc3m/GCC-training.tsv --save_image_path datasets/cc3m/training/ --out_path datasets/cc3m/train_image_info.json
python tools/get_cc_tags.py
~~~

This creates `datasets/cc3m/train_image_info_tags.json`.
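
For reference, the download step reads a TSV with one `caption<TAB>url` pair per line; a minimal sketch (the provided `tools/download_cc.py` additionally writes the LVIS-format image info and handles failures more carefully; the file naming below is an assumption):

```python
import os
import urllib.request

save_dir = 'datasets/cc3m/training'
os.makedirs(save_dir, exist_ok=True)

with open('datasets/cc3m/GCC-training.tsv') as f:
    for i, line in enumerate(f):
        caption, url = line.rstrip('\n').split('\t')[:2]
        try:
            # The file name here is just the row index; the real script may
            # use a different naming scheme.
            urllib.request.urlretrieve(url, os.path.join(save_dir, '%d.jpg' % i))
        except Exception:
            continue  # many CC3M URLs are dead; skip failures
```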

### Objects365
Download Objects365 (v2) from the website. We only need the validation set in this project:
```
objects365/
    annotations/
        zhiyuan_objv2_val.json
    val/
        images/
            v1/
                patch0/
                ...
                patch15/
            v2/
                patch16/
                ...
                patch49/
```

The original annotations contain typos in the class names; we fix them first, as we later use the names to compute language embeddings.

```
python tools/fix_o365_names.py --ann datasets/objects365/annotations/zhiyuan_objv2_val.json
```
This creates `datasets/objects365/zhiyuan_objv2_val_fixname.json`.
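
A sketch of what the renaming amounts to, assuming `metadata/Objects365_names_fix.csv` is a simple two-column `original_name,fixed_name` mapping (the exact CSV layout and the output location are assumptions; `tools/fix_o365_names.py` is the reference):

```python
import csv
import json

# Assumed format: one "original_name,fixed_name" pair per row.
with open('datasets/metadata/Objects365_names_fix.csv') as f:
    name_fix = {row[0]: row[1] for row in csv.reader(f)}

ann_path = 'datasets/objects365/annotations/zhiyuan_objv2_val.json'
data = json.load(open(ann_path))
for cat in data['categories']:
    cat['name'] = name_fix.get(cat['name'], cat['name'])

json.dump(data, open(ann_path.replace('.json', '_fixname.json'), 'w'))
```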

To train on Objects365, download the training images and run the command above on the training annotations. Note that some images referenced in the training annotations do not exist; we use the following command to filter them out.
~~~
python tools/fix_0365_path.py
~~~
This creates `datasets/objects365/zhiyuan_objv2_train_fixname_fixmiss.json`.
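
The filtering itself just drops image entries whose files are missing on disk, together with their annotations; roughly (the image root and input file name below are assumptions):

```python
import json
import os

ann_path = 'datasets/objects365/annotations/zhiyuan_objv2_train_fixname.json'
img_root = 'datasets/objects365/train'  # assumed location of the training images

data = json.load(open(ann_path))
# Keep only images that actually exist on disk.
data['images'] = [img for img in data['images']
                  if os.path.exists(os.path.join(img_root, img['file_name']))]
valid_ids = {img['id'] for img in data['images']}
# Drop annotations that point at removed images.
data['annotations'] = [a for a in data['annotations'] if a['image_id'] in valid_ids]

json.dump(data, open(ann_path.replace('.json', '_fixmiss.json'), 'w'))
```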

### OpenImages

We followed the instructions in [UniDet](https://github.com/xingyizhou/UniDet/blob/master/projects/UniDet/unidet_docs/DATASETS.md#openimages) to convert the metadata for OpenImages.

The converted folder should look like:

```
oid/
    annotations/
        oid_challenge_2019_train_bbox.json
        oid_challenge_2019_val_expanded.json
    images/
        0/
        1/
        2/
        ...
```

### Open-vocabulary COCO

We first follow [OVR-CNN](https://github.com/alirezazareian/ovr-cnn/blob/master/ipynb/003.ipynb) to create the open-vocabulary COCO split. The converted files should look like:

```
coco/
    zero-shot/
        instances_train2017_seen_2.json
        instances_val2017_all_2.json
```

We further pre-process the annotation format for easier evaluation:

```
python tools/get_coco_zeroshot_oriorder.py --data_path datasets/coco/zero-shot/instances_train2017_seen_2.json
python tools/get_coco_zeroshot_oriorder.py --data_path datasets/coco/zero-shot/instances_val2017_all_2.json
```

Next, we preprocess the COCO caption data:

```
python tools/get_cc_tags.py --cc_ann datasets/coco/annotations/captions_train2017.json --out_path datasets/coco/captions_train2017_tags_allcaps.json --allcaps --convert_caption
```
This creates `datasets/coco/captions_train2017_tags_allcaps.json`.
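
The tagging step derives image-level labels from captions by matching class names against the caption text; a much-simplified illustration of the idea (the real `tools/get_cc_tags.py` also handles synonyms and converts the caption annotations themselves):

```python
import json

caps = json.load(open('datasets/coco/annotations/captions_train2017.json'))
cats = json.load(open('datasets/coco/annotations/instances_val2017.json'))['categories']

# Collect all captions per image.
captions = {}
for ann in caps['annotations']:
    captions.setdefault(ann['image_id'], []).append(ann['caption'].lower())

# An image gets a category as a tag if the class name appears in any caption.
tags = {
    img_id: [c['id'] for c in cats if any(c['name'] in text for text in texts)]
    for img_id, texts in captions.items()
}
```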

### Metadata

```
metadata/
    lvis_v1_train_cat_info.json
    coco_clip_a+cname.npy
    lvis_v1_clip_a+cname.npy
    o365_clip_a+cnamefix.npy
    oid_clip_a+cname.npy
    imagenet_lvis_wnid.txt
    Objects365_names_fix.csv
```

`lvis_v1_train_cat_info.json` is used by the federated loss.
It is created by:
~~~
python tools/get_lvis_cat_info.py --ann datasets/lvis/lvis_v1_train.json
~~~
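
The file essentially records, for every LVIS category, how many training images it appears in, which the federated loss uses when sampling categories. A rough sketch of what `tools/get_lvis_cat_info.py` computes:

```python
import json

data = json.load(open('datasets/lvis/lvis_v1_train.json'))

# Count, for every category, the number of training images that contain it.
images_per_cat = {c['id']: set() for c in data['categories']}
for ann in data['annotations']:
    images_per_cat[ann['category_id']].add(ann['image_id'])

cat_info = [dict(c, image_count=len(images_per_cat[c['id']]))
            for c in data['categories']]
json.dump(cat_info, open('datasets/metadata/lvis_v1_train_cat_info.json', 'w'))
```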

`*_clip_a+cname.npy` files are the pre-computed CLIP embeddings of the class names for each dataset.
They are created by (taking LVIS as an example):
~~~
python tools/dump_clip_features.py --ann datasets/lvis/lvis_v1_val.json --out_path metadata/lvis_v1_clip_a+cname.npy
~~~
Note that we do not include the 21K-class embeddings due to their large file size.
To create them, run:
~~~
python tools/dump_clip_features.py --ann datasets/lvis/lvis_v1_val_lvis-21k.json --out_path datasets/metadata/lvis-21k_clip_a+cname.npy
~~~
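
Per the file naming, `a+cname` means the prompt is simply `'a '` followed by the class name. A hedged sketch of the dump using the `clip` package (the CLIP model variant and normalization below are assumptions; `tools/dump_clip_features.py` is the reference):

```python
import json

import clip
import numpy as np
import torch

# Class names come from the dataset's annotation file, in category-id order.
cats = json.load(open('datasets/lvis/lvis_v1_val.json'))['categories']
names = [c['name'].replace('_', ' ') for c in sorted(cats, key=lambda c: c['id'])]

model, _ = clip.load('ViT-B/32', device='cpu')  # model choice is an assumption
with torch.no_grad():
    text = clip.tokenize(['a ' + name for name in names])
    feats = model.encode_text(text)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # L2-normalize

np.save('datasets/metadata/lvis_v1_clip_a+cname.npy', feats.float().numpy())
```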

`imagenet_lvis_wnid.txt` is the list of matched classes between ImageNet-21K and LVIS.

`Objects365_names_fix.csv` is our manual fix of the Objects365 names.