# Prepare Datasets for CAT-Seg

A dataset can be used by accessing [DatasetCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.DatasetCatalog)
for its data, or [MetadataCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.MetadataCatalog) for its metadata (class names, etc).
This document explains how to set up the builtin datasets so they can be used by the above APIs.
[Use Custom Datasets](https://detectron2.readthedocs.io/tutorials/datasets.html) gives a deeper dive on how to use `DatasetCatalog` and `MetadataCatalog`,
and how to add new datasets to them.

CAT-Seg has builtin support for a few datasets.
The datasets are assumed to exist in a directory specified by the environment variable
`DETECTRON2_DATASETS`.
Under this directory, detectron2 will look for datasets in the structure described below, if needed.
```
$DETECTRON2_DATASETS/
  coco/                   # COCO-Stuff
  ADEChallengeData2016/   # ADE20K-150
  ADE20K_2021_17_01/      # ADE20K-847
  VOCdevkit/ 
    VOC2010/              # PASCAL Context
    VOC2012/              # PASCAL VOC
```

You can set the location for builtin datasets by running `export DETECTRON2_DATASETS=/path/to/datasets`.
If left unset, the default is `./datasets` relative to your current working directory.
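
As a minimal sketch of the rule above (use `DETECTRON2_DATASETS` if it is set, otherwise fall back to `./datasets`), the snippet below resolves the root directory and reports which of the top-level folders from the tree are present. The names `EXPECTED_DIRS`, `dataset_root`, and `available_datasets` are illustrative helpers introduced here; they are not part of detectron2 or CAT-Seg.

```python
import os
from pathlib import Path

# Top-level folders copied from the tree above (hypothetical helper table).
EXPECTED_DIRS = {
    "COCO-Stuff": "coco",
    "ADE20K-150": "ADEChallengeData2016",
    "ADE20K-847": "ADE20K_2021_17_01",
    "PASCAL Context": "VOCdevkit/VOC2010",
    "PASCAL VOC": "VOCdevkit/VOC2012",
}


def dataset_root() -> Path:
    """Return $DETECTRON2_DATASETS, or ./datasets when the variable is unset."""
    return Path(os.environ.get("DETECTRON2_DATASETS", "datasets")).expanduser()


def available_datasets(root: Path) -> dict:
    """Map each dataset name to True if its folder exists under root."""
    return {name: (root / rel).is_dir() for name, rel in EXPECTED_DIRS.items()}


if __name__ == "__main__":
    root = dataset_root()
    for name, found in available_datasets(root).items():
        status = "found" if found else "missing"
        print(f"{name:15s} {status:8s} {root / EXPECTED_DIRS[name]}")
```

Running this before any of the preparation scripts gives a quick view of which datasets still need to be downloaded.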

## Prepare data for [COCO-Stuff](https://github.com/nightrome/cocostuff):

### Expected data structure

```
coco-stuff/
  annotations/
    train2017/
    val2017/
  images/
    train2017/
    val2017/
  # below are generated by prepare_coco_stuff.py
  annotations_detectron2/
    train2017/
    val2017/ 
```
Download the COCO (2017) images from https://cocodataset.org/

```bash
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
```

Download the COCO-Stuff annotations from https://github.com/nightrome/cocostuff.
```bash
wget http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip
```
Unzip `train2017.zip`, `val2017.zip`, and `stuffthingmaps_trainval2017.zip`. Then move the extracted folders to the locations shown in the tree above.
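
Because the image archives and the annotation archive are expected to unpack to folders with the same names, it can help to extract each one directly into its destination. The sketch below does this with the standard `zipfile` module. It assumes that each archive contains top-level `train2017/` and/or `val2017/` folders; `extract_coco_stuff` is a hypothetical helper, not something shipped with CAT-Seg.

```python
import zipfile
from pathlib import Path


def extract_coco_stuff(download_dir: Path, coco_root: Path) -> None:
    """Unpack the three archives into the images/ and annotations/ layout."""
    # Assumption: every archive carries its own train2017/ or val2017/ prefix.
    targets = {
        "train2017.zip": coco_root / "images",
        "val2017.zip": coco_root / "images",
        "stuffthingmaps_trainval2017.zip": coco_root / "annotations",
    }
    for archive_name, destination in targets.items():
        destination.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(download_dir / archive_name) as archive:
            archive.extractall(destination)
```

Pass the folder holding the downloaded archives as `download_dir` and the COCO-Stuff folder under your dataset root as `coco_root`.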

Generate the labels for training and testing.

```
python datasets/prepare_coco_stuff.py
```


## Prepare data for [ADE20K-150](http://sceneparsing.csail.mit.edu):

### Expected data structure 
```
ADEChallengeData2016/
  annotations/
    validation/
  images/
    validation/
  # below are generated by prepare_ade20k_150.py
  annotations_detectron2/
    validation/
```
Download the ADE20K-150 data from http://sceneparsing.csail.mit.edu.
```
wget http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip
```
Unzip `ADEChallengeData2016.zip` and generate the labels for testing.
```
python datasets/prepare_ade20k_150.py
```

## Prepare data for [ADE20K-847](https://groups.csail.mit.edu/vision/datasets/ADE20K/):

### Expected data structure 
```
ADE20K_2021_17_01/
  images/
    ADE/
      validation/
  index_ade20k.mat
  index_ade20k.pkl
  # below are generated by prepare_ade20k_847.py
  annotations_detectron2/
    validation/
```
Download the ADE20K-Full data from https://groups.csail.mit.edu/vision/datasets/ADE20K/request_data/, then unzip the dataset and generate the labels for testing.
```
python datasets/prepare_ade20k_847.py
```

## Prepare data for [PASCAL VOC 2012](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/#devkit):


### Expected data structure 
```
VOCdevkit/
  VOC2012/
    Annotations/
    ImageSets/
    JPEGImages/
    SegmentationClass/
    SegmentationClassAug/ 
    SegmentationObject/
    # below are generated by prepare_voc.py
    annotations_detectron2/
    annotations_detectron2_bg/
```
Download the data of PASCAL VOC from http://host.robots.ox.ac.uk/pascal/VOC/voc2012/#devkit.

We use the SBD-augmented training data as `SegmentationClassAug`, following [Deeplab](https://github.com/kazuto1011/deeplab-pytorch/blob/master/data/datasets/voc12/README.md).
```
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
wget https://www.dropbox.com/s/oeu149j8qtbs1x0/SegmentationClassAug.zip
```
Extract `VOCtrainval_11-May-2012.tar` and unzip `SegmentationClassAug.zip`. Then move the extracted folders to the locations shown in the tree above and generate the labels for testing.
```
python datasets/prepare_voc.py
```
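
The extraction step for PASCAL VOC 2012 mixes a `.tar` and a `.zip` archive, so a format-agnostic call such as `shutil.unpack_archive` can handle both. The sketch below is one way to do it before running the script. It assumes that the tar archive already carries a `VOCdevkit/VOC2012/` prefix and that the zip unpacks to a top-level `SegmentationClassAug/` folder; `unpack_voc2012` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def unpack_voc2012(download_dir: Path, dataset_root: Path) -> Path:
    """Unpack both archives and return the resulting VOC2012 folder."""
    # Assumption: the tar archive carries its own VOCdevkit/VOC2012/ prefix.
    shutil.unpack_archive(download_dir / "VOCtrainval_11-May-2012.tar", dataset_root)
    voc_dir = dataset_root / "VOCdevkit" / "VOC2012"
    # Assumption: the zip unpacks to a top-level SegmentationClassAug/ folder.
    shutil.unpack_archive(download_dir / "SegmentationClassAug.zip", voc_dir)
    return voc_dir
```

If the zip unpacks differently on your machine, move `SegmentationClassAug/` by hand so that it sits next to `SegmentationClass/`.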


## Prepare data for [PASCAL Context](https://www.cs.stanford.edu/~roozbeh/pascal-context/):


### Expected data structure 
```
VOCdevkit/
  VOC2010/
    Annotations/
    ImageSets/
    JPEGImages/
    SegmentationClass/
    SegmentationObject/
    trainval/
    labels.txt
    59_labels.txt
    pascalcontext_val.txt
    # below are generated by prepare_pascal_context.py
    annotations_detectron2/
      pc459_val/
      pc59_val/
```
Download the PASCAL VOC 2010 data, along with the PASCAL Context annotations from https://www.cs.stanford.edu/~roozbeh/pascal-context/.

```
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2010/VOCtrainval_03-May-2010.tar
wget https://www.cs.stanford.edu/~roozbeh/pascal-context/trainval.tar.gz
wget https://www.cs.stanford.edu/~roozbeh/pascal-context/59_labels.txt
```
Extract `VOCtrainval_03-May-2010.tar` and `trainval.tar.gz`. Then move the extracted folders to the locations shown in the tree above and generate the labels for testing.
```
python datasets/prepare_pascal_context.py
```
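
PASCAL Context needs the largest set of input folders and text files, so a quick check of the `VOC2010/` folder before running the script can save a failed run. The sketch below lists whichever entries from the expected data structure are absent. `REQUIRED_DIRS`, `REQUIRED_FILES`, and `missing_inputs` are illustrative names, and the lists simply mirror the tree above rather than what `prepare_pascal_context.py` actually reads.

```python
from pathlib import Path

# Inputs copied from the expected data structure above (not from the script).
REQUIRED_DIRS = [
    "Annotations",
    "ImageSets",
    "JPEGImages",
    "SegmentationClass",
    "SegmentationObject",
    "trainval",
]
REQUIRED_FILES = ["labels.txt", "59_labels.txt", "pascalcontext_val.txt"]


def missing_inputs(voc2010_dir: Path) -> list:
    """Return the expected folders and files that are absent from voc2010_dir."""
    missing = [d + "/" for d in REQUIRED_DIRS if not (voc2010_dir / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (voc2010_dir / f).is_file()]
    return missing
```

An empty list means every input from the tree is in place; anything else names the entries that still need to be downloaded or moved.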