# Prepare Datasets for FrozenSeg

A dataset can be used by accessing [DatasetCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.DatasetCatalog)
for its data, or [MetadataCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.MetadataCatalog) for its metadata (class names, etc).
This document explains how to set up the builtin datasets so they can be used by the above APIs.
[Use Custom Datasets](https://detectron2.readthedocs.io/tutorials/datasets.html) gives a deeper dive on how to use `DatasetCatalog` and `MetadataCatalog`,
and how to add new datasets to them.

FrozenSeg has builtin support for a few datasets.
The datasets are assumed to exist in a directory specified by the environment variable
`DETECTRON2_DATASETS`.
Under this directory, detectron2 will look for datasets in the structure described below, if needed.

```
$DETECTRON2_DATASETS/
  # panoptic datasets
  ADEChallengeData2016/
  coco/
  cityscapes/
  mapillary_vistas/
  bdd100k/
  # semantic datasets
  VOCdevkit/
  ADE20K_2021_17_01/
  pascal_ctx_d2/
  pascal_voc_d2/
```

You can set the location for builtin datasets by `export DETECTRON2_DATASETS=/path/to/datasets`.
If left unset, the default is `./datasets` relative to your current working directory.

## Expected dataset structure for [COCO](https://cocodataset.org/#download):

```
coco/
  annotations/
    instances_{train,val}2017.json
    panoptic_{train,val}2017.json
  {train,val}2017/
    # image files that are mentioned in the corresponding json
  panoptic_{train,val}2017/  # png annotations
  panoptic_semseg_{train,val}2017/  # generated by the script mentioned below
```

Install panopticapi by:

```
pip install git+https://github.com/cocodataset/panopticapi.git
```

Then run `python datasets/prepare_coco_semantic_annos_from_panoptic_annos.py` to extract semantic annotations from the panoptic annotations (only used for evaluation).
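Before running the preparation script, it can help to confirm that the COCO layout above is in place. The following is a minimal sketch, not part of FrozenSeg: the helper names `dataset_root` and `missing_entries` are hypothetical, and the expected entries are simply the ones listed in the tree above, with `{train,val}` expanded.

```python
import os
from pathlib import Path
from typing import List

# Entries copied from the COCO tree above; the helper names are hypothetical.
COCO_EXPECTED = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "annotations/panoptic_train2017.json",
    "annotations/panoptic_val2017.json",
    "train2017",
    "val2017",
    "panoptic_train2017",
    "panoptic_val2017",
    "panoptic_semseg_train2017",  # only present after the preparation script
    "panoptic_semseg_val2017",    # only present after the preparation script
]


def dataset_root() -> Path:
    """Return $DETECTRON2_DATASETS, falling back to ./datasets when unset."""
    return Path(os.environ.get("DETECTRON2_DATASETS", "datasets"))


def missing_entries(base: Path, expected: List[str]) -> List[str]:
    """List the expected relative paths that do not exist under base."""
    return [rel for rel in expected if not (base / rel).exists()]


if __name__ == "__main__":
    coco = dataset_root() / "coco"
    for rel in missing_entries(coco, COCO_EXPECTED):
        print(f"missing: {coco / rel}")
```

Running it from the repository root prints one line per missing entry. The two `panoptic_semseg_*` directories are expected to be reported until the preparation script has been run.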
## Expected dataset structure for [Cityscapes](https://www.cityscapes-dataset.com/downloads/):

```
cityscapes/
  gtFine/
    train/
      aachen/
        color.png, instanceIds.png, labelIds.png, polygons.json,
        labelTrainIds.png
      ...
    val/
    test/
    # below are generated Cityscapes panoptic annotation
    cityscapes_panoptic_train.json
    cityscapes_panoptic_train/
    cityscapes_panoptic_val.json
    cityscapes_panoptic_val/
    cityscapes_panoptic_test.json
    cityscapes_panoptic_test/
  leftImg8bit/
    train/
    val/
    test/
```

Install cityscapesScripts by:

```
pip install git+https://github.com/mcordts/cityscapesScripts.git
```

Note: to create `labelTrainIds.png`, first prepare the above structure, then run cityscapesScripts with:

```
CITYSCAPES_DATASET=/path/to/abovementioned/cityscapes python cityscapesscripts/preparation/createTrainIdLabelImgs.py
```

These files are not needed for instance segmentation.

Note: to generate the Cityscapes panoptic dataset, run cityscapesScripts with:

```
CITYSCAPES_DATASET=/path/to/abovementioned/cityscapes python cityscapesscripts/preparation/createPanopticImgs.py
```

These files are not needed for semantic and instance segmentation.

## Expected dataset structure for [ADE20k (A150)](http://sceneparsing.csail.mit.edu/):

```
ADEChallengeData2016/
  images/
  annotations/
  objectInfo150.txt
  # download instance annotation
  annotations_instance/
  # generated by prepare_ade20k_sem_seg.py
  annotations_detectron2/
  # below are generated by prepare_ade20k_pan_seg.py
  ade20k_panoptic_{train,val}.json
  ade20k_panoptic_{train,val}/
  # below are generated by prepare_ade20k_ins_seg.py
  ade20k_instance_{train,val}.json
```

The directory `annotations_detectron2` is generated by running `python datasets/prepare_ade20k_sem_seg.py`.
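Since several entries in the ADE20k tree above are produced by different scripts, a small check can report which scripts still need to be run. This is a sketch under assumptions: the function name `scripts_still_needed` is hypothetical, and the mapping only restates the comments in the tree above, with `{train,val}` expanded.

```python
from pathlib import Path
from typing import List

# Generated entries and the script that produces each one, as annotated in the
# ADEChallengeData2016 tree above. The function name below is hypothetical.
GENERATED_BY = {
    "annotations_detectron2": "prepare_ade20k_sem_seg.py",
    "ade20k_panoptic_train.json": "prepare_ade20k_pan_seg.py",
    "ade20k_panoptic_val.json": "prepare_ade20k_pan_seg.py",
    "ade20k_panoptic_train": "prepare_ade20k_pan_seg.py",
    "ade20k_panoptic_val": "prepare_ade20k_pan_seg.py",
    "ade20k_instance_train.json": "prepare_ade20k_ins_seg.py",
    "ade20k_instance_val.json": "prepare_ade20k_ins_seg.py",
}


def scripts_still_needed(ade_root: Path) -> List[str]:
    """Return, in tree order, the scripts whose outputs are missing."""
    needed: List[str] = []
    for entry, script in GENERATED_BY.items():
        if not (ade_root / entry).exists() and script not in needed:
            needed.append(script)
    return needed


if __name__ == "__main__":
    for script in scripts_still_needed(Path("datasets/ADEChallengeData2016")):
        print(f"run: python datasets/{script}")
```

The check only tests for existence, so a partially written output directory would still count as present.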
Install panopticapi by:

```bash
pip install git+https://github.com/cocodataset/panopticapi.git
```

Download the instance annotation from http://sceneparsing.csail.mit.edu/:

```bash
wget http://sceneparsing.csail.mit.edu/data/ChallengeData2017/annotations_instance.tar
```

Then run `python datasets/prepare_ade20k_pan_seg.py` to combine the semantic and instance annotations into panoptic annotations, and run `python datasets/prepare_ade20k_ins_seg.py` to extract instance annotations in COCO format.

## Expected dataset structure for [Mapillary Vistas](https://www.mapillary.com/dataset/vistas):

```
mapillary_vistas/
  training/
    images/
    instances/
    labels/
    panoptic/
  validation/
    images/
    instances/
    labels/
    panoptic/
```

No preprocessing is needed for Mapillary Vistas on semantic and panoptic segmentation.

## Expected dataset structure for [BDD100K](https://doc.bdd100k.com/download.html#id1):

```
bdd100k/
  images/
    10k/
      train/
      val/
      test/
  json
  labels/
    pan_seg/
    sem_seg/
```

The COCO-format annotations are obtained by running:

```
cd $DETECTRON2_DATASETS
wget https://github.com/chenxi52/FrozenSeg/releases/download/latest/bdd100k_json.zip
unzip bdd100k_json.zip
```

## Expected dataset structure for [ADE20k-Full (A-847)](https://groups.csail.mit.edu/vision/datasets/ADE20K/):

```
ADE20K_2021_17_01/
  images/
  index_ade20k.pkl
  objects.txt
  # generated by prepare_ade20k_full_sem_seg.py
  images_detectron2/
  annotations_detectron2/
```

Register and download the dataset from https://groups.csail.mit.edu/vision/datasets/ADE20K/:

```bash
cd $DETECTRON2_DATASETS
wget your/personal/download/link/{username}_{hash}.zip
unzip {username}_{hash}.zip
```

Generate the directories `ADE20K_2021_17_01/images_detectron2` and `ADE20K_2021_17_01/annotations_detectron2` by running:

```bash
python datasets/prepare_ade20k_full_sem_seg.py
```

## Expected dataset structure for [PASCAL Context Full (PC-459)](https://www.cs.stanford.edu/~roozbeh/pascal-context/) and [PASCAL VOC (PAS-21)](http://host.robots.ox.ac.uk/pascal/VOC/):

```bash
VOCdevkit/
  VOC2012/
    Annotations/
    JPEGImages/
    ImageSets/
      Segmentation/
  VOC2010/
    JPEGImages/
    trainval/
    trainval_merged.json
# generated by prepare_pascal_voc_sem_seg.py
pascal_voc_d2/
  images/
  annotations_pascal21/
  # pascal 20 excludes the background class
  annotations_pascal20/
# generated by prepare_pascal_ctx_sem_seg.py
pascal_ctx_d2/
  images/
  annotations_ctx59/
  # generated by prepare_pascal_ctx_full_sem_seg.py
  annotations_ctx459/
```

### PASCAL VOC (PAS-21)

Download the dataset from http://host.robots.ox.ac.uk/pascal/VOC/:

```bash
cd $DETECTRON2_DATASETS
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
# generate folder VOCdevkit/VOC2012
tar -xvf VOCtrainval_11-May-2012.tar
```

Generate the directory `pascal_voc_d2` by running:

```bash
python datasets/prepare_pascal_voc_sem_seg.py
```

### PASCAL Context Full (PC-459)

Download the dataset from http://host.robots.ox.ac.uk/pascal/VOC/ and the annotation from https://www.cs.stanford.edu/~roozbeh/pascal-context/:

```bash
cd $DETECTRON2_DATASETS
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2010/VOCtrainval_03-May-2010.tar
# generate folder VOCdevkit/VOC2010
tar -xvf VOCtrainval_03-May-2010.tar
wget https://www.cs.stanford.edu/~roozbeh/pascal-context/trainval.tar.gz
# generate folder VOCdevkit/VOC2010/trainval
tar -xvzf trainval.tar.gz -C VOCdevkit/VOC2010
wget https://codalabuser.blob.core.windows.net/public/trainval_merged.json -P VOCdevkit/VOC2010/
```

Install [Detail API](https://github.com/zhanghang1989/detail-api) by:

```bash
git clone https://github.com/zhanghang1989/detail-api.git
rm detail-api/PythonAPI/detail/_mask.c
pip install -e detail-api/PythonAPI/
```

Generate the directory `pascal_ctx_d2/images` by running:

```bash
python datasets/prepare_pascal_ctx_sem_seg.py
```

Generate the directory `pascal_ctx_d2/annotations_ctx459` by running:

```bash
python datasets/prepare_pascal_ctx_full_sem_seg.py
```
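After the two PASCAL Context scripts have run, one way to sanity-check the result is to compare the files in `pascal_ctx_d2/images` against those in `pascal_ctx_d2/annotations_ctx459`. The sketch below is illustrative only: the helper names are hypothetical, and it assumes each annotation file shares its image's base name (ignoring the extension), which this document does not state.

```python
from pathlib import Path
from typing import Set, Tuple


def stems(directory: Path) -> Set[str]:
    """Collect file names without their extension for files in directory."""
    return {p.stem for p in directory.iterdir() if p.is_file()}


def unmatched(images_dir: Path, annos_dir: Path) -> Tuple[Set[str], Set[str]]:
    """Return (images without an annotation, annotations without an image).

    Assumption: an image and its annotation share the same base name.
    """
    img, ann = stems(images_dir), stems(annos_dir)
    return img - ann, ann - img


if __name__ == "__main__":
    root = Path("datasets/pascal_ctx_d2")
    if (root / "images").is_dir() and (root / "annotations_ctx459").is_dir():
        no_anno, no_image = unmatched(root / "images", root / "annotations_ctx459")
        print(f"{len(no_anno)} images lack an annotation")
        print(f"{len(no_image)} annotations lack an image")
```

If the naming assumption does not hold for your copy of the data, the counts will be misleading, so treat a mismatch as a prompt to inspect the directories, not as proof of a failed run.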