
Prepare Datasets for OVSeg

This doc is a modification/extension of MaskFormer's, following the Detectron2 format.

A dataset can be used by accessing DatasetCatalog for its data, or MetadataCatalog for its metadata (class names, etc.). This document explains how to set up the builtin datasets so they can be used by the above APIs. The Detectron2 tutorial "Use Custom Datasets" gives a deeper dive on how to use DatasetCatalog and MetadataCatalog, and how to add new datasets to them.

OVSeg has builtin support for a few datasets. The datasets are assumed to exist in a directory specified by the environment variable DETECTRON2_DATASETS. Under this directory, detectron2 will look for datasets in the structure described below, if needed.

$DETECTRON2_DATASETS/
  coco/                 # COCOStuff-171
  ADEChallengeData2016/ # ADE20K-150
  ADE20K_2021_17_01/    # ADE20K-847
  VOCdevkit/
    VOC2012/            # PASCALVOC-20
    VOC2010/            # PASCALContext-59, PASCALContext-459

You can set the location for builtin datasets with export DETECTRON2_DATASETS=/path/to/datasets. If left unset, the default is ./datasets relative to your current working directory.
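The lookup described above can be sketched in Python. This is a minimal illustration, not part of OVSeg: the helper names dataset_root and missing_dirs are hypothetical, and the directory list is copied from the layout shown earlier.

```python
import os
from pathlib import Path

# Top-level directories from the layout above, relative to the dataset root.
EXPECTED_DIRS = [
    "coco",
    "ADEChallengeData2016",
    "ADE20K_2021_17_01",
    "VOCdevkit/VOC2012",
    "VOCdevkit/VOC2010",
]


def dataset_root(env=None):
    """Return $DETECTRON2_DATASETS if set, else ./datasets (hypothetical helper)."""
    env = os.environ if env is None else env
    return Path(env.get("DETECTRON2_DATASETS", "datasets"))


def missing_dirs(root):
    """List the expected directories that do not exist under root."""
    return [d for d in EXPECTED_DIRS if not (Path(root) / d).is_dir()]


if __name__ == "__main__":
    root = dataset_root()
    print(f"dataset root: {root.resolve()}")
    for d in missing_dirs(root):
        print(f"missing: {root / d}")
```

Running this before training is a quick way to confirm that the environment variable points where you expect. It only checks that the directories exist, not that their contents are complete.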

Unless otherwise noted, our model is trained on COCOStuff-171 and evaluated on ADE20K-150, ADE20K-847, PASCALVOC-20, PASCALContext-59 and PASCALContext-459.

| dataset | split | # images | # categories |
|---|---|---|---|
| COCO Stuff | train2017 | 118K | 171 |
| ADE20K | val | 2K | 150/847 |
| Pascal VOC | val | 1.5K | 20 |
| Pascal Context | val | 5K | 59/459 |

Expected dataset structure for COCO Stuff:

coco/
  train2017/ # http://images.cocodataset.org/zips/train2017.zip
  annotations/ # http://images.cocodataset.org/annotations/annotations_trainval2017.zip
  stuffthingmaps/
    stuffthingmaps_trainval2017.zip # http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip
    train2017/
  # below are generated
  stuffthingmaps_detectron2/ 
    train2017/

The directory stuffthingmaps_detectron2 is generated by running python datasets/prepare_coco_stuff_sem_seg.py.

Expected dataset structure for ADE20K Scene Parsing (ADE20K-150):

ADEChallengeData2016/
  annotations/
  images/
  objectInfo150.txt
  # below are generated
  annotations_detectron2/

The directory annotations_detectron2 is generated by running python datasets/prepare_ade20k_sem_seg.py.

Expected dataset structure for ADE20K-Full (ADE20K-847):

ADE20K_2021_17_01/
  images/
  index_ade20k.pkl
  objects.txt
  # below are generated
  images_detectron2/
  annotations_detectron2/

The directories images_detectron2 and annotations_detectron2 are generated by running python datasets/prepare_ade20k_full_sem_seg.py.

Expected dataset structure for Pascal VOC 2012 (PASCALVOC-20):

VOCdevkit/VOC2012/
  Annotations/
  ImageSets/
  JPEGImages/
  SegmentationClass/
  SegmentationObject/
  SegmentationClassAug/ # https://github.com/kazuto1011/deeplab-pytorch/blob/master/data/datasets/voc12/README.md
  # below are generated
  images_detectron2/
  annotations_detectron2/

The dataset is distributed as the tar file VOCtrainval_11-May-2012.tar.
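Unpacking the archive can be sketched with the standard tarfile module. The function name extract_archive is hypothetical, and the sketch assumes the archive unpacks into a top-level VOCdevkit/ directory, so the destination should be the dataset root itself.

```python
import tarfile
from pathlib import Path


def extract_archive(tar_path, dest):
    """Extract tar_path into dest and return the top-level names it contained."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as tar:
        # Collect the first path component of every member, ignoring "." entries.
        top_level = sorted(
            {Path(m.name).parts[0] for m in tar.getmembers() if Path(m.name).parts}
        )
        try:
            # The "data" filter rejects unsafe members on Python versions that support it.
            tar.extractall(dest, filter="data")
        except TypeError:
            # Older Python versions do not accept the filter argument.
            tar.extractall(dest)
    return top_level
```

For example, extract_archive("VOCtrainval_11-May-2012.tar", "datasets") would be expected to leave the images under datasets/VOCdevkit/VOC2012/ if the assumption about the archive layout holds. Checking the returned names is a cheap way to verify that before running the preparation script.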

We use the SBD-augmented training data as SegmentationClassAug, following DeepLab.

The directories images_detectron2 and annotations_detectron2 are generated by running python datasets/prepare_voc_sem_seg.py.

Expected dataset structure for Pascal Context:

VOCdevkit/VOC2010/
  Annotations/
  ImageSets/
  JPEGImages/
  SegmentationClass/
  SegmentationObject/
  # below are from https://www.cs.stanford.edu/~roozbeh/pascal-context/trainval.tar.gz
  trainval/
  labels.txt
  59_labels.txt # https://www.cs.stanford.edu/~roozbeh/pascal-context/59_labels.txt
  pascalcontext_val.txt # https://drive.google.com/file/d/1BCbiOKtLvozjVnlTJX51koIveUZHCcUh/view?usp=sharing
  # below are generated
  annotations_detectron2/
    pc459_val
    pc59_val

The dataset is distributed as the tar file VOCtrainval_03-May-2010.tar. You may want to download the 5K validation set here.

The directory annotations_detectron2 is generated by running python datasets/prepare_pascal_context.py.
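To keep track of which preparation scripts still need to be run, the script-to-output pairs listed in this document can be collected in one table. The sketch below is illustrative only: pending_scripts is a hypothetical helper, and it treats an existing output directory as evidence that the script ran, which does not prove the run completed.

```python
from pathlib import Path

# Preparation script -> directories it generates, relative to the dataset root.
GENERATED = {
    "datasets/prepare_coco_stuff_sem_seg.py": ["coco/stuffthingmaps_detectron2"],
    "datasets/prepare_ade20k_sem_seg.py": [
        "ADEChallengeData2016/annotations_detectron2",
    ],
    "datasets/prepare_ade20k_full_sem_seg.py": [
        "ADE20K_2021_17_01/images_detectron2",
        "ADE20K_2021_17_01/annotations_detectron2",
    ],
    "datasets/prepare_voc_sem_seg.py": [
        "VOCdevkit/VOC2012/images_detectron2",
        "VOCdevkit/VOC2012/annotations_detectron2",
    ],
    "datasets/prepare_pascal_context.py": [
        "VOCdevkit/VOC2010/annotations_detectron2",
    ],
}


def pending_scripts(root):
    """Return the scripts whose generated directories are not all present under root."""
    root = Path(root)
    return [
        script
        for script, outputs in GENERATED.items()
        if not all((root / out).is_dir() for out in outputs)
    ]


if __name__ == "__main__":
    for script in pending_scripts("datasets"):
        print(f"still to run: python {script}")
```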