
	## Prepare Datasets for OVSeg

	This document is adapted and extended from the [MaskFormer](https://github.com/facebookresearch/MaskFormer/blob/main/datasets/README.md) dataset instructions and follows the [Detectron2 format](https://detectron2.readthedocs.io/en/latest/tutorials/datasets.html).

	A dataset can be used by accessing [DatasetCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.DatasetCatalog)
	for its data, or [MetadataCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.MetadataCatalog) for its metadata (class names, etc).
	This document explains how to set up the builtin datasets so they can be used by the above APIs.
	[Use Custom Datasets](https://detectron2.readthedocs.io/tutorials/datasets.html) gives a deeper dive on how to use `DatasetCatalog` and `MetadataCatalog`,
	and how to add new datasets to them.

	OVSeg has builtin support for a few datasets.
	The datasets are assumed to exist in a directory specified by the environment variable
	`DETECTRON2_DATASETS`.
	Under this directory, detectron2 will look for datasets in the structure described below, if needed.
	```
	$DETECTRON2_DATASETS/
	  coco/                 # COCOStuff-171
	  ADEChallengeData2016/ # ADE20K-150
	  ADE20K_2021_17_01/    # ADE20K-847
	  VOCdevkit/
	    VOC2012/            # PASCALVOC-20
	    VOC2010/            # PASCALContext-59, PASCALContext-459
	```

	You can set the location for builtin datasets by `export DETECTRON2_DATASETS=/path/to/datasets`.
	If left unset, the default is `./datasets` relative to your current working directory.
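The lookup rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from OVSeg: the helper names `resolve_dataset_root` and `missing_datasets` are hypothetical, and the list of expected directories is simply copied from the layout shown earlier.

```python
import os
from pathlib import Path
from typing import List

# Top-level dataset directories taken from the layout described in this document.
EXPECTED_DIRS = [
    "coco",
    "ADEChallengeData2016",
    "ADE20K_2021_17_01",
    "VOCdevkit/VOC2012",
    "VOCdevkit/VOC2010",
]


def resolve_dataset_root() -> Path:
    """Return $DETECTRON2_DATASETS if it is set, otherwise ./datasets."""
    return Path(os.path.expanduser(os.getenv("DETECTRON2_DATASETS", "datasets")))


def missing_datasets(root: Path) -> List[str]:
    """List the expected dataset directories that do not exist under root."""
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    root = resolve_dataset_root()
    print(f"dataset root: {root.resolve()}")
    for name in missing_datasets(root):
        print(f"missing: {name}")
```

Running this from the repository root before training or evaluation gives a quick report of which datasets still need to be downloaded. Only the datasets you actually plan to use need to be present.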

	Unless otherwise noted, our model is trained on COCOStuff-171 and evaluated on ADE20K-150, ADE20K-847, PASCALVOC-20, PASCALContext-59 and PASCALContext-459.

	| dataset | split | # images | # categories |
	|:--------------:|:---------:|:--------:|:------------:|
	| COCO Stuff | train2017 | 118K | 171 |
	| ADE20K | val | 2K | 150/847 |
	| Pascal VOC | val | 1.5K | 20 |
	| Pascal Context | val | 5K | 59/459 |


	### Expected dataset structure for [COCO Stuff](https://github.com/nightrome/cocostuff):
	```
	coco/
	  train2017/    # http://images.cocodataset.org/zips/train2017.zip
	  annotations/  # http://images.cocodataset.org/annotations/annotations_trainval2017.zip
	  stuffthingmaps/
	    stuffthingmaps_trainval2017.zip  # http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip
	    train2017/
	  # below are generated
	  stuffthingmaps_detectron2/
	    train2017/
	```

	The directory `stuffthingmaps_detectron2` is generated by running `python datasets/prepare_coco_stuff_sem_seg.py`.



	### Expected dataset structure for [ADE20k Scene Parsing (ADE20K-150)](http://sceneparsing.csail.mit.edu/):
	```
	ADEChallengeData2016/
	  annotations/
	  images/
	  objectInfo150.txt
	  # below are generated
	  annotations_detectron2/
	```
	The directory `annotations_detectron2` is generated by running `python datasets/prepare_ade20k_sem_seg.py`.
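To give a sense of what such a conversion typically involves: the raw ADE20K-150 annotation PNGs use 0 for unlabeled pixels and 1 to 150 for the classes, while Detectron2-style semantic segmentation conventionally uses contiguous class indices starting at 0 with 255 as the ignore value. The sketch below shows that label shift on a plain list of integers. It assumes OVSeg's script follows the same convention as the Detectron2 script of the same name, and the function name is hypothetical.

```python
from typing import Iterable, List

IGNORE_LABEL = 255  # conventional "ignore" value in Detectron2-style semantic segmentation


def shift_ade20k_labels(raw_labels: Iterable[int]) -> List[int]:
    """Shift 8-bit labels down by one with wraparound: 0 -> 255, k -> k - 1."""
    return [(value - 1) % 256 for value in raw_labels]


# Unlabeled (0) becomes the ignore value; classes 1..150 become 0..149.
print(shift_ade20k_labels([0, 1, 150]))  # [255, 0, 149]
```

On real images the same shift is usually applied to the whole `uint8` array at once, where subtracting 1 wraps 0 around to 255.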


	### Expected dataset structure for [ADE20k-Full (ADE20K-847)](https://github.com/CSAILVision/ADE20K#download):
	```
	ADE20K_2021_17_01/
	  images/
	  index_ade20k.pkl
	  objects.txt
	  # below are generated
	  images_detectron2/
	  annotations_detectron2/
	```
	The directories `images_detectron2` and `annotations_detectron2` are generated by running `python datasets/prepare_ade20k_full_sem_seg.py`.

	### Expected dataset structure for [Pascal VOC 2012 (PASCALVOC-20)](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/#devkit):
	```
	VOCdevkit/VOC2012/
	  Annotations/
	  ImageSets/
	  JPEGImages/
	  SegmentationClass/
	  SegmentationObject/
	  SegmentationClassAug/ # https://github.com/kazuto1011/deeplab-pytorch/blob/master/data/datasets/voc12/README.md
	  # below are generated
	  images_detectron2/
	  annotations_detectron2/
	```

	Start by extracting the tar file `VOCtrainval_11-May-2012.tar`.

	We use the SBD-augmented training data as `SegmentationClassAug`, following [Deeplab](https://github.com/kazuto1011/deeplab-pytorch/blob/master/data/datasets/voc12/README.md).

	The directories `images_detectron2` and `annotations_detectron2` are generated by running `python datasets/prepare_voc_sem_seg.py`.


	### Expected dataset structure for [Pascal Context](https://www.cs.stanford.edu/~roozbeh/pascal-context/):

	```
	VOCdevkit/VOC2010/
	  Annotations/
	  ImageSets/
	  JPEGImages/
	  SegmentationClass/
	  SegmentationObject/
	  # below are from https://www.cs.stanford.edu/~roozbeh/pascal-context/trainval.tar.gz
	  trainval/
	    labels.txt
	    59_labels.txt # https://www.cs.stanford.edu/~roozbeh/pascal-context/59_labels.txt
	    pascalcontext_val.txt # https://drive.google.com/file/d/1BCbiOKtLvozjVnlTJX51koIveUZHCcUh/view?usp=sharing
	  # below are generated
	  annotations_detectron2/
	    pc459_val
	    pc59_val
	```
	Start by extracting the tar file `VOCtrainval_03-May-2010.tar`. The list for the 5K validation set, `pascalcontext_val.txt`, can be downloaded [here](https://drive.google.com/file/d/1BCbiOKtLvozjVnlTJX51koIveUZHCcUh/view?usp=sharing).

	The directory `annotations_detectron2` is generated by running `python datasets/prepare_pascal_context.py`.
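As a convenience, the five preparation steps in this document can be gathered into one pre-flight check. The sketch below is an illustration only: the `PREPARATION` table and `plan` function are hypothetical, and the choice of which raw inputs to check per dataset is an assumption based on the directory listings in this document, not on what each script actually reads.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# dataset name -> (raw inputs relative to the dataset root, preparation script).
# The input lists are assumptions drawn from the directory listings in this document.
PREPARATION: Dict[str, Tuple[List[str], str]] = {
    "COCOStuff-171": (
        ["coco/train2017", "coco/stuffthingmaps/train2017"],
        "datasets/prepare_coco_stuff_sem_seg.py",
    ),
    "ADE20K-150": (
        ["ADEChallengeData2016/annotations", "ADEChallengeData2016/images"],
        "datasets/prepare_ade20k_sem_seg.py",
    ),
    "ADE20K-847": (
        ["ADE20K_2021_17_01/images", "ADE20K_2021_17_01/index_ade20k.pkl"],
        "datasets/prepare_ade20k_full_sem_seg.py",
    ),
    "PASCALVOC-20": (
        ["VOCdevkit/VOC2012/JPEGImages", "VOCdevkit/VOC2012/SegmentationClassAug"],
        "datasets/prepare_voc_sem_seg.py",
    ),
    "PASCALContext": (
        ["VOCdevkit/VOC2010/JPEGImages", "VOCdevkit/VOC2010/trainval"],
        "datasets/prepare_pascal_context.py",
    ),
}


def plan(root: Path) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return the commands that are ready to run and the missing inputs per dataset."""
    ready: List[str] = []
    missing: Dict[str, List[str]] = {}
    for name, (inputs, script) in PREPARATION.items():
        absent = [path for path in inputs if not (root / path).exists()]
        if absent:
            missing[name] = absent
        else:
            ready.append(f"python {script}")
    return ready, missing


if __name__ == "__main__":
    ready, missing = plan(Path("datasets"))
    for command in ready:
        print(f"ready:   {command}")
    for name, paths in missing.items():
        print(f"missing: {name}: {', '.join(paths)}")
```

The sketch only prints the commands rather than running them, so it is safe to use while downloads are still in progress.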