# Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation

Bowen Cheng, Maxwell D. Collins, Yukun Zhu, Ting Liu, Thomas S. Huang, Hartwig Adam, Liang-Chieh Chen

[[`arXiv`](https://arxiv.org/abs/1911.10194)] [[`BibTeX`](#CitingPanopticDeepLab)] [[`Reference implementation`](https://github.com/bowenc0221/panoptic-deeplab)]

## Installation

Install Detectron2 following [the instructions](https://detectron2.readthedocs.io/tutorials/install.html). To use Cityscapes, prepare the data following the [tutorial](https://detectron2.readthedocs.io/tutorials/builtin_datasets.html#expected-dataset-structure-for-cityscapes).

## Training

To train a model with 8 GPUs, run:

```bash
cd /path/to/detectron2/projects/Panoptic-DeepLab
python train_net.py --config-file configs/Cityscapes-PanopticSegmentation/panoptic_deeplab_R_52_os16_mg124_poly_90k_bs32_crop_512_1024_dsconv.yaml --num-gpus 8
```

## Evaluation

Model evaluation can be done similarly:

```bash
cd /path/to/detectron2/projects/Panoptic-DeepLab
python train_net.py --config-file configs/Cityscapes-PanopticSegmentation/panoptic_deeplab_R_52_os16_mg124_poly_90k_bs32_crop_512_1024_dsconv.yaml --eval-only MODEL.WEIGHTS /path/to/model_checkpoint
```

## Benchmark network speed

If you want to benchmark the network speed without post-processing, run the evaluation script with `MODEL.PANOPTIC_DEEPLAB.BENCHMARK_NETWORK_SPEED True`:

```bash
cd /path/to/detectron2/projects/Panoptic-DeepLab
python train_net.py --config-file configs/Cityscapes-PanopticSegmentation/panoptic_deeplab_R_52_os16_mg124_poly_90k_bs32_crop_512_1024_dsconv.yaml --eval-only MODEL.WEIGHTS /path/to/model_checkpoint MODEL.PANOPTIC_DEEPLAB.BENCHMARK_NETWORK_SPEED True
```

## Cityscapes Panoptic Segmentation

Cityscapes models are trained with ImageNet pretraining.
| Method | Backbone | Output resolution | PQ | SQ | RQ | mIoU | AP | Memory (M) | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Panoptic-DeepLab | R50-DC5 | 1024×2048 | 58.6 | 80.9 | 71.2 | 75.9 | 29.8 | 8668 | - | model \| metrics |
| Panoptic-DeepLab | R52-DC5 | 1024×2048 | 60.3 | 81.5 | 72.9 | 78.2 | 33.2 | 9682 | 30841561 | model \| metrics |
| Panoptic-DeepLab (DSConv) | R52-DC5 | 1024×2048 | 60.3 | 81.0 | 73.2 | 78.7 | 32.1 | 10466 | 33148034 | model \| metrics |
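PQ in the table is panoptic quality, which factors into segmentation quality (SQ, the average IoU over matched segments) and recognition quality (RQ, an F1-style detection term). A minimal sketch of the per-class computation (the IoU values and counts below are illustrative, not taken from these models):

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """Compute (PQ, SQ, RQ) for one class.

    matched_ious: IoU of each true-positive match (a predicted segment
                  matched to a ground-truth segment with IoU > 0.5).
    num_fp / num_fn: unmatched predicted / ground-truth segments.
    """
    tp = len(matched_ious)
    if tp + num_fp + num_fn == 0:
        return 0.0, 0.0, 0.0
    sq = sum(matched_ious) / tp if tp else 0.0   # average IoU over matches
    rq = tp / (tp + 0.5 * num_fp + 0.5 * num_fn)  # F1-style recognition term
    return sq * rq, sq, rq                        # PQ = SQ * RQ

# Toy example: 2 matched segments, 1 false positive, 1 false negative.
pq, sq, rq = panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1)
print(round(sq, 3), round(rq, 3), round(pq, 3))  # 0.7 0.667 0.467
```

Because the table reports class-averaged values, the listed PQ is not exactly the product of the listed SQ and RQ.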
Note:

- [R52](https://dl.fbaipublicfiles.com/detectron2/DeepLab/R-52.pkl): a ResNet-50 with its first 7x7 convolution replaced by 3 3x3 convolutions. This modification has been used in most semantic segmentation papers. We pre-train this backbone on ImageNet using the default recipe of [pytorch examples](https://github.com/pytorch/examples/tree/master/imagenet).
- DC5 means using dilated convolution in `res5`.
- We use a smaller training crop size (512x1024) than the original paper (1025x2049). We find that using a larger crop size (1024x2048) further improves PQ by 1.5% but also degrades AP by 3%.
- The implementation with regular Conv2d in the ASPP and head has a much heavier head than the original paper.
- This implementation does not include the optimized post-processing code needed for deployment. Post-processing the network outputs now takes a similar amount of time to the network itself. Please refer to the speed reported in the original paper for comparison.
- DSConv refers to using DepthwiseSeparableConv2d in the ASPP and decoder. The implementation with DSConv is identical to the original paper.

## COCO Panoptic Segmentation

COCO models are trained with ImageNet pretraining on 16 V100s.
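The parameter savings behind DSConv come from splitting each k×k convolution into a depthwise k×k convolution (one filter per input channel) followed by a pointwise 1×1 convolution, which reduces the weight count by roughly a factor of k² when the channel count is large. A back-of-the-envelope sketch (the 256-channel width is illustrative, not the model's actual layer sizes):

```python
def conv2d_params(c_in, c_out, k):
    """Weight count of a regular k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def dsconv_params(c_in, c_out, k):
    """Depthwise k x k (one filter per input channel) + pointwise 1 x 1."""
    return c_in * k * k + c_in * c_out

# Illustrative 3x3 layer with 256 input and 256 output channels.
regular = conv2d_params(256, 256, 3)    # 589,824 weights
separable = dsconv_params(256, 256, 3)  # 2,304 + 65,536 = 67,840 weights
print(regular, separable, round(regular / separable, 1))  # 589824 67840 8.7
```

This roughly 8.7x reduction per layer is why the DSConv variant is closer to the original paper's model size and speed, while the regular-Conv2d variant is much heavier.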
| Method | Backbone | Output resolution | PQ | SQ | RQ | Box AP | Mask AP | Memory (M) | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Panoptic-DeepLab (DSConv) | R52-DC5 | 640×640 | 35.5 | 77.3 | 44.7 | 18.6 | 19.7 | - | 246448865 | model \| metrics |
Note:

- [R52](https://dl.fbaipublicfiles.com/detectron2/DeepLab/R-52.pkl): a ResNet-50 with its first 7x7 convolution replaced by 3 3x3 convolutions. This modification has been used in most semantic segmentation papers. We pre-train this backbone on ImageNet using the default recipe of [pytorch examples](https://github.com/pytorch/examples/tree/master/imagenet).
- DC5 means using dilated convolution in `res5`.
- This reproduced number matches the original paper (35.5 vs. 35.1 PQ).
- This implementation does not include the optimized post-processing code needed for deployment. Post-processing the network outputs now takes more time than the network itself. Please refer to the speed reported in the original paper for comparison.
- DSConv refers to using DepthwiseSeparableConv2d in the ASPP and decoder.

## <a name="CitingPanopticDeepLab"></a>Citing Panoptic-DeepLab

If you use Panoptic-DeepLab, please use the following BibTeX entry.

* CVPR 2020 paper:

```
@inproceedings{cheng2020panoptic,
  title={Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation},
  author={Cheng, Bowen and Collins, Maxwell D and Zhu, Yukun and Liu, Ting and Huang, Thomas S and Adam, Hartwig and Chen, Liang-Chieh},
  booktitle={CVPR},
  year={2020}
}
```

* ICCV 2019 COCO-Mapillary workshop challenge report:

```
@inproceedings{cheng2019panoptic,
  title={Panoptic-DeepLab},
  author={Cheng, Bowen and Collins, Maxwell D and Zhu, Yukun and Liu, Ting and Huang, Thomas S and Adam, Hartwig and Chen, Liang-Chieh},
  booktitle={ICCV COCO + Mapillary Joint Recognition Challenge Workshop},
  year={2019}
}
```