Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation

Bowen Cheng, Maxwell D. Collins, Yukun Zhu, Ting Liu, Thomas S. Huang, Hartwig Adam, Liang-Chieh Chen

[arXiv] [BibTeX] [Reference implementation]


Installation

Install Detectron2 following the instructions. To use Cityscapes, prepare the data following the tutorial.
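
As a concrete sketch (the install command and dataset root below reflect one documented option; the paths are placeholders):

# Install Detectron2 from source.
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

# Tooling used by the Cityscapes data-preparation tutorial.
python -m pip install cityscapesscripts

# Detectron2 discovers datasets through this environment variable; Cityscapes is
# expected under $DETECTRON2_DATASETS/cityscapes/ (leftImg8bit/ and gtFine/).
export DETECTRON2_DATASETS=/path/to/datasets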

Training

To train a model with 8 GPUs, run:

cd /path/to/detectron2/projects/Panoptic-DeepLab
python train_net.py --config-file configs/Cityscapes-PanopticSegmentation/panoptic_deeplab_R_52_os16_mg124_poly_90k_bs32_crop_512_1024_dsconv.yaml --num-gpus 8
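
The config above assumes a global batch size of 32 across 8 GPUs. With fewer GPUs, config values can be overridden on the command line, using the same mechanism the evaluation command below uses for MODEL.WEIGHTS. A sketch for a single GPU; the batch size and learning rate here are illustrative only, and SOLVER.BASE_LR should be scaled roughly in proportion to SOLVER.IMS_PER_BATCH:

cd /path/to/detectron2/projects/Panoptic-DeepLab
python train_net.py --config-file configs/Cityscapes-PanopticSegmentation/panoptic_deeplab_R_52_os16_mg124_poly_90k_bs32_crop_512_1024_dsconv.yaml --num-gpus 1 SOLVER.IMS_PER_BATCH 4 SOLVER.BASE_LR 0.000125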

Evaluation

Model evaluation can be done similarly:

cd /path/to/detectron2/projects/Panoptic-DeepLab
python train_net.py --config-file configs/Cityscapes-PanopticSegmentation/panoptic_deeplab_R_52_os16_mg124_poly_90k_bs32_crop_512_1024_dsconv.yaml --eval-only MODEL.WEIGHTS /path/to/model_checkpoint
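
Evaluation goes through the same launcher as training, so it can also be distributed over multiple GPUs with --num-gpus (a sketch, reusing the checkpoint placeholder from above):

cd /path/to/detectron2/projects/Panoptic-DeepLab
python train_net.py --config-file configs/Cityscapes-PanopticSegmentation/panoptic_deeplab_R_52_os16_mg124_poly_90k_bs32_crop_512_1024_dsconv.yaml --num-gpus 8 --eval-only MODEL.WEIGHTS /path/to/model_checkpoint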

Benchmark network speed

If you want to benchmark the network speed without post-processing, you can run the evaluation script with MODEL.PANOPTIC_DEEPLAB.BENCHMARK_NETWORK_SPEED True:

cd /path/to/detectron2/projects/Panoptic-DeepLab
python train_net.py --config-file configs/Cityscapes-PanopticSegmentation/panoptic_deeplab_R_52_os16_mg124_poly_90k_bs32_crop_512_1024_dsconv.yaml --eval-only MODEL.WEIGHTS /path/to/model_checkpoint MODEL.PANOPTIC_DEEPLAB.BENCHMARK_NETWORK_SPEED True

Cityscapes Panoptic Segmentation

Cityscapes models are trained with ImageNet pretraining.

| Method | Backbone | Output resolution | PQ | SQ | RQ | mIoU | AP | Memory (M) | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Panoptic-DeepLab | R50-DC5 | 1024×2048 | 58.6 | 80.9 | 71.2 | 75.9 | 29.8 | 8668 | - | model \| metrics |
| Panoptic-DeepLab | R52-DC5 | 1024×2048 | 60.3 | 81.5 | 72.9 | 78.2 | 33.2 | 9682 | 30841561 | model \| metrics |
| Panoptic-DeepLab (DSConv) | R52-DC5 | 1024×2048 | 60.3 | 81.0 | 73.2 | 78.7 | 32.1 | 10466 | 33148034 | model \| metrics |

Note:

  • R52: a ResNet-50 with its first 7x7 convolution replaced by three 3x3 convolutions. This modification has been used in most semantic segmentation papers. We pre-train this backbone on ImageNet using the default recipe of the PyTorch examples.
  • DC5 means using dilated convolution in res5.
  • We use a smaller training crop size (512x1024) than the original paper (1025x2049). We find that a larger crop size (1024x2048) could further improve PQ by 1.5% but also degrades AP by 3%.
  • The implementation with regular Conv2d in ASPP and head is much heavier than that of the original paper.
  • This implementation does not include optimized post-processing code needed for deployment. Post-processing the network outputs now takes a similar amount of time to the network itself. Please refer to the speed reported in the original paper for comparison.
  • DSConv refers to using DepthwiseSeparableConv2d in ASPP and decoder. The implementation with DSConv is identical to the original paper.

COCO Panoptic Segmentation

COCO models are trained with ImageNet pretraining on 16 V100s.

| Method | Backbone | Output resolution | PQ | SQ | RQ | Box AP | Mask AP | Memory (M) | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Panoptic-DeepLab (DSConv) | R52-DC5 | 640×640 | 35.5 | 77.3 | 44.7 | 18.6 | 19.7 | - | 246448865 | model \| metrics |

Note:

  • R52: a ResNet-50 with its first 7x7 convolution replaced by three 3x3 convolutions. This modification has been used in most semantic segmentation papers. We pre-train this backbone on ImageNet using the default recipe of the PyTorch examples.
  • DC5 means using dilated convolution in res5.
  • This reproduced number matches the original paper (35.5 vs. 35.1 PQ).
  • This implementation does not include optimized post-processing code needed for deployment. Post-processing the network outputs now takes more time than the network itself. Please refer to the speed reported in the original paper for comparison.
  • DSConv refers to using DepthwiseSeparableConv2d in ASPP and decoder.

Citing Panoptic-DeepLab

If you use Panoptic-DeepLab, please use the following BibTeX entries.

  • CVPR 2020 paper:
@inproceedings{cheng2020panoptic,
  title={Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation},
  author={Cheng, Bowen and Collins, Maxwell D and Zhu, Yukun and Liu, Ting and Huang, Thomas S and Adam, Hartwig and Chen, Liang-Chieh},
  booktitle={CVPR},
  year={2020}
}
  • ICCV 2019 COCO-Mapillary workshop challenge report:
@inproceedings{cheng2019panoptic,
  title={Panoptic-DeepLab},
  author={Cheng, Bowen and Collins, Maxwell D and Zhu, Yukun and Liu, Ting and Huang, Thomas S and Adam, Hartwig and Chen, Liang-Chieh},
  booktitle={ICCV COCO + Mapillary Joint Recognition Challenge Workshop},
  year={2019}
}