Decoder Denoising Pretraining for Semantic Segmentation
Abstract
Semantic segmentation labels are expensive and time-consuming to acquire. Hence, pretraining is commonly used to improve the label-efficiency of segmentation models. Typically, the encoder of a segmentation model is pretrained as a classifier and the decoder is randomly initialized. Here, we argue that random initialization of the decoder can be suboptimal, especially when few labeled examples are available. We propose a decoder pretraining approach based on denoising, which can be combined with supervised pretraining of the encoder. We find that decoder denoising pretraining on the ImageNet dataset strongly outperforms encoder-only supervised pretraining. Despite its simplicity, decoder denoising pretraining achieves state-of-the-art results on label-efficient semantic segmentation and offers considerable gains on the Cityscapes, Pascal Context, and ADE20K datasets.
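The abstract describes the method only at a high level. Below is a minimal sketch of what a decoder denoising pretraining step could look like, assuming additive Gaussian corruption and a noise-prediction objective; the function names, the noise scale `sigma`, and the choice of predicting the noise rather than the clean image are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of decoder denoising pretraining (illustrative assumptions:
# additive Gaussian noise, MSE noise-prediction loss, generic encoder/decoder modules).
import torch
import torch.nn as nn


def decoder_denoising_step(encoder, decoder, images, optimizer, sigma=0.2):
    """One pretraining step: corrupt the input with Gaussian noise and train
    the decoder to predict the added noise from the encoder's features."""
    noise = torch.randn_like(images)
    noisy_images = images + sigma * noise        # additive Gaussian corruption
    features = encoder(noisy_images)             # encoder may be frozen or fine-tuned
    predicted_noise = decoder(features)          # per-pixel estimate of the noise
    loss = nn.functional.mse_loss(predicted_noise, noise)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After this denoising phase, the decoder weights would be used to initialize the segmentation decoder before supervised fine-tuning on the labeled segmentation data.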