UperNet, Swin Transformer base-sized backbone

This model is the UperNet framework for semantic segmentation, paired with a base-sized Swin Transformer backbone. UperNet was introduced in the paper Unified Perceptual Parsing for Scene Understanding by Xiao et al.

The combination of UperNet with a Swin Transformer backbone was introduced in the paper Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.

Disclaimer: The team releasing UperNet + Swin Transformer did not write a model card for this model, so this one was written by the Hugging Face team.

Model description

UperNet is a framework for semantic segmentation. It consists of several components, including a backbone, a Feature Pyramid Network (FPN) and a Pyramid Pooling Module (PPM).
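To give a feel for what the Pyramid Pooling Module does, here is a minimal pure-Python sketch of its pooling step on a single-channel feature map. It is an illustration only, not the model's implementation: the function names `adaptive_avg_pool` and `pyramid_pool` are hypothetical, and the pooling scales (1, 2, 3, 6) are an assumed, commonly used setting that this card does not state. The later steps of a full module (projecting, upsampling and merging the pooled maps with the backbone features) are left out.

```python
import math


def adaptive_avg_pool(feature, bins):
    """Average a 2D feature map (list of rows) down to a bins x bins grid."""
    height, width = len(feature), len(feature[0])
    pooled = []
    for i in range(bins):
        # Row range covered by output bin i.
        r0, r1 = (i * height) // bins, math.ceil((i + 1) * height / bins)
        row = []
        for j in range(bins):
            # Column range covered by output bin j.
            c0, c1 = (j * width) // bins, math.ceil((j + 1) * width / bins)
            cells = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def pyramid_pool(feature, scales=(1, 2, 3, 6)):
    """Pool the same feature map at several grid sizes (assumed scales)."""
    return {scale: adaptive_avg_pool(feature, scale) for scale in scales}


# A toy 6x6 feature map with values 0..35.
feature_map = [[row * 6 + col for col in range(6)] for row in range(6)]
pyramid = pyramid_pool(feature_map)
```

The coarsest grid (scale 1) summarizes the whole map in one value, while finer grids keep progressively more spatial detail, which is how the module gathers context at several scales.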

Any visual backbone can be plugged into the UperNet framework. The framework predicts a semantic label per pixel.
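The per-pixel prediction can be sketched as follows: given one score map per class, the predicted label at each pixel is the class with the highest score. This is a minimal pure-Python illustration with made-up scores; `labels_from_logits` is a hypothetical helper, not part of any library.

```python
def labels_from_logits(logits):
    """Turn per-class score maps logits[c][y][x] into a label map labels[y][x]."""
    num_classes = len(logits)
    height, width = len(logits[0]), len(logits[0][0])
    return [
        [max(range(num_classes), key=lambda c: logits[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Toy example: 2 classes on a 2x2 image (scores are invented for illustration).
toy_logits = [
    [[0.9, 0.1], [0.2, 0.8]],  # scores for class 0
    [[0.1, 0.7], [0.3, 0.5]],  # scores for class 1
]
label_map = labels_from_logits(toy_logits)
```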

[Figure: UperNet architecture]

Intended uses & limitations

You can use the raw model for semantic segmentation. See the model hub to look for fine-tuned versions (with various backbones) on a task that interests you.

How to use

For code examples, see the documentation.
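As a rough starting point, the following is a minimal sketch of running the model with the Hugging Face `transformers` library. It assumes that `transformers`, `torch` and Pillow are installed and that the checkpoint is published on the Hub as `openmmlab/upernet-swin-base`. The helper name `segment_image` is hypothetical, and the resize-then-argmax post-processing is one straightforward choice rather than the only one.

```python
MODEL_ID = "openmmlab/upernet-swin-base"  # assumed checkpoint id on the Hugging Face Hub


def segment_image(image):
    """Return a (height, width) tensor of predicted class ids for a PIL image."""
    # Heavy dependencies are imported lazily, so defining this function
    # does not require them to be installed.
    import torch
    from transformers import AutoImageProcessor, UperNetForSemanticSegmentation

    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    model = UperNetForSemanticSegmentation.from_pretrained(MODEL_ID)
    model.eval()

    # Resize and normalize the image into model inputs.
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, num_labels, H', W')

    # Resize the logits back to the original image size, then take the
    # highest-scoring class for every pixel.
    width, height = image.size  # PIL reports (width, height)
    logits = torch.nn.functional.interpolate(
        logits, size=(height, width), mode="bilinear", align_corners=False
    )
    return logits.argmax(dim=1)[0]
```

Calling `segment_image(Image.open("scene.jpg").convert("RGB"))`, where `scene.jpg` is a placeholder path and `Image` comes from Pillow, should return a 2D tensor holding one class id per pixel.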

Model size: 122M params (Safetensors; tensor types: I64, F32)