s3nh/SegFormer-b5-person-segm

Description

Semantic segmentation is a computer vision task that assigns a class label to every pixel in an image. It is a form of dense prediction: rather than drawing bounding boxes around objects (object detection) or locating keypoints (pose estimation), the model outputs one label per pixel. Unlike instance segmentation, it does not separate individual objects of the same class. The goal is to partition the image into regions that correspond to different semantic classes. This model is a SegFormer with a MiT-B5 encoder (initialized from nvidia/mit-b5) trained for two-class person segmentation.
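To make "one label per pixel" concrete, the sketch below uses NumPy and made-up logits for a tiny 2x3 image with two classes. The predicted label map is the argmax over the class axis, so the result has the same height and width as the image instead of being a list of boxes. The shapes and values are illustrative only and are not produced by this model.

```python
import numpy as np

# Hypothetical logits for a 2x3 image and 2 classes, laid out as (num_classes, height, width)
logits = np.array([
    [[2.0, 0.1, 0.3],
     [1.5, 0.2, 0.9]],   # class 0 scores
    [[0.5, 1.2, 0.8],
     [0.1, 3.0, 0.4]],   # class 1 scores
])

# Dense prediction: pick the highest-scoring class independently at every pixel
label_map = logits.argmax(axis=0)

print(label_map.shape)  # (2, 3): one label per pixel
print(label_map)
```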

Parameters

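from transformers import SegformerForSemanticSegmentation

id2label = {0: "background", 1: "person"}  # assumed label names for the 2 classes
label2id = {v: k for k, v in id2label.items()}
# MiT-B5 encoder from nvidia/mit-b5 plus a freshly initialised 2-class decode head
model = SegformerForSemanticSegmentation.from_pretrained("nvidia/mit-b5", num_labels=2,
                                                         id2label=id2label, label2id=label2id)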
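With num_labels=2 the decode head outputs two channels, one score map per class, and SegFormer predicts them at one quarter of the input resolution. A 512x512 input (the size used in Usage) therefore gives logits of shape (1, 2, 128, 128). The helper below is a hypothetical illustration of that arithmetic, assuming height and width divisible by 4; it is not part of transformers.

```python
def segformer_logits_shape(batch_size: int, num_labels: int, height: int, width: int) -> tuple:
    """Hypothetical helper: expected SegFormer logits shape for a given input size.

    Assumes height and width are divisible by 4; the decode head predicts
    at 1/4 of the input resolution, with one channel per label.
    """
    if height % 4 or width % 4:
        raise ValueError("this sketch assumes height and width divisible by 4")
    return (batch_size, num_labels, height // 4, width // 4)


# One 512x512 image and 2 labels, as in the Usage section
print(segformer_logits_shape(1, 2, 512, 512))  # (1, 2, 128, 128)
```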

Usage


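import albumentations as A
import matplotlib.pyplot as plt
import numpy as np
import torch
from albumentations.pytorch import ToTensorV2
from PIL import Image
from torch import nn
from transformers import SegformerForSemanticSegmentation

# Load the fine-tuned weights (assumed to be published under this repo id) and an RGB image
model = SegformerForSemanticSegmentation.from_pretrained("s3nh/SegFormer-b5-person-segm")
model.eval()
image = Image.open("path/to/image.jpg").convert("RGB")  # placeholder path

# Transforms: resize to 512x512 and convert HWC uint8 to a CHW tensor (no normalization)
_transform = A.Compose([
    A.Resize(height=512, width=512),
    ToTensorV2(),
])

trans_image = _transform(image=np.array(image))
with torch.no_grad():
    outputs = model(pixel_values=trans_image["image"].float().unsqueeze(0))
logits = outputs.logits.cpu()  # (1, num_labels, 128, 128): 1/4 of the input resolution
print(logits.shape)

# Rescale logits to the original image size
upsampled_logits = nn.functional.interpolate(logits,
                                             size=image.size[::-1],  # PIL gives (width, height)
                                             mode="bilinear",
                                             align_corners=False)

# Per-pixel class = argmax over the class dimension
seg = upsampled_logits.argmax(dim=1)[0].numpy()
color_seg = np.zeros((seg.shape[0], seg.shape[1], 3), dtype=np.uint8)  # height, width, 3
palette = np.array([[0, 0, 0], [255, 255, 255]])  # label 0 -> black, label 1 -> white
for label, color in enumerate(palette):
    color_seg[seg == label, :] = color
# Convert RGB to BGR (e.g. before saving with OpenCV)
color_seg = color_seg[..., ::-1]

# Display the mask (flip back to RGB for matplotlib)
plt.imshow(color_seg[..., ::-1])
plt.axis("off")
plt.show()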
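The loop over palette entries can also be written as a single NumPy fancy-indexing lookup: indexing the palette with the integer label map returns an (H, W, 3) color image directly. This is an alternative formulation, not the author's code; the tiny label map is made up for illustration.

```python
import numpy as np

# Same palette as the Usage code: label 0 -> black, label 1 -> white
palette = np.array([[0, 0, 0], [255, 255, 255]], dtype=np.uint8)

# Hypothetical 2x3 label map (e.g. the argmax output)
seg = np.array([[0, 1, 1],
                [0, 1, 0]])

# Fancy indexing: each label is replaced by its RGB row -> shape (height, width, 3)
color_seg = palette[seg]

# Reverse the channel axis to get BGR, as in the loop version
color_seg_bgr = color_seg[..., ::-1]

print(color_seg.shape)  # (2, 3, 3)
```

The lookup avoids a Python loop over classes, which matters little for two labels but scales better when a palette has many entries.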

Metrics

TODO: evaluation metrics have not been added yet.
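No evaluation numbers are reported for this model. As a hedged sketch only, per-class intersection-over-union (IoU) and its mean (mIoU), standard metrics for semantic segmentation, could be computed from a predicted and a ground-truth label map as below. The function name and toy arrays are hypothetical, and nothing here reflects this model's actual performance.

```python
import numpy as np


def per_class_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> np.ndarray:
    """Hypothetical helper: IoU for each class from two integer label maps of equal shape.

    Classes absent from both pred and target get NaN so they can be skipped in the mean.
    """
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union > 0:
            intersection = np.logical_and(pred_c, target_c).sum()
            ious[c] = intersection / union
    return ious


# Toy label maps (0 = background, 1 = person), not real model outputs
pred = np.array([[0, 1, 1],
                 [0, 1, 0]])
target = np.array([[0, 1, 0],
                   [0, 1, 0]])
ious = per_class_iou(pred, target, num_classes=2)
print(ious)              # per-class IoU
print(np.nanmean(ious))  # mean IoU over classes that appear
```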

Note

This model was not trained with a Hugging Face feature extractor (image processor), so the automatic inference API does not work with it. Preprocess images manually, as in the Usage section.
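Because there is no bundled image processor, preprocessing has to be done by hand. The function below is a minimal sketch that mirrors the Usage transform with Pillow and NumPy instead of albumentations: resize to 512x512, convert HWC to CHW, cast to float32 and add a batch dimension, with no scaling or normalization (the Usage code applies no A.Normalize). Pillow's bilinear resize is not guaranteed to match albumentations' OpenCV-based resize pixel for pixel, so predictions may differ slightly; the function name is hypothetical.

```python
import numpy as np
from PIL import Image


def preprocess(image: Image.Image, size: int = 512) -> np.ndarray:
    """Hypothetical helper: PIL image -> float32 array of shape (1, 3, size, size).

    Mirrors A.Resize + ToTensorV2 from the Usage section: values stay in 0..255
    and no mean/std normalization is applied.
    """
    resized = image.convert("RGB").resize((size, size), Image.BILINEAR)  # PIL takes (width, height)
    array = np.asarray(resized, dtype=np.float32)  # (H, W, 3)
    chw = np.transpose(array, (2, 0, 1))           # (3, H, W), like ToTensorV2
    return np.ascontiguousarray(chw[np.newaxis])   # (1, 3, H, W), a batch of one


# Synthetic 640x480 image; torch.from_numpy(batch) could then be passed as pixel_values
img = Image.new("RGB", (640, 480), color=(10, 20, 30))
batch = preprocess(img)
print(batch.shape, batch.dtype)  # (1, 3, 512, 512) float32
```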