PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model

Features

  • A powerful extension of the Large Multi-modal Model for generic (panoptic, instance, semantic) segmentation, referring segmentation, and interactive segmentation.
  • Supports joint training across multiple segmentation tasks and vision-language tasks.
  • Demonstrates zero-shot capabilities on unseen tasks, such as open-vocabulary segmentation, generalized referring segmentation, and video object segmentation.

Note

Before loading the model, change mm_vision_tower so that it points to the path of your own Mask2Former checkpoint.
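
One way to make this change is with a short script. The sketch below assumes that mm_vision_tower is a top-level key in a JSON config file shipped with the model weights; the helper name set_vision_tower and both paths in the usage note are hypothetical and should be adapted to your setup.

```python
import json
from pathlib import Path


def set_vision_tower(config_path, checkpoint_path):
    """Rewrite mm_vision_tower in a JSON config to a local checkpoint path.

    Assumption: the config is plain JSON with mm_vision_tower as a
    top-level key. All other keys are left unchanged.
    """
    config_file = Path(config_path)
    config = json.loads(config_file.read_text(encoding="utf-8"))

    # Point the vision tower at the local Mask2Former checkpoint.
    config["mm_vision_tower"] = str(checkpoint_path)

    config_file.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, set_vision_tower("PSALM/config.json", "/path/to/mask2former_checkpoint") would update the file in place; both arguments here are placeholders, not paths taken from the model repository.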
