ImageBind with SAM
This is an experimental demo that aims to combine ImageBind and SAM to generate masks from different modalities.
The basic idea follows IEA: Image Editing Anything and CLIP-SAM, which generate the referring mask with the following steps:
- Step 1: Generate auto masks with `SamAutomaticMaskGenerator`
- Step 2: Crop the bounding-box region of every mask from the image
- Step 3: Compute the similarity between the cropped images and the input from the other modality
- Step 4: Merge the mask regions with the highest similarity
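Steps 2 to 4 can be sketched with NumPy as follows. This is a minimal illustration, not the code in `demo.py`: the helper names (`crop_boxes`, `cosine_similarity`, `merge_best_masks`) are hypothetical, the embeddings are hand-written stand-ins for ImageBind outputs, and falling back to the single best mask when nothing passes the threshold is an assumption. The mask dictionaries mimic the `segmentation` and `bbox` (XYWH) keys that `SamAutomaticMaskGenerator` returns.

```python
import numpy as np


def crop_boxes(image, masks):
    """Step 2: crop the XYWH bounding box of every mask from the image."""
    crops = []
    for m in masks:
        x, y, w, h = (int(v) for v in m["bbox"])
        crops.append(image[y:y + h, x:x + w])
    return crops


def cosine_similarity(crop_embeddings, query_embedding):
    """Step 3: cosine similarity between each crop embedding and the query."""
    a = crop_embeddings / np.linalg.norm(crop_embeddings, axis=1, keepdims=True)
    b = query_embedding / np.linalg.norm(query_embedding)
    return a @ b


def merge_best_masks(masks, scores, threshold):
    """Step 4: union of all masks whose score reaches the threshold.

    Assumption: if no mask passes, keep only the best-scoring one.
    """
    keep = np.flatnonzero(scores >= threshold)
    if keep.size == 0:
        keep = np.array([int(np.argmax(scores))])
    merged = np.zeros_like(masks[0]["segmentation"], dtype=bool)
    for i in keep:
        merged |= masks[i]["segmentation"]
    return merged


# Toy input: a 4x6 image with two non-overlapping masks.
image = np.zeros((4, 6, 3), dtype=np.uint8)
seg_a = np.zeros((4, 6), dtype=bool)
seg_a[0:2, 0:3] = True
seg_b = np.zeros((4, 6), dtype=bool)
seg_b[2:4, 3:6] = True
masks = [
    {"segmentation": seg_a, "bbox": [0, 0, 3, 2]},
    {"segmentation": seg_b, "bbox": [3, 2, 3, 2]},
]

crops = crop_boxes(image, masks)

# Stand-in embeddings; in the real demo these would come from ImageBind.
crop_embeddings = np.array([[1.0, 0.0], [0.0, 1.0]])
query_embedding = np.array([0.9, 0.1])

scores = cosine_similarity(crop_embeddings, query_embedding)
merged = merge_best_masks(masks, scores, threshold=0.5)
```

Here only the first mask is close to the query, so `merged` equals `seg_a`.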
Table of contents
- Installation
- ImageBind-SAM Demo
- Audio Referring Segment
- Text Referring Segment
- Image Referring Segment
Installation
- Download the pretrained checkpoints
cd playground/ImageBind_SAM
mkdir .checkpoints
cd .checkpoints
# download imagebind weights
wget https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth
# download SAM weights
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
- Install ImageBind following the official installation guidance.
- Install Grounded-SAM following the Grounded-SAM installation instructions.
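Before running the demos, it can help to confirm that both checkpoints were downloaded. The helper below is a hypothetical convenience script, not part of the repository; it only assumes the `.checkpoints` directory and the two file names used in the download commands above.

```python
from pathlib import Path

# File names taken from the download commands above.
EXPECTED_CHECKPOINTS = ("imagebind_huge.pth", "sam_vit_h_4b8939.pth")


def missing_checkpoints(checkpoint_dir=".checkpoints"):
    """Return the expected checkpoint files that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_CHECKPOINTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found.")
```

Run it from `playground/ImageBind_SAM` so that the relative `.checkpoints` path resolves.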
Run the demo
python demo.py
We implement `Text Seg` and `Audio Seg` in this demo. The generated masks will be saved as `text_sam_merged_mask.jpg` and `audio_sam_merged_mask.jpg`:
| Input Model | Modality | Generated Mask |
|---|---|---|
| | car audio | |
| | "A car" | |
Setting a different threshold may strongly influence the final results.
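The effect of the threshold can be illustrated with a small sweep. This sketch assumes the similarities are normalised with a softmax over all candidate masks before thresholding, which may differ from what `demo.py` does. The similarity values are made up and `select_masks` is a hypothetical helper.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def select_masks(scores, threshold):
    """Indices of the masks whose score is above the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


# Made-up raw similarities for four candidate masks.
similarities = [2.0, 1.5, 0.2, -1.0]
scores = softmax(similarities)

# Raising the threshold shrinks the set of masks that get merged.
for threshold in (0.05, 0.3, 0.5):
    print(threshold, select_masks(scores, threshold))
```

With these values, three masks are selected at the lowest threshold and only one at the highest, so the merged region changes considerably.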
Run image referring segmentation demo
# download the referring image
cd .assets
wget https://github.com/IDEA-Research/detrex-storage/releases/download/grounded-sam-storage/referring_car_image.jpg
cd ..
python image_referring_seg_demo.py
Run audio referring segmentation demo
python audio_referring_seg_demo.py
Run text referring segmentation demo
python text_referring_seg_demo.py