merve (HF staff) committed
Commit 4ae76f2 • 1 Parent(s): 9cc3ad8

Update README.md

Files changed (1):
  1. README.md +11 -770
README.md CHANGED
@@ -1,770 +1,11 @@
1
- ![](./assets/Grounded-SAM_logo.png)
2
-
3
- # Grounded-Segment-Anything
4
- [![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://youtu.be/oEQYStnF2l8) [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/automated-dataset-annotation-and-evaluation-with-grounding-dino-and-sam.ipynb) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://github.com/camenduru/grounded-segment-anything-colab) [![HuggingFace Space](https://img.shields.io/badge/🤗-HuggingFace%20Space-cyan.svg)](https://huggingface.co/spaces/IDEA-Research/Grounded-SAM) [![Replicate](https://replicate.com/cjwbw/grounded-recognize-anything/badge)](https://replicate.com/cjwbw/grounded-recognize-anything) [![ModelScope Official Demo](https://img.shields.io/badge/ModelScope-Official%20Demo-important)](https://modelscope.cn/studios/tuofeilunhifi/Grounded-Segment-Anything/summary) [![Huggingface Demo by Community](https://img.shields.io/badge/Huggingface-Demo%20by%20Community-red)](https://huggingface.co/spaces/yizhangliu/Grounded-Segment-Anything) [![Stable-Diffusion WebUI](https://img.shields.io/badge/Stable--Diffusion-WebUI%20by%20Community-critical)](https://github.com/continue-revolution/sd-webui-segment-anything) [![Jupyter Notebook Demo](https://img.shields.io/badge/Demo-Jupyter%20Notebook-informational)](./grounded_sam.ipynb)
5
-
6
-
7
- We plan to create a very interesting demo by combining [Grounding DINO](https://github.com/IDEA-Research/GroundingDINO) and [Segment Anything](https://github.com/facebookresearch/segment-anything), which aims to detect and segment anything with text inputs! We will continue to improve it and create more interesting demos on this foundation.
8
-
9
- We are very willing to **help everyone share and promote new projects** based on Segment-Anything. Please check here for more amazing demos and works from the community: [Highlighted Projects](#highlighted-projects). You can submit a new issue (with the `project` tag) or a new pull request to add your project's links.
10
-
11
- ![](./assets/grounded_sam_new_demo_image.png)
12
-
13
- ![](./assets/ram_grounded_sam_new.png)
14
-
15
- **πŸ„ Why Building this Project?**
16
-
17
- The **core idea** behind this project is to **combine the strengths of different models to build a powerful pipeline for solving complex problems**. It is worth mentioning that this is a workflow for combining strong expert models, where **all parts can be used separately or in combination, and can be replaced with similar but different models (e.g., replacing Grounding DINO with GLIP or other detectors, replacing Stable-Diffusion with ControlNet or GLIGEN, or combining with ChatGPT)**.
18
-
19
- **πŸ‡ Updates**
20
- - **`2023/12/17`** Support [Grounded-RepViT-SAM](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM#run-grounded-repvit-sam-demo) demo, thanks a lot for their great work!
21
- - **`2023/12/16`** Support [Grounded-Edge-SAM](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM#run-grounded-edge-sam-demo) demo, thanks a lot for their great work!
22
- - **`2023/12/10`** Support [Grounded-Efficient-SAM](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM#run-grounded-efficient-sam-demo) demo, thanks a lot for their great work!
23
- - **`2023/11/24`** Release [RAM++](https://arxiv.org/abs/2310.15200), which is the next generation of RAM. RAM++ can recognize any category with high accuracy, including both predefined common categories and diverse open-set categories.
24
- - **`2023/11/23`** Release our newly proposed visual prompt counting model [T-Rex](https://github.com/IDEA-Research/T-Rex). The introduction [Video](https://www.youtube.com/watch?v=engIEhZogAQ) and [Demo](https://deepdataspace.com/playground/ivp) is available in [DDS](https://github.com/IDEA-Research/deepdataspace) now.
25
- - **`2023/07/25`** Support [Light-HQ-SAM](https://github.com/SysCV/sam-hq) in [EfficientSAM](./EfficientSAM/), credits to [Mingqiao Ye](https://github.com/ymq2017) and [Lei Ke](https://github.com/lkeab), thanks a lot for their great work!
26
- - **`2023/07/14`** Combining **Grounding-DINO-B** with [SAM-HQ](https://github.com/SysCV/sam-hq) achieves **49.6 mean AP** in [Segmentation in the Wild](https://eval.ai/web/challenges/challenge-page/1931/overview) competition zero-shot track, surpassing Grounded-SAM by **3.6 mean AP**, thanks for their great work!
27
- - **`2023/06/28`** Combining Grounding-DINO with Efficient SAM variants including [FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM) and [MobileSAM](https://github.com/ChaoningZhang/MobileSAM) in [EfficientSAM](./EfficientSAM/) for faster annotating, thanks a lot for their great work!
28
- - **`2023/06/20`** By combining **Grounding-DINO-L** with **SAM-ViT-H**, Grounded-SAM achieves 46.0 mean AP in [Segmentation in the Wild](https://eval.ai/web/challenges/challenge-page/1931/overview) competition zero-shot track on [CVPR 2023 workshop](https://computer-vision-in-the-wild.github.io/cvpr-2023/), surpassing [UNINEXT (CVPR 2023)](https://github.com/MasterBin-IIAU/UNINEXT) by about **4 mean AP**.
29
- - **`2023/06/16`** Release [RAM-Grounded-SAM Replicate Online Demo](https://replicate.com/cjwbw/ram-grounded-sam). Thanks a lot to [Chenxi](https://chenxwh.github.io/) for providing this nice demo 🌹.
30
- - **`2023/06/14`** Support [RAM-Grounded-SAM & SAM-HQ](./automatic_label_ram_demo.py) and update [Simple Automatic Label Demo](./automatic_label_ram_demo.py) to support [RAM](https://github.com/OPPOMKLab/recognize-anything), setting up a strong automatic annotation pipeline.
31
- **`2023/06/13`** Check out the [Autodistill: Train YOLOv8 with ZERO Annotations](https://youtu.be/gKTYMfwPo4M) tutorial to learn how to use Grounded-SAM + [Autodistill](https://github.com/autodistill/autodistill) for automated data labeling and real-time model training.
32
- - **`2023/06/13`** Support [SAM-HQ](https://github.com/SysCV/sam-hq) in [Grounded-SAM Demo](#running_man-grounded-sam-detect-and-segment-everything-with-text-prompt) for higher quality prediction.
33
- - **`2023/06/12`** Support [RAM-Grounded-SAM](#label-grounded-sam-with-ram-or-tag2text-for-automatic-labeling) for strong automatic labeling pipeline! Thanks for [Recognize-Anything](https://github.com/OPPOMKLab/recognize-anything).
34
- - **`2023/06/01`** Our Grounded-SAM has been accepted to present a **demo** at [ICCV 2023](https://iccv2023.thecvf.com/)! See you in Paris!
35
- - **`2023/05/23`**: Support `Image-Referring-Segment`, `Audio-Referring-Segment` and `Text-Referring-Segment` in [ImageBind-SAM](./playground/ImageBind_SAM/).
36
- **`2023/05/03`**: Check out the [Automated Dataset Annotation and Evaluation with GroundingDINO and SAM](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/automated-dataset-annotation-and-evaluation-with-grounding-dino-and-sam.ipynb), an amazing tutorial on automatic labeling! Thanks a lot to [Piotr Skalski](https://github.com/SkalskiP) and [Roboflow](https://github.com/roboflow/notebooks)!
37
-
38
-
39
- ## Table of Contents
40
- - [Grounded-Segment-Anything](#grounded-segment-anything)
41
- - [Preliminary Works](#preliminary-works)
42
- - [Highlighted Projects](#highlighted-projects)
43
- - [Installation](#installation)
44
- - [Install with Docker](#install-with-docker)
45
- - [Install locally](#install-without-docker)
46
- - [Grounded-SAM Playground](#grounded-sam-playground)
47
- - [Step-by-Step Notebook Demo](#open_book-step-by-step-notebook-demo)
48
- - [GroundingDINO: Detect Everything with Text Prompt](#running_man-groundingdino-detect-everything-with-text-prompt)
49
- - [Grounded-SAM: Detect and Segment Everything with Text Prompt](#running_man-grounded-sam-detect-and-segment-everything-with-text-prompt)
50
- - [Grounded-SAM with Inpainting: Detect, Segment and Generate Everything with Text Prompt](#skier-grounded-sam-with-inpainting-detect-segment-and-generate-everything-with-text-prompt)
51
- - [Grounded-SAM and Inpaint Gradio APP](#golfing-grounded-sam-and-inpaint-gradio-app)
52
- - [Grounded-SAM with RAM or Tag2Text for Automatic Labeling](#label-grounded-sam-with-ram-or-tag2text-for-automatic-labeling)
53
- - [Grounded-SAM with BLIP & ChatGPT for Automatic Labeling](#robot-grounded-sam-with-blip-for-automatic-labeling)
54
- - [Grounded-SAM with Whisper: Detect and Segment Anything with Audio](#open_mouth-grounded-sam-with-whisper-detect-and-segment-anything-with-audio)
55
- - [Grounded-SAM ChatBot with Visual ChatGPT](#speech_balloon-grounded-sam-chatbot-demo)
56
- - [Grounded-SAM with OSX for 3D Whole-Body Mesh Recovery](#man_dancing-run-grounded-segment-anything--osx-demo)
57
- - [Grounded-SAM with VISAM for Tracking and Segment Anything](#man_dancing-run-grounded-segment-anything--visam-demo)
58
- - [Interactive Fashion-Edit Playground: Click for Segmentation And Editing](#dancers-interactive-editing)
59
- - [Interactive Human-face Editing Playground: Click And Editing Human Face](#dancers-interactive-editing)
60
- - [3D Box Via Segment Anything](#camera-3d-box-via-segment-anything)
61
- - [Playground: More Interesting and Imaginative Demos with Grounded-SAM](./playground/)
62
- - [DeepFloyd: Image Generation with Text Prompt](./playground/DeepFloyd/)
63
- - [PaintByExample: Exemplar-based Image Editing with Diffusion Models](./playground/PaintByExample/)
64
- - [LaMa: Resolution-robust Large Mask Inpainting with Fourier Convolutions](./playground/LaMa/)
65
- - [RePaint: Inpainting using Denoising Diffusion Probabilistic Models](./playground/RePaint/)
66
- - [ImageBind with SAM: Segment with Different Modalities](./playground/ImageBind_SAM/)
67
- - [Efficient SAM Series for Faster Annotation](./EfficientSAM/)
68
- - [Grounded-FastSAM Demo](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM#run-grounded-fastsam-demo)
69
- - [Grounded-MobileSAM Demo](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM#run-grounded-mobilesam-demo)
70
- - [Grounded-Light-HQSAM Demo](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM#run-grounded-light-hqsam-demo)
71
- - [Grounded-Efficient-SAM Demo](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM#run-grounded-efficient-sam-demo)
72
- - [Grounded-Edge-SAM Demo](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM#run-grounded-edge-sam-demo)
73
- - [Grounded-RepViT-SAM Demo](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM#run-grounded-repvit-sam-demo)
74
-
75
-
76
- ## Preliminary Works
77
-
78
- Here we provide some background knowledge that you may need to know before trying the demos.
79
-
80
- <div align="center">
81
-
82
- | Title | Intro | Description | Links |
83
- |:----:|:----:|:----:|:----:|
84
- | [Segment-Anything](https://arxiv.org/abs/2304.02643) | ![](https://github.com/facebookresearch/segment-anything/blob/main/assets/model_diagram.png?raw=true) | A strong foundation model that aims to segment everything in an image; it needs prompts (boxes/points/text) to generate masks | [[Github](https://github.com/facebookresearch/segment-anything)] <br> [[Page](https://segment-anything.com/)] <br> [[Demo](https://segment-anything.com/demo)] |
85
- | [Grounding DINO](https://arxiv.org/abs/2303.05499) | ![](https://github.com/IDEA-Research/GroundingDINO/blob/main/.asset/hero_figure.png?raw=True) | A strong zero-shot detector capable of generating high-quality boxes and labels from free-form text. | [[Github](https://github.com/IDEA-Research/GroundingDINO)] <br> [[Demo](https://huggingface.co/spaces/ShilongLiu/Grounding_DINO_demo)] |
86
- | [OSX](http://arxiv.org/abs/2303.16160) | ![](https://github.com/IDEA-Research/OSX/blob/main/assets/demo_video.gif?raw=True) | A strong and efficient one-stage motion capture method that generates high-quality 3D human meshes from a monocular image. OSX also releases a large-scale upper-body dataset, UBody, for more accurate reconstruction in upper-body scenes. | [[Github](https://github.com/IDEA-Research/OSX)] <br> [[Page](https://osx-ubody.github.io/)] <br> [[Video](https://osx-ubody.github.io/)] <br> [[Data](https://docs.google.com/forms/d/e/1FAIpQLSehgBP7wdn_XznGAM2AiJPiPLTqXXHw5uX9l7qeQ1Dh9HoO_A/viewform)] |
87
- | [Stable-Diffusion](https://arxiv.org/abs/2112.10752) | ![](https://github.com/CompVis/stable-diffusion/blob/main/assets/stable-samples/txt2img/merged-0006.png?raw=True) | A super powerful open-source latent text-to-image diffusion model | [[Github](https://github.com/CompVis/stable-diffusion)] <br> [[Page](https://ommer-lab.com/research/latent-diffusion-models/)] |
88
- | [RAM++](https://arxiv.org/abs/2310.15200) | ![](https://github.com/xinyu1205/recognize-anything/blob/main/images/ram_plus_compare.jpg) | RAM++ is the next generation of RAM, which can recognize any category with high accuracy. | [[Github](https://github.com/OPPOMKLab/recognize-anything)] |
89
- | [RAM](https://recognize-anything.github.io/) | ![](https://github.com/xinyu1205/Tag2Text/raw/main/images/localization_and_recognition.jpg) | RAM is an image tagging model, which can recognize any common category with high accuracy. | [[Github](https://github.com/OPPOMKLab/recognize-anything)] <br> [[Demo](https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text)] |
90
- | [BLIP](https://arxiv.org/abs/2201.12086) | ![](https://github.com/salesforce/LAVIS/raw/main/docs/_static/logo_final.png) | A wonderful language-vision model for image understanding. | [[GitHub](https://github.com/salesforce/LAVIS)] |
91
- | [Visual ChatGPT](https://arxiv.org/abs/2303.04671) | ![](https://github.com/microsoft/TaskMatrix/raw/main/assets/figure.jpg) | A wonderful tool that connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting. | [[Github](https://github.com/microsoft/TaskMatrix)] <br> [[Demo](https://huggingface.co/spaces/microsoft/visual_chatgpt)] |
92
- | [Tag2Text](https://tag2text.github.io/) | ![](https://github.com/xinyu1205/Tag2Text/raw/main/images/tag2text_framework.png) | An efficient and controllable vision-language model which can simultaneously output superior image captioning and image tagging. | [[Github](https://github.com/OPPOMKLab/recognize-anything)] <br> [[Demo](https://huggingface.co/spaces/xinyu1205/Tag2Text)] |
93
- | [VoxelNeXt](https://arxiv.org/abs/2303.11301) | ![](https://github.com/dvlab-research/VoxelNeXt/raw/master/docs/sequence-v2.gif) | A clean, simple, and fully-sparse 3D object detector, which predicts objects directly upon sparse voxel features. | [[Github](https://github.com/dvlab-research/VoxelNeXt)]
94
-
95
- </div>
96
-
97
- ## Highlighted Projects
98
-
99
- Here we provide some impressive works you may find interesting:
100
-
101
- <div align="center">
102
-
103
- | Title | Description | Links |
104
- |:---:|:---:|:---:|
105
- | [Semantic-SAM](https://github.com/UX-Decoder/Semantic-SAM) | A universal image segmentation model that enables segmenting and recognizing anything at any desired granularity | [[Github](https://github.com/UX-Decoder/Semantic-SAM)] <br> [[Demo](https://github.com/UX-Decoder/Semantic-SAM)] |
106
- | [SEEM: Segment Everything Everywhere All at Once](https://arxiv.org/pdf/2304.06718.pdf) | A powerful promptable segmentation model that supports segmenting with various types of prompts (text, point, scribble, referring image, etc.) and any combination of prompts. | [[Github](https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once)] <br> [[Demo](https://huggingface.co/spaces/xdecoder/SEEM)] |
107
- | [OpenSeeD](https://arxiv.org/pdf/2303.08131.pdf) | A simple framework for open-vocabulary segmentation and detection which supports interactive segmentation with box input to generate mask | [[Github](https://github.com/IDEA-Research/OpenSeeD)] |
108
- | [LLaVA](https://arxiv.org/abs/2304.08485) | Visual instruction tuning with GPT-4 | [[Github](https://github.com/haotian-liu/LLaVA)] <br> [[Page](https://llava-vl.github.io/)] <br> [[Demo](https://llava.hliu.cc/)] <br> [[Data](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K)] <br> [[Model](https://huggingface.co/liuhaotian/LLaVA-13b-delta-v0)] |
109
- | [GenSAM](https://arxiv.org/abs/2312.07374) | Relaxing the instance-specific manual prompt requirement in SAM through training-free test-time adaptation | [[Github](https://github.com/jyLin8100/GenSAM)] <br> [[Page](https://lwpyh.github.io/GenSAM/)] |
110
-
111
- </div>
112
-
113
- We also list some awesome segment-anything extension projects here that you may find interesting:
114
- - [Computer Vision in the Wild (CVinW) Readings](https://github.com/Computer-Vision-in-the-Wild/CVinW_Readings) for those who are interested in open-set tasks in computer vision.
115
- - [Zero-Shot Anomaly Detection](https://github.com/caoyunkang/GroundedSAM-zero-shot-anomaly-detection) by Yunkang Cao
116
- - [EditAnything: ControlNet + StableDiffusion based on the SAM segmentation mask](https://github.com/sail-sg/EditAnything) by Shanghua Gao and Pan Zhou
117
- - [IEA: Image Editing Anything](https://github.com/feizc/IEA) by Zhengcong Fei
118
- [SAM-MMRotate: Combining Rotated Object Detector and SAM](https://github.com/Li-Qingyun/sam-mmrotate) by Qingyun Li and Xue Yang
119
- - [Awesome-Anything](https://github.com/VainF/Awesome-Anything) by Gongfan Fang
120
- - [Prompt-Segment-Anything](https://github.com/RockeyCoss/Prompt-Segment-Anything) by Rockey
121
- - [WebUI for Segment-Anything and Grounded-SAM](https://github.com/continue-revolution/sd-webui-segment-anything) by Chengsong Zhang
122
- - [Inpainting Anything: Inpaint Anything with SAM + Inpainting models](https://github.com/geekyutao/Inpaint-Anything) by Tao Yu
123
- - [Grounded Segment Anything From Objects to Parts: Combining Segment-Anything with VLPart & GLIP & Visual ChatGPT](https://github.com/Cheems-Seminar/segment-anything-and-name-it) by Peize Sun and Shoufa Chen
124
- [Napari-SAM: Integration of Segment Anything into Napari (a nice viewer for SAM)](https://github.com/MIC-DKFZ/napari-sam) by MIC-DKFZ
125
- - [Grounded Segment Anything Colab](https://github.com/camenduru/grounded-segment-anything-colab) by camenduru
126
- - [Optical Character Recognition with Segment Anything](https://github.com/yeungchenwa/OCR-SAM) by Zhenhua Yang
127
- - [Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet](https://github.com/showlab/Image2Paragraph) by showlab
128
- - [Lang-Segment-Anything: Another awesome demo for combining GroundingDINO with Segment-Anything](https://github.com/luca-medeiros/lang-segment-anything) by Luca Medeiros
129
- [🥳 🚀 **Playground: Integrate SAM and OpenMMLab!**](https://github.com/open-mmlab/playground)
130
- - [3D-object via Segment Anything](https://github.com/dvlab-research/3D-Box-Segment-Anything) by Yukang Chen
131
- - [Image2Paragraph: Transform Image Into Unique Paragraph](https://github.com/showlab/Image2Paragraph) by Show Lab
132
- [Zero-shot Scene Graph Generation with Grounded-SAM](https://github.com/showlab/Image2Paragraph) by JackWhite-rwx
133
- - [CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks](https://github.com/xmed-lab/CLIP_Surgery) by Eli-YiLi
134
- - [Panoptic-Segment-Anything: Zero-shot panoptic segmentation using SAM](https://github.com/segments-ai/panoptic-segment-anything) by segments-ai
135
- - [Caption-Anything: Generates Descriptive Captions for Any Object within an Image](https://github.com/ttengwang/Caption-Anything) by Teng Wang
136
- - [Segment-Anything-3D: Transferring Segmentation Information of 2D Images to 3D Space](https://github.com/Pointcept/SegmentAnything3D) by Yunhan Yang
137
- - [Expediting SAM without Fine-tuning](https://github.com/Expedit-LargeScale-Vision-Transformer/Expedit-SAM) by Weicong Liang and Yuhui Yuan
138
- - [Semantic Segment Anything: Providing Rich Semantic Category Annotations for SAM](https://github.com/fudan-zvg/Semantic-Segment-Anything) by Jiaqi Chen and Zeyu Yang and Li Zhang
139
- - [Enhance Everything: Combining SAM with Image Restoration and Enhancement Tasks](https://github.com/lixinustc/Enhance-Anything) by Xin Li
140
- - [DragGAN](https://github.com/Zeqiang-Lai/DragGAN) by Shanghai AI Lab.
141
-
142
- ## Installation
143
- The code requires `python>=3.8`, as well as `pytorch>=1.7` and `torchvision>=0.8`. Please follow the instructions [here](https://pytorch.org/get-started/locally/) to install both PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.
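A quick way to confirm your environment meets these requirements is a small sanity-check script; this is a sketch for convenience, not part of the repo:

```python
# Sanity-check the versions required above; adjust to your own setup.
import sys
import torch
import torchvision

print("python     :", sys.version.split()[0])      # needs >= 3.8
print("pytorch    :", torch.__version__)           # needs >= 1.7
print("torchvision:", torchvision.__version__)     # needs >= 0.8
print("CUDA build :", torch.version.cuda)          # None for CPU-only builds
print("CUDA found :", torch.cuda.is_available())   # CUDA support is strongly recommended
```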
144
-
145
- ### Install with Docker
146
-
147
- Open one terminal:
148
-
149
- ```
150
- make build-image
151
- ```
152
-
153
- ```
154
- make run
155
- ```
156
-
157
- That's it.
158
-
159
- If you would like to allow visualization across the Docker container, open another terminal and type:
160
-
161
- ```
162
- xhost +
163
- ```
164
-
165
-
166
- ### Install without Docker
167
- You should set the environment variables manually as follows if you want to build a local GPU environment for Grounded-SAM:
168
- ```bash
169
- export AM_I_DOCKER=False
170
- export BUILD_WITH_CUDA=True
171
- export CUDA_HOME=/path/to/cuda-11.3/
172
- ```
173
-
174
- Install Segment Anything:
175
-
176
- ```bash
177
- python -m pip install -e segment_anything
178
- ```
179
-
180
- Install Grounding DINO:
181
-
182
- ```bash
183
- python -m pip install -e GroundingDINO
184
- ```
185
-
186
-
187
- Install diffusers:
188
-
189
- ```bash
190
- pip install --upgrade diffusers[torch]
191
- ```
192
-
193
- Install osx:
194
-
195
- ```bash
196
- git submodule update --init --recursive
197
- cd grounded-sam-osx && bash install.sh
198
- ```
199
-
200
- Install RAM & Tag2Text:
201
-
202
- ```bash
203
- git clone https://github.com/xinyu1205/recognize-anything.git
204
- pip install -r ./recognize-anything/requirements.txt
205
- pip install -e ./recognize-anything/
206
- ```
207
-
208
- The following optional dependencies are necessary for mask post-processing, saving masks in COCO format, the example notebooks, and exporting the model in ONNX format. `jupyter` is also required to run the example notebooks.
209
-
210
- ```
211
- pip install opencv-python pycocotools matplotlib onnxruntime onnx ipykernel
212
- ```
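As an illustration of why `pycocotools` is listed, the sketch below encodes a binary mask as COCO RLE and writes it to JSON; the mask array here is a stand-in for a real SAM output rather than anything the repo produces:

```python
# Encode a binary mask as COCO RLE with pycocotools and save it as JSON.
import json
import numpy as np
from pycocotools import mask as mask_utils

binary_mask = np.zeros((480, 640), dtype=np.uint8)       # stand-in for a SAM mask
binary_mask[100:200, 150:300] = 1

rle = mask_utils.encode(np.asfortranarray(binary_mask))  # encode() expects Fortran-ordered uint8
print("mask area:", mask_utils.area(rle))                # 100 * 150 pixels
rle["counts"] = rle["counts"].decode("utf-8")            # bytes -> str so it is JSON-serializable

with open("mask_rle.json", "w") as f:
    json.dump(rle, f)
```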
213
-
214
- More details can be found in [install segment anything](https://github.com/facebookresearch/segment-anything#installation), [install GroundingDINO](https://github.com/IDEA-Research/GroundingDINO#install), and [install OSX](https://github.com/IDEA-Research/OSX).
215
-
216
-
217
- ## Grounded-SAM Playground
218
- Let's start exploring the Grounded-SAM Playground. We will release more interesting demos in the future, stay tuned!
219
-
220
- ## :open_book: Step-by-Step Notebook Demo
221
- Here we list some notebook demos provided in this project:
222
- - [grounded_sam.ipynb](grounded_sam.ipynb)
223
- - [grounded_sam_colab_demo.ipynb](grounded_sam_colab_demo.ipynb)
224
- [grounded_sam_3d_box.ipynb](grounded_sam_3d_box.ipynb)
225
-
226
-
227
- ### :running_man: GroundingDINO: Detect Everything with Text Prompt
228
-
229
- :grapes: [[arXiv Paper](https://arxiv.org/abs/2303.05499)] &nbsp; :rose:[[Try the Colab Demo](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/zero-shot-object-detection-with-grounding-dino.ipynb)] &nbsp; :sunflower: [[Try Huggingface Demo](https://huggingface.co/spaces/ShilongLiu/Grounding_DINO_demo)] &nbsp; :mushroom: [[Automated Dataset Annotation and Evaluation](https://youtu.be/C4NqaRBz_Kw)]
230
-
231
- Here's the step-by-step tutorial on running the `GroundingDINO` demo:
232
-
233
- **Step 1: Download the pretrained weights**
234
-
235
- ```bash
236
- cd Grounded-Segment-Anything
237
-
238
- # download the pretrained groundingdino-swin-tiny model
239
- wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
240
- ```
241
-
242
- **Step 2: Running the demo**
243
-
244
- ```bash
245
- python grounding_dino_demo.py
246
- ```
247
-
248
- <details>
249
- <summary> <b> Running with Python (same as the demo, but you can run it anywhere after installing GroundingDINO) </b> </summary>
250
-
251
- ```python
252
- from groundingdino.util.inference import load_model, load_image, predict, annotate
253
- import cv2
254
-
255
- model = load_model("GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py", "./groundingdino_swint_ogc.pth")
256
- IMAGE_PATH = "assets/demo1.jpg"
257
- TEXT_PROMPT = "bear."
258
- BOX_THRESHOLD = 0.35
259
- TEXT_THRESHOLD = 0.25
260
-
261
- image_source, image = load_image(IMAGE_PATH)
262
-
263
- boxes, logits, phrases = predict(
264
- model=model,
265
- image=image,
266
- caption=TEXT_PROMPT,
267
- box_threshold=BOX_THRESHOLD,
268
- text_threshold=TEXT_THRESHOLD
269
- )
270
-
271
- annotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)
272
- cv2.imwrite("annotated_image.jpg", annotated_frame)
273
- ```
274
-
275
- </details>
276
- <br>
277
-
278
- **Tips**
279
- - If you want to detect multiple objects in one sentence with [Grounding DINO](https://github.com/IDEA-Research/GroundingDINO), we suggest separating each name with `.` . An example: `cat . dog . chair .`
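With such a multi-object prompt, the `phrases` returned by `predict` tell you which name each box was matched to. A minimal sketch continuing the Python snippet above (same `model`, `image`, and thresholds):

```python
# Multi-object prompt: `phrases` aligns one-to-one with `boxes`, so detections
# can be grouped per class name.
TEXT_PROMPT = "cat . dog . chair ."

boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption=TEXT_PROMPT,
    box_threshold=BOX_THRESHOLD,
    text_threshold=TEXT_THRESHOLD,
)

for box, score, phrase in zip(boxes, logits, phrases):
    print(f"{phrase}: score={float(score):.2f}, box (normalized cxcywh) = {box.tolist()}")
```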
280
-
281
- **Step 3: Check the annotated image**
282
-
283
- The annotated image will be saved as `./annotated_image.jpg`.
284
-
285
- <div align="center">
286
-
287
- | Text Prompt | Demo Image | Annotated Image |
288
- |:----:|:----:|:----:|
289
- | `Bear.` | ![](./assets/demo1.jpg) | ![](./assets/annotated_image.jpg) |
290
- | `Horse. Clouds. Grasses. Sky. Hill` | ![](./assets/demo7.jpg) | ![](https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/grounding_dino/groundingdino_demo7.jpg?raw=true)
291
-
292
- </div>
293
-
294
-
295
- ### :running_man: Grounded-SAM: Detect and Segment Everything with Text Prompt
296
-
297
- Here's the step-by-step tutorial on running the `Grounded-SAM` demo:
298
-
299
- **Step 1: Download the pretrained weights**
300
-
301
- ```bash
302
- cd Grounded-Segment-Anything
303
-
304
- wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
305
- wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
306
- ```
307
-
308
- We provide two versions of Grounded-SAM demo here:
309
- - [grounded_sam_demo.py](./grounded_sam_demo.py): our original implementation for Grounded-SAM.
310
- [grounded_sam_simple_demo.py](./grounded_sam_simple_demo.py): our updated, more elegant version of Grounded-SAM.
311
-
312
- **Step 2: Running original grounded-sam demo**
313
-
314
- ```bash
315
- export CUDA_VISIBLE_DEVICES=0
316
- python grounded_sam_demo.py \
317
- --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
318
- --grounded_checkpoint groundingdino_swint_ogc.pth \
319
- --sam_checkpoint sam_vit_h_4b8939.pth \
320
- --input_image assets/demo1.jpg \
321
- --output_dir "outputs" \
322
- --box_threshold 0.3 \
323
- --text_threshold 0.25 \
324
- --text_prompt "bear" \
325
- --device "cuda"
326
- ```
327
-
328
- The annotated results will be saved in `./outputs` as follows
329
-
330
- <div align="center">
331
-
332
- | Input Image | Annotated Image | Generated Mask |
333
- |:----:|:----:|:----:|
334
- | ![](./assets/demo1.jpg) | ![](https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/grounded_sam/original_grounded_sam_demo1.jpg?raw=true) | ![](https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/grounded_sam/mask.jpg?raw=true) |
335
-
336
- </div>
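Under the hood the script chains the two models: Grounding DINO proposes boxes from the text prompt, and SAM turns each box into a mask. The sketch below shows that hand-off using the same weights and thresholds as the command above; it assumes a CUDA GPU and is an illustration, not a drop-in replacement for `grounded_sam_demo.py`:

```python
# Grounding DINO boxes -> SAM masks, mirroring what grounded_sam_demo.py does internally.
import torch
from torchvision.ops import box_convert
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor

# 1. Detect boxes with Grounding DINO from the text prompt
dino = load_model("GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",
                  "groundingdino_swint_ogc.pth")
image_source, image = load_image("assets/demo1.jpg")
boxes, logits, phrases = predict(model=dino, image=image, caption="bear",
                                 box_threshold=0.3, text_threshold=0.25)

# 2. Convert normalized cxcywh boxes to absolute xyxy pixel coordinates for SAM
h, w, _ = image_source.shape
boxes_xyxy = box_convert(boxes * torch.tensor([w, h, w, h]), in_fmt="cxcywh", out_fmt="xyxy")

# 3. Prompt SAM with the boxes to get one mask per detection
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to("cuda")
predictor = SamPredictor(sam)
predictor.set_image(image_source)
transformed_boxes = predictor.transform.apply_boxes_torch(boxes_xyxy, image_source.shape[:2]).to("cuda")
masks, _, _ = predictor.predict_torch(point_coords=None, point_labels=None,
                                      boxes=transformed_boxes, multimask_output=False)
print(masks.shape)  # (num_boxes, 1, H, W) boolean masks
```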
337
-
338
- **Step 3: Running grounded-sam demo with sam-hq**
339
- - Download the demo image
340
- ```bash
341
- wget https://github.com/IDEA-Research/detrex-storage/releases/download/grounded-sam-storage/sam_hq_demo_image.png
342
- ```
343
-
344
- - Download SAM-HQ checkpoint [here](https://github.com/SysCV/sam-hq#model-checkpoints)
345
-
346
- Run the grounded-sam-hq demo as follows (pass the downloaded SAM-HQ checkpoint via `--sam_hq_checkpoint` and enable it with `--use_sam_hq`):
347
- ```bash
348
- export CUDA_VISIBLE_DEVICES=0
349
- python grounded_sam_demo.py \
350
- --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
351
- --grounded_checkpoint groundingdino_swint_ogc.pth \
352
- --sam_hq_checkpoint ./sam_hq_vit_h.pth \
353
- --use_sam_hq \
354
- --input_image sam_hq_demo_image.png \
355
- --output_dir "outputs" \
356
- --box_threshold 0.3 \
357
- --text_threshold 0.25 \
358
- --text_prompt "chair." \
359
- --device "cuda"
360
- ```
361
-
362
- The annotated results will be saved in `./outputs` as follows
363
-
364
- <div align="center">
365
-
366
- | Input Image | SAM Output | SAM-HQ Output |
367
- |:----:|:----:|:----:|
368
- | ![](https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/sam_hq/sam_hq_demo.png?raw=true) | ![](https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/sam_hq/sam_output.jpg?raw=true) | ![](https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/sam_hq/sam_hq_output.jpg?raw=true) |
369
-
370
- </div>
371
-
372
- **Step 4: Running the updated grounded-sam demo (optional)**
373
-
374
- Note that this demo is almost the same as the original demo, but **with more elegant code**.
375
-
376
- ```bash
377
- python grounded_sam_simple_demo.py
378
- ```
379
-
380
- The annotated results will be saved as `./groundingdino_annotated_image.jpg` and `./grounded_sam_annotated_image.jpg`
381
-
382
- <div align="center">
383
-
384
- | Text Prompt | Input Image | GroundingDINO Annotated Image | Grounded-SAM Annotated Image |
385
- |:----:|:----:|:----:|:----:|
386
- | `The running dog` | ![](./assets/demo2.jpg) | ![](https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/grounded_sam/groundingdino_annotated_image_demo2.jpg?raw=true) | ![](https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/grounded_sam/grounded_sam_annotated_image_demo2.jpg?raw=true) |
387
- | `Horse. Clouds. Grasses. Sky. Hill` | ![](./assets/demo7.jpg) | ![](assets/groundingdino_annotated_image.jpg) | ![](assets/grounded_sam_annotated_image.jpg) |
388
-
389
- </div>
390
-
391
- ### :skier: Grounded-SAM with Inpainting: Detect, Segment and Generate Everything with Text Prompt
392
-
393
- **Step 1: Download the pretrained weights**
394
-
395
- ```bash
396
- cd Grounded-Segment-Anything
397
-
398
- wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
399
- wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
400
- ```
401
-
402
- **Step 2: Running grounded-sam inpainting demo**
403
-
404
- ```bash
405
- export CUDA_VISIBLE_DEVICES=0
406
- python grounded_sam_inpainting_demo.py \
407
- --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
408
- --grounded_checkpoint groundingdino_swint_ogc.pth \
409
- --sam_checkpoint sam_vit_h_4b8939.pth \
410
- --input_image assets/inpaint_demo.jpg \
411
- --output_dir "outputs" \
412
- --box_threshold 0.3 \
413
- --text_threshold 0.25 \
414
- --det_prompt "bench" \
415
- --inpaint_prompt "A sofa, high quality, detailed" \
416
- --device "cuda"
417
- ```
418
-
419
- The annotated and inpainted images will be saved in `./outputs`.
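Internally, the inpainting step hands the SAM mask for the detected object to a Stable Diffusion inpainting pipeline. Below is a minimal sketch with `diffusers`; the checkpoint id and the `outputs/mask.jpg` path are illustrative assumptions rather than exactly what the script uses:

```python
# Replace the masked region with content generated from the inpaint prompt.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("assets/inpaint_demo.jpg").convert("RGB").resize((512, 512))
mask = Image.open("outputs/mask.jpg").convert("L").resize((512, 512))  # white = region to repaint

result = pipe(prompt="A sofa, high quality, detailed", image=image, mask_image=mask).images[0]
result.save("outputs/grounded_sam_inpainting_output.jpg")
```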
420
-
421
- **Step 3: Check the results**
422
-
423
-
424
- <div align="center">
425
-
426
- | Input Image | Det Prompt | Annotated Image | Inpaint Prompt | Inpaint Image |
427
- |:---:|:---:|:---:|:---:|:---:|
428
- |![](./assets/inpaint_demo.jpg) | `Bench` | ![](https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/grounded_sam_inpaint/grounded_sam_output.jpg?raw=true) | `A sofa, high quality, detailed` | ![](https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/grounded_sam_inpaint/grounded_sam_inpainting_output.jpg?raw=true) |
429
-
430
- </div>
431
-
432
- ### :golfing: Grounded-SAM and Inpaint Gradio APP
433
-
434
- We support 6 tasks in the local Gradio APP (a minimal interface sketch follows the list):
435
-
436
- 1. **scribble**: Segment with Segment Anything via mouse-click interaction (click on the object; no text prompt needed).
437
- 2. **automask**: Segment the entire image at once with Segment Anything (no prompt needed).
438
- 3. **det**: Detect with Grounding DINO from a text prompt (a text prompt must be specified).
439
- 4. **seg**: Combine Grounding DINO and Segment Anything for text-prompted detection + segmentation (a text prompt must be specified).
440
- 5. **inpainting**: Combine Grounding DINO + Segment Anything + Stable Diffusion to replace the target object via text (both a text prompt and an inpaint prompt must be specified).
441
- 6. **automatic**: Combine BLIP + Grounding DINO + Segment Anything for non-interactive detection + segmentation (no prompt needed).
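For orientation, here is a stripped-down sketch of how these task modes could be wired into a single Gradio interface; `run_grounded_sam` is a hypothetical wrapper around the detection / segmentation / inpainting calls shown elsewhere in this README, not the actual `gradio_app.py`:

```python
# Minimal Gradio interface sketch for the six task modes listed above.
import gradio as gr

def run_grounded_sam(image, task, text_prompt, inpaint_prompt):
    # placeholder: dispatch to scribble / automask / det / seg / inpainting / automatic here
    return image

demo = gr.Interface(
    fn=run_grounded_sam,
    inputs=[
        gr.Image(type="pil", label="Input image"),
        gr.Dropdown(["scribble", "automask", "det", "seg", "inpainting", "automatic"],
                    value="seg", label="Task"),
        gr.Textbox(label="Text prompt (det / seg / inpainting)"),
        gr.Textbox(label="Inpaint prompt (inpainting only)"),
    ],
    outputs=gr.Image(label="Result"),
)

if __name__ == "__main__":
    demo.launch()
```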
442
-
443
- ```bash
444
- python gradio_app.py
445
- ```
446
-
447
- The gradio_app visualization is as follows:
448
-
449
- ![](./assets/gradio_demo.png)
450
-
451
-
452
- ### :label: Grounded-SAM with RAM or Tag2Text for Automatic Labeling
453
- [**The Recognize Anything Models**](https://github.com/OPPOMKLab/recognize-anything) are a series of strong, open-source foundational image recognition models, including [RAM++](https://arxiv.org/abs/2310.15200), [RAM](https://arxiv.org/abs/2306.03514) and [Tag2text](https://arxiv.org/abs/2303.05657).
454
-
455
-
456
- They can be seamlessly combined with Grounded-SAM to generate pseudo labels automatically as follows (a minimal sketch of the hand-off follows the list):
457
- 1. Use RAM/Tag2Text to generate tags.
458
- 2. Use Grounded-Segment-Anything to generate the boxes and masks.
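A minimal sketch of the hand-off between the two steps: the tags are joined into one text prompt (using the `.`-separated convention from the GroundingDINO tips above) and fed to Grounding DINO. `generate_tags` is a placeholder for the actual RAM / Tag2Text inference in `automatic_label_ram_demo.py`:

```python
# Tags from RAM / Tag2Text become a single text prompt for Grounding DINO.
from groundingdino.util.inference import load_model, load_image, predict

def generate_tags(image_path):
    # placeholder: run RAM or Tag2Text here and return its predicted tags
    return ["horse", "cloud", "grass", "sky", "hill"]

tags = generate_tags("assets/demo9.jpg")
caption = " . ".join(tags) + " ."   # e.g. "horse . cloud . grass . sky . hill ."

model = load_model("GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",
                   "groundingdino_swint_ogc.pth")
image_source, image = load_image("assets/demo9.jpg")
boxes, logits, phrases = predict(model=model, image=image, caption=caption,
                                 box_threshold=0.25, text_threshold=0.2)
# `boxes` are then passed to SAM exactly as in the Grounded-SAM demo above.
```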
459
-
460
-
461
- **Step 1: Init submodule and download the pretrained checkpoint**
462
-
463
- - Init submodule:
464
-
465
- ```bash
466
- cd Grounded-Segment-Anything
467
- git submodule init
468
- git submodule update
469
- ```
470
-
471
- - Download pretrained weights for `GroundingDINO`, `SAM` and `RAM/Tag2Text`:
472
-
473
- ```bash
474
- wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
475
- wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
476
-
477
-
478
- wget https://huggingface.co/spaces/xinyu1205/Tag2Text/resolve/main/ram_swin_large_14m.pth
479
- wget https://huggingface.co/spaces/xinyu1205/Tag2Text/resolve/main/tag2text_swin_14m.pth
480
- ```
481
-
482
- **Step 2: Running the demo with RAM**
483
- ```bash
484
- export CUDA_VISIBLE_DEVICES=0
485
- python automatic_label_ram_demo.py \
486
- --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
487
- --ram_checkpoint ram_swin_large_14m.pth \
488
- --grounded_checkpoint groundingdino_swint_ogc.pth \
489
- --sam_checkpoint sam_vit_h_4b8939.pth \
490
- --input_image assets/demo9.jpg \
491
- --output_dir "outputs" \
492
- --box_threshold 0.25 \
493
- --text_threshold 0.2 \
494
- --iou_threshold 0.5 \
495
- --device "cuda"
496
- ```
497
-
498
-
499
- **Step 2 (alternative): Running the demo with Tag2Text**
500
- ```bash
501
- export CUDA_VISIBLE_DEVICES=0
502
- python automatic_label_tag2text_demo.py \
503
- --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
504
- --tag2text_checkpoint tag2text_swin_14m.pth \
505
- --grounded_checkpoint groundingdino_swint_ogc.pth \
506
- --sam_checkpoint sam_vit_h_4b8939.pth \
507
- --input_image assets/demo9.jpg \
508
- --output_dir "outputs" \
509
- --box_threshold 0.25 \
510
- --text_threshold 0.2 \
511
- --iou_threshold 0.5 \
512
- --device "cuda"
513
- ```
514
-
515
- RAM++ significantly improves the open-set capability of RAM; see [RAM++ inference on unseen categories](https://github.com/xinyu1205/recognize-anything#ram-inference-on-unseen-categories-open-set).
516
- Tag2Text also provides powerful captioning capabilities; for the caption-based process, refer to [BLIP](#robot-run-grounded-segment-anything--blip-demo).
517
- - The pseudo labels and model prediction visualization will be saved in `output_dir` as follows (right figure):
518
-
519
- ![](./assets/automatic_label_output/demo9_tag2text_ram.jpg)
520
-
521
-
522
- ### :robot: Grounded-SAM with BLIP for Automatic Labeling
523
- It is easy to generate pseudo labels automatically as follows (a BLIP captioning sketch for step 1 follows the list):
524
- 1. Use BLIP (or other caption models) to generate a caption.
525
- 2. Extract tags from the caption. We use ChatGPT to handle potentially complicated sentences.
526
- 3. Use Grounded-Segment-Anything to generate the boxes and masks.
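Step 1 can be reproduced with an off-the-shelf BLIP checkpoint from `transformers`; the model id below is the commonly used base captioning checkpoint, an assumption rather than necessarily the weights `automatic_label_demo.py` loads:

```python
# Caption an image with BLIP; tags are then extracted from the caption (ChatGPT or NLTK).
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("assets/demo3.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```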
527
-
528
- - Run Demo
529
- ```bash
530
- export OPENAI_API_KEY=your_openai_key
531
- export OPENAI_API_BASE=https://closeai.deno.dev/v1
532
- export CUDA_VISIBLE_DEVICES=0
533
- python automatic_label_demo.py \
534
- --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
535
- --grounded_checkpoint groundingdino_swint_ogc.pth \
536
- --sam_checkpoint sam_vit_h_4b8939.pth \
537
- --input_image assets/demo3.jpg \
538
- --output_dir "outputs" \
539
- --openai_key $OPENAI_API_KEY \
540
- --box_threshold 0.25 \
541
- --text_threshold 0.2 \
542
- --iou_threshold 0.5 \
543
- --device "cuda"
544
- ```
545
-
546
- If you don't have a paid ChatGPT account, it is also possible to use NLTK instead. Just don't include the `openai_key` parameter when starting the demo.
547
- The script will automatically download the necessary NLTK data.
548
- - The pseudo labels and model prediction visualization will be saved in `output_dir` as follows:
549
-
550
- ![](./assets/automatic_label_output_demo3.jpg)
551
-
552
-
553
- ### :open_mouth: Grounded-SAM with Whisper: Detect and Segment Anything with Audio
554
- Detect and segment anything with speech!
555
-
556
- ![](assets/acoustics/gsam_whisper_inpainting_demo.png)
557
-
558
- **Install Whisper**
559
- ```bash
560
- pip install -U openai-whisper
561
- ```
562
- See the [whisper official page](https://github.com/openai/whisper#setup) if you have other questions about the installation.
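The audio handling itself is plain speech-to-text: Whisper transcribes the file and the resulting text is used as the detection prompt. A minimal sketch (the `base` model size is an arbitrary choice, and the audio file is the demo clip downloaded in the next step):

```python
# Transcribe the audio prompt with Whisper; the text then drives Grounding DINO.
import whisper

model = whisper.load_model("base")
result = model.transcribe("demo_audio.mp3")
print(result["text"])   # spoken object names, used as the detection prompt
```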
563
-
564
- **Run Voice-to-Label Demo**
565
-
566
- Optional: Download the demo audio file
567
-
568
- ```bash
569
- wget https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/demo_audio.mp3
570
- ```
571
-
572
-
573
- ```bash
574
- export CUDA_VISIBLE_DEVICES=0
575
- python grounded_sam_whisper_demo.py \
576
- --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
577
- --grounded_checkpoint groundingdino_swint_ogc.pth \
578
- --sam_checkpoint sam_vit_h_4b8939.pth \
579
- --input_image assets/demo4.jpg \
580
- --output_dir "outputs" \
581
- --box_threshold 0.3 \
582
- --text_threshold 0.25 \
583
- --speech_file "demo_audio.mp3" \
584
- --device "cuda"
585
- ```
586
-
587
- ![](./assets/grounded_sam_whisper_output.jpg)
588
-
589
- **Run Voice-to-inpaint Demo**
590
-
591
- You can enable ChatGPT to automatically determine the object to detect and the inpainting instruction with `--enable_chatgpt`.
592
-
593
- Or you can specify the object you want to inpaint [stored in `args.det_speech_file`] and the text you want to inpaint with [stored in `args.inpaint_speech_file`].
594
-
595
- ```bash
596
- export OPENAI_API_KEY=your_openai_key
597
- export OPENAI_API_BASE=https://closeai.deno.dev/v1
598
- # Example: enable chatgpt
599
- export CUDA_VISIBLE_DEVICES=0
600
- python grounded_sam_whisper_inpainting_demo.py \
601
- --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
602
- --grounded_checkpoint groundingdino_swint_ogc.pth \
603
- --sam_checkpoint sam_vit_h_4b8939.pth \
604
- --input_image assets/inpaint_demo.jpg \
605
- --output_dir "outputs" \
606
- --box_threshold 0.3 \
607
- --text_threshold 0.25 \
608
- --prompt_speech_file assets/acoustics/prompt_speech_file.mp3 \
609
- --enable_chatgpt \
610
- --openai_key $OPENAI_API_KEY \
611
- --device "cuda"
612
- ```
613
-
614
- ```bash
615
- # Example: without chatgpt
616
- export CUDA_VISIBLE_DEVICES=0
617
- python grounded_sam_whisper_inpainting_demo.py \
618
- --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
619
- --grounded_checkpoint groundingdino_swint_ogc.pth \
620
- --sam_checkpoint sam_vit_h_4b8939.pth \
621
- --input_image assets/inpaint_demo.jpg \
622
- --output_dir "outputs" \
623
- --box_threshold 0.3 \
624
- --text_threshold 0.25 \
625
- --det_speech_file "assets/acoustics/det_voice.mp3" \
626
- --inpaint_speech_file "assets/acoustics/inpaint_voice.mp3" \
627
- --device "cuda"
628
- ```
629
-
630
- ![](./assets/acoustics/gsam_whisper_inpainting_pipeline.png)
631
-
632
- ### :speech_balloon: Grounded-SAM ChatBot Demo
633
-
634
- https://user-images.githubusercontent.com/24236723/231955561-2ae4ec1a-c75f-4cc5-9b7b-517aa1432123.mp4
635
-
636
- Following [Visual ChatGPT](https://github.com/microsoft/visual-chatgpt), we add a ChatBot for our project. Currently, it supports:
637
- 1. "Describe the image."
638
- 2. "Detect the dog (and the cat) in the image."
639
- 3. "Segment anything in the image."
640
- 4. "Segment the dog (and the cat) in the image."
641
- 5. "Help me label the image."
642
- 6. "Replace the dog with a cat in the image."
643
-
644
- To use the ChatBot:
645
- - Install whisper if you want to use audio as input.
646
- - Set the default model setting in the tool `Grounded_dino_sam_inpainting`.
647
- - Run Demo
648
- ```bash
649
- export OPENAI_API_KEY=your_openai_key
650
- export OPENAI_API_BASE=https://closeai.deno.dev/v1
651
- export CUDA_VISIBLE_DEVICES=0
652
- python chatbot.py
653
- ```
654
-
655
- ### :man_dancing: Run Grounded-Segment-Anything + OSX Demo
656
-
657
- <p align="middle">
658
- <img src="assets/osx/grouned_sam_osx_demo.gif">
659
- <br>
660
- </p>
661
-
662
-
663
- - Download the checkpoint `osx_l_wo_decoder.pth.tar` from [here](https://drive.google.com/drive/folders/1x7MZbB6eAlrq5PKC9MaeIm4GqkBpokow?usp=share_link) for OSX:
664
- Download the human model files and place them into `grounded-sam-osx/utils/human_model_files` following the instructions of [OSX](https://github.com/IDEA-Research/OSX).
665
-
666
- - Run Demo
667
-
668
- ```shell
669
- export CUDA_VISIBLE_DEVICES=0
670
- python grounded_sam_osx_demo.py \
671
- --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
672
- --grounded_checkpoint groundingdino_swint_ogc.pth \
673
- --sam_checkpoint sam_vit_h_4b8939.pth \
674
- --osx_checkpoint osx_l_wo_decoder.pth.tar \
675
- --input_image assets/osx/grounded_sam_osx_demo.png \
676
- --output_dir "outputs" \
677
- --box_threshold 0.3 \
678
- --text_threshold 0.25 \
679
- --text_prompt "humans, chairs" \
680
- --device "cuda"
681
- ```
682
-
683
- - The model prediction visualization will be saved in `output_dir` as follows:
684
-
685
- <img src="assets/osx/grounded_sam_osx_output.jpg" style="zoom: 49%;" />
686
-
687
- We also support promptable 3D whole-body mesh recovery. For example, you can track someone with a text prompt and estimate their 3D pose and shape:
688
-
689
- | ![space-1.jpg](assets/osx/grounded_sam_osx_output1.jpg) |
690
- | :---------------------------------------------------: |
691
- | *A person with pink clothes* |
692
-
693
- | ![space-1.jpg](assets/osx/grounded_sam_osx_output2.jpg) |
694
- | :---------------------------------------------------: |
695
- | *A man with sunglasses* |
696
-
697
-
698
- ## :man_dancing: Run Grounded-Segment-Anything + VISAM Demo
699
-
700
- - Download the checkpoint `motrv2_dancetrack.pth` from [here](https://drive.google.com/file/d/1EA4lndu2yQcVgBKR09KfMe5efbf631Th/view?usp=share_link) for MOTRv2:
701
- Refer to the VISAM and MOTRv2 documentation if you have other questions about the installation.
702
-
703
- - Run Demo
704
-
705
- ```shell
706
- export CUDA_VISIBLE_DEVICES=0
707
- python grounded_sam_visam.py \
708
- --meta_arch motr \
709
- --dataset_file e2e_dance \
710
- --with_box_refine \
711
- --query_interaction_layer QIMv2 \
712
- --num_queries 10 \
713
- --det_db det_db_motrv2.json \
714
- --use_checkpoint \
715
- --mot_path your_data_path \
716
- --resume motrv2_dancetrack.pth \
717
- --sam_checkpoint sam_vit_h_4b8939.pth \
718
- --video_path DanceTrack/test/dancetrack0003
719
- ```
720
- |![](https://raw.githubusercontent.com/BingfengYan/MOTSAM/main/visam.gif)|
721
-
722
-
723
- ### :dancers: Interactive Editing
724
- We release the interactive fashion-edit playground [here](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/humanFace). Run it in the notebook and just click to annotate points for further segmentation. Enjoy it!
725
-
726
-
727
- We release the human-face-edit branch [here](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/humanFace). We'll keep updating this branch with more interesting features. Here are some examples:
728
-
729
- ![](https://github.com/IDEA-Research/Grounded-Segment-Anything/blob/humanFace/assets/231-hair-edit.png)
730
-
731
- ## :camera: 3D-Box via Segment Anything
732
- We extend the scope to the 3D world by combining Segment Anything and [VoxelNeXt](https://github.com/dvlab-research/VoxelNeXt). When we provide a prompt (e.g., a point / box), the result is not only a 2D segmentation mask, but also 3D boxes. Please check [voxelnext_3d_box](./voxelnext_3d_box/) for more details.
733
- ![](https://github.com/IDEA-Research/Grounded-Segment-Anything/blob/main/voxelnext_3d_box/images/sam-voxelnext.png)
734
- ![](https://github.com/IDEA-Research/Grounded-Segment-Anything/blob/main/voxelnext_3d_box/images/image_boxes2.png)
735
-
736
-
737
-
738
-
739
- ## :cupid: Acknowledgements
740
-
741
- - [Segment Anything](https://github.com/facebookresearch/segment-anything)
742
- - [Grounding DINO](https://github.com/IDEA-Research/GroundingDINO)
743
-
744
-
745
- ## Contributors
746
-
747
- Our project wouldn't be possible without the contributions of these amazing people! Thank you all for making this project better.
748
-
749
- <a href="https://github.com/IDEA-Research/Grounded-Segment-Anything/graphs/contributors">
750
- <img src="https://contrib.rocks/image?repo=IDEA-Research/Grounded-Segment-Anything" />
751
- </a>
752
-
753
-
754
- ## Citation
755
- If you find this project helpful for your research, please consider citing the following BibTeX entry.
756
- ```BibTex
757
- @article{kirillov2023segany,
758
- title={Segment Anything},
759
- author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
760
- journal={arXiv:2304.02643},
761
- year={2023}
762
- }
763
-
764
- @article{liu2023grounding,
765
- title={Grounding dino: Marrying dino with grounded pre-training for open-set object detection},
766
- author={Liu, Shilong and Zeng, Zhaoyang and Ren, Tianhe and Li, Feng and Zhang, Hao and Yang, Jie and Li, Chunyuan and Yang, Jianwei and Su, Hang and Zhu, Jun and others},
767
- journal={arXiv preprint arXiv:2303.05499},
768
- year={2023}
769
- }
770
- ```
 
1
+ ---
2
+ title: Grounding SAM Inpainting
3
+ emoji: 🐠
4
+ colorFrom: gray
5
+ colorTo: blue
6
+ sdk: gradio
7
+ sdk_version: 4.10.0
8
+ app_file: grounded_sam_inpainting_demo.py
9
+ pinned: false
10
+ license: apache-2.0
11
+ ---