Update README.md
README.md CHANGED
@@ -1,129 +1,38 @@

# Faster Segment Anything

![MobileSAM](assets/model_diagram.jpg?raw=true)

**MobileSAM** performs on par with the original SAM (at least visually) and keeps exactly the same pipeline as the original SAM, except for a lighter image encoder. Specifically, we replace the original heavyweight ViT-H encoder (632M parameters) with a much smaller Tiny-ViT (5M parameters). On a single GPU, MobileSAM runs in around 12ms per image: 8ms for the image encoder and 4ms for the mask decoder.

The comparison of the image encoders is summarized as follows:

Image Encoder | Original SAM | MobileSAM
:------------:|:------------:|:---------:
Parameters    | 611M         | 5M
Speed         | 452ms        | 8ms

Original SAM and MobileSAM have exactly the same prompt-guided mask decoder:

Mask Decoder | Original SAM | MobileSAM
:-----------:|:------------:|:---------:
Parameters   | 3.876M       | 3.876M
Speed        | 4ms          | 4ms

The comparison of the whole pipeline is summarized as follows:

Whole Pipeline (Enc+Dec) | Original SAM | MobileSAM
:-----------------------:|:------------:|:---------:
Parameters               | 615M         | 9.66M
Speed                    | 456ms        | 12ms

**Original SAM and MobileSAM with a (single) point as the prompt.**

<p float="left">
  <img src="assets/mask_point.jpg?raw=true" width="99.1%" />
</p>

**Original SAM and MobileSAM with a box as the prompt.**

<p float="left">
  <img src="assets/mask_box.jpg?raw=true" width="99.1%" />
</p>

**Is MobileSAM faster and smaller than FastSAM? Yes, to our knowledge!**

MobileSAM is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:

Whole Pipeline (Enc+Dec) | FastSAM | MobileSAM
:-----------------------:|:-------:|:---------:
Parameters               | 68M     | 9.66M
Speed                    | 64ms    | 12ms

**Is MobileSAM better than FastSAM in segmentation quality? Yes, to our knowledge!**

FastSAM cannot work with a single point prompt the way the original SAM and our MobileSAM can. Therefore, we compare mIoU using two prompt points (at different pixel distances) and show the results as follows. Our MobileSAM is much better than FastSAM under this setup.

Point distance (pixels) | FastSAM (mIoU) | MobileSAM (mIoU)
:----------------------:|:--------------:|:----------------:
100                     | 0.27           | 0.73
200                     | 0.33           | 0.71
300                     | 0.37           | 0.74
400                     | 0.41           | 0.73
500                     | 0.41           | 0.73

**How to Adapt from SAM to MobileSAM?** Since MobileSAM keeps exactly the same pipeline as the original SAM, it inherits the pre-processing, post-processing, and all other interfaces of the original SAM. Users of the original SAM can therefore switch to MobileSAM with zero effort: everything works exactly the same, except that the image encoder inside SAM is smaller.
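
As a minimal sketch (not an official snippet from the authors), existing code built around `SamPredictor` only needs its model-construction step changed; the checkpoint path below is a placeholder:

```
import torch
from segment_anything import SamPredictor
from mobile_encoder.setup_mobile_sam import setup_model

# Build MobileSAM instead of the original ViT-H SAM; this is the only
# step that differs (the checkpoint path is illustrative).
mobile_sam = setup_model()
mobile_sam.load_state_dict(torch.load('./weights/mobile_sam.pt'), strict=True)
mobile_sam.to(device="cuda")
mobile_sam.eval()

# Everything downstream (prompting, pre-/post-processing) stays the same.
predictor = SamPredictor(mobile_sam)
```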

**How is MobileSAM trained?** MobileSAM is trained on a single GPU with 100k images (1% of the original SA-1B images) for less than a day. The training code will be available soon.

## Installation

The code requires `python>=3.8`, as well as `pytorch>=1.7` and `torchvision>=0.8`. Please follow the instructions [here](https://pytorch.org/get-started/locally/) to install both PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.

Install Mobile Segment Anything:

```
pip install git+https://github.com/ChaoningZhang/MobileSAM.git
```

or clone the repository locally and install with

```
git clone git@github.com:ChaoningZhang/MobileSAM.git
cd MobileSAM; pip install -e .
```

## <a name="GettingStarted"></a>Getting Started

MobileSAM can be loaded in the following way:

```
import torch
from mobile_encoder.setup_mobile_sam import setup_model

# Load the pre-trained MobileSAM weights and build the model.
checkpoint = torch.load('../weights/mobile_sam.pt')
mobile_sam = setup_model()
mobile_sam.load_state_dict(checkpoint, strict=True)
```

Then the model can be easily used in just a few lines to get masks from a given prompt:

```
from segment_anything import SamPredictor

device = "cuda"
mobile_sam.to(device=device)
mobile_sam.eval()

predictor = SamPredictor(mobile_sam)
predictor.set_image(<your_image>)
masks, _, _ = predictor.predict(<input_prompts>)
```
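
For example, continuing the snippet above, a single foreground point prompt could look like the following sketch (the image path and the point coordinates are illustrative placeholders only):

```
import cv2
import numpy as np

# Illustrative inputs: replace the image path and the point with your own.
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),  # one (x, y) pixel coordinate
    point_labels=np.array([1]),           # 1 marks a foreground point
    multimask_output=True,                # return several candidate masks
)
```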

or generate masks for an entire image:

```
from segment_anything import SamAutomaticMaskGenerator

# Standard segment_anything usage: wrap the model in the automatic mask
# generator and produce masks for the whole image.
mask_generator = SamAutomaticMaskGenerator(mobile_sam)
masks = mask_generator.generate(<your_image>)
```
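
`SamAutomaticMaskGenerator` also accepts the usual tuning arguments of the original SAM implementation (for example `points_per_side` and `pred_iou_thresh`); since MobileSAM keeps the same interface, they should presumably carry over unchanged.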

If you use MobileSAM in your research, please use the following BibTeX entry. :mega: Thank you!
```bibtex
@article{mobile_sam,
@@ -133,40 +42,3 @@
  year={2023}
}
```

## Acknowledgement

<details>
<summary>
<a href="https://github.com/facebookresearch/segment-anything">SAM</a> (Segment Anything) [<b>bib</b>]
</summary>

```bibtex
@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}
```
</details>

<details>
<summary>
<a href="https://github.com/microsoft/Cream/tree/main/TinyViT">TinyViT</a> (TinyViT: Fast Pretraining Distillation for Small Vision Transformers) [<b>bib</b>]
</summary>

```bibtex
@InProceedings{tiny_vit,
  title={TinyViT: Fast Pretraining Distillation for Small Vision Transformers},
  author={Wu, Kan and Zhang, Jinnian and Peng, Houwen and Liu, Mengchen and Xiao, Bin and Fu, Jianlong and Yuan, Lu},
  booktitle={European conference on computer vision (ECCV)},
  year={2022}
}
```
</details>
---
title: MobileSAM
emoji: 🐠
colorFrom: indigo
colorTo: yellow
sdk: gradio
python_version: 3.8
sdk_version: 3.35.2
app_file: app.py
pinned: false
license: apache-2.0
---

# Faster Segment Anything (MobileSAM)

Official PyTorch implementation of the <a href="https://github.com/ChaoningZhang/MobileSAM">MobileSAM</a> project.

**MobileSAM** performs on par with the original SAM (at least visually) and keeps exactly the same pipeline as the original SAM, except for a lighter image encoder. Specifically, we replace the original heavyweight ViT-H encoder (632M parameters) with a much smaller Tiny-ViT (5M parameters). On a single GPU, MobileSAM runs in around 12ms per image: 8ms for the image encoder and 4ms for the mask decoder.

## License

The model is licensed under the [Apache 2.0 license](LICENSE).

## Acknowledgement

- [Segment Anything](https://segment-anything.com/) provides the SA-1B dataset and the base code.
- [TinyViT](https://github.com/microsoft/Cream/tree/main/TinyViT) provides the code and pre-trained models.

## Citing MobileSAM

If you find this project useful for your research, please consider citing the following BibTeX entry.

```bibtex
@article{mobile_sam,
  year={2023}
}
```