Update README.md
README.md
@@ -5,7 +5,7 @@ This is a PyTorch implementation of **Mugs** proposed by our paper "**Mugs: A Mu
 [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mugs-a-multi-granular-self-supervised/self-supervised-image-classification-on)](https://paperswithcode.com/sota/self-supervised-image-classification-on?p=mugs-a-multi-granular-self-supervised)
 
 <div align="center">
-<img width="
+<img width="75%" alt="Overall framework of Mugs. " src="https://huggingface.co/zhoupans/Mugs_ViT_large_pretrained/resolve/main/exp_illustration/framework.png">
 </div>
 
 **<p align="center">Fig 1. Overall framework of Mugs.** In (a), for each image, two random crops of one image
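Fig 1(a) begins with two random crops of each image. For reference, a minimal sketch of that two-crop idea with torchvision; the 224 px size and (0.25, 1.0) scale are illustrative placeholders, not Mugs' exact augmentation recipe:

```python
# Sketch of "two random crops of one image" (Fig 1a) using torchvision.
# Crop size and scale range are illustrative, not Mugs' actual settings.
from torchvision import transforms

class TwoCrops:
    """Apply the same random augmentation pipeline twice to one image."""
    def __init__(self, size=224):
        self.aug = transforms.Compose([
            transforms.RandomResizedCrop(size, scale=(0.25, 1.0)),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
        ])

    def __call__(self, img):
        # Two independent draws of the random augmentation -> two views.
        return self.aug(img), self.aug(img)
```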
@@ -93,9 +93,10 @@ You can choose to download only the weights of the pretrained backbone used for
 </table>
 
 <div align="center">
-<img width="
+<img width="75%" alt="Comparison of linear probing accuracy on ImageNet-1K." src="https://huggingface.co/zhoupans/Mugs_ViT_large_pretrained/resolve/main/exp_illustration/comparison.png">
 </div>
 
+
 **<p align="center">Fig 2. Comparison of linear probing accuracy on ImageNet-1K.**</p>
 
 ## Pretraining Settings
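Fig 2 reports linear probing accuracy, i.e., the backbone stays frozen and only a linear classifier is trained on its features. A minimal sketch of that protocol; `backbone`, `feat_dim`, and `loader` are placeholder names, not this repository's actual API:

```python
# Sketch of linear probing: freeze the pretrained backbone, train only a
# linear head on its features. All names here are placeholders.
import torch
import torch.nn as nn

def linear_probe(backbone, feat_dim, loader, num_classes=1000, epochs=100):
    backbone.eval()                          # frozen feature extractor
    for p in backbone.parameters():
        p.requires_grad = False

    head = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.SGD(head.parameters(), lr=0.01, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in loader:
            with torch.no_grad():
                feats = backbone(images)     # e.g. the [CLS] embedding
            loss = loss_fn(head(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```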
@@ -149,9 +150,10 @@ We are cleaning up the evalutation code and will release them when they are read
 ## Self-attention visualization
 Here we provide the self-attention map of the [CLS] token on the heads of the last layer
 <div align="center">
-<img width="
+<img width="75%" alt="Self-attention from a ViT-Base/16 trained with Mugs" src="https://huggingface.co/zhoupans/Mugs_ViT_large_pretrained/resolve/main/exp_illustration/attention_vis.png">
 </div>
 
+
 **<p align="center">Fig 3. Self-attention from a ViT-Base/16 trained with Mugs.**</p>
 
 
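A Fig 3-style map reads out, for each head of the last layer, how much the [CLS] query attends to every patch token. A sketch assuming a DINO-style `get_last_selfattention` helper on the ViT (hypothetical here if the released model does not expose one):

```python
# Sketch of per-head [CLS] attention maps from the last layer.
# Assumes a DINO-style `get_last_selfattention(x)` helper; patch size 16
# as in ViT-B/16. This is an assumption, not this repo's confirmed API.
import torch

@torch.no_grad()
def cls_attention_maps(model, image, patch_size=16):
    # image: (1, 3, H, W) with H and W divisible by patch_size
    attn = model.get_last_selfattention(image)  # (1, heads, tokens, tokens)
    n_heads = attn.shape[1]
    # Attention of the [CLS] query (index 0) to all patch keys (index 1:).
    cls_attn = attn[0, :, 0, 1:]                # (heads, num_patches)
    h = image.shape[-2] // patch_size
    w = image.shape[-1] // patch_size
    return cls_attn.reshape(n_heads, h, w)      # one h x w map per head
```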
@@ -160,10 +162,14 @@ Here we provide the T-SNE visualization of the learned feature by ViT-B/16.
 We show the fish classes in ImageNet-1K, i.e., the first six classes,
 including tench, goldfish, white shark, tiger shark, hammerhead, electric
 ray. See more examples in Appendix.
-
-
+
+
+<div align="center">
+<img width="90%" alt="T-SNE visualization of the learned feature by ViT-B/16." src="https://huggingface.co/zhoupans/Mugs_ViT_large_pretrained/resolve/main/exp_illustration/TSNE.png">
 </div>
 
+
+
 **<p align="center">Fig 4. T-SNE visualization of the learned feature by ViT-B/16.**</p>
 
 ## License
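A Fig 4-style plot can be reproduced by running t-SNE on frozen backbone features. A sketch with scikit-learn, assuming `feats` (N x D) and `labels` (N,) are NumPy arrays precomputed with the pretrained ViT-B/16; both are placeholders, not outputs of this repo:

```python
# Sketch of a Fig 4-style t-SNE plot of frozen ViT features for the first
# six ImageNet-1K classes (the fish classes). `feats` and `labels` are
# assumed precomputed NumPy arrays, not produced by this repository.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(feats: np.ndarray, labels: np.ndarray) -> None:
    xy = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(feats)
    for c in np.unique(labels):
        m = labels == c
        plt.scatter(xy[m, 0], xy[m, 1], s=4, label=str(c))
    plt.legend()
    plt.show()
```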