Update README.md
@@ -3,3 +3,86 @@ license: other
license_name: adelaidet-non-commercial
license_link: https://github.com/zbwxp/SegVit/blob/master/LICENSE
---

# Official PyTorch Implementation of SegViT [[code]](https://github.com/zbwxp/SegVit)

### SegViT: Semantic Segmentation with Plain Vision Transformers

Zhang, Bowen and Tian, Zhi and Tang, Quan and Chu, Xiangxiang and Wei, Xiaolin and Shen, Chunhua and Liu, Yifan.

NeurIPS 2022. [[paper]](https://arxiv.org/abs/2210.05844)

### SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers

Bowen Zhang, Liyang Liu, Minh Hieu Phan, Zhi Tian, Chunhua Shen and Yifan Liu.

IJCV 2023. [[paper]](https://arxiv.org/abs/2306.06289) [we are refactoring code for release ...]

This repository contains the official PyTorch implementation of the training and evaluation code, together with the pretrained models, for SegViT and its extended version, SegViT v2.

## Highlights
* **Simple Decoder:** The Attention-to-Mask (ATM) decoder provides a simple segmentation head for plain Vision Transformers and is easy to extend to other downstream tasks.
* **Light Structure:** We propose the *Shrunk* structure, which saves up to **40%** of the computational cost of a model with a ViT backbone.
* **Stronger Performance:** SegViT achieves state-of-the-art mIoU of **55.2%** on ADE20K, **50.3%** on COCOStuff10K, and **65.3%** on PASCAL-Context, with the lowest computational cost among counterparts that use a ViT backbone.
* **Scalability:** With a more powerful backbone (BEiT-V2), SegViT v2 achieves state-of-the-art multi-scale (MS) mIoU of **58.2%** on ADE20K, **53.5%** on COCOStuff10K, and **67.14%** on PASCAL-Context, showcasing strong scalability.
* **Continual Learning:** We adapt SegViT v2 to continual semantic segmentation and demonstrate nearly zero forgetting of previously learned knowledge.

As shown in the following figures, the similarity between the class query and the image features is transferred to the segmentation mask.

<img src="./resources/v2_figure_1.png">
<img src="./resources/teaser-01.png">
<img src="resources/atm_arch-1.png">

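To make the query-to-mask idea concrete, here is a minimal, dependency-free sketch. It assumes a single attention head, no learned projections, and a flattened patch grid; the function name `attention_to_mask` and the toy vectors are illustrative only and are not taken from the repository code. Each class query is compared with every patch feature through a scaled dot product, and a sigmoid turns that similarity into a soft per-patch mask.

```python
import math


def attention_to_mask(class_queries, patch_features):
    """Return one soft mask per class query.

    Each mask value is sigmoid(q . k / sqrt(d)) for a query q and a patch
    feature k. Simplified illustration, not the repository implementation.
    """
    dim = len(class_queries[0])
    scale = 1.0 / math.sqrt(dim)
    masks = []
    for query in class_queries:
        mask = []
        for patch in patch_features:
            # Scaled dot-product similarity between the query and the patch.
            similarity = sum(q * p for q, p in zip(query, patch)) * scale
            # The sigmoid maps the similarity to a mask value in (0, 1).
            mask.append(1.0 / (1.0 + math.exp(-similarity)))
        masks.append(mask)
    return masks


# Toy input: 2 class queries and 4 patches with 2-dimensional features.
queries = [[1.0, 0.0], [0.0, 1.0]]
patches = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.0, -2.0]]
masks = attention_to_mask(queries, patches)
```

Patches whose features align with a query receive mask values above 0.5 and opposing patches fall below 0.5. In a real model the flattened mask would be reshaped to the patch grid and upsampled to the image resolution.
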
## Getting started

1. Install the [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) library and the other required packages.

```bash
pip install mmcv-full==1.4.4 mmsegmentation==0.24.0
pip install scipy timm
```
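
Because the install step pins exact versions, it can help to confirm what is actually installed before training. The following is a small sketch using only the standard library; the helper name `check_pinned` is hypothetical, and the pins are copied from the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install command; adjust if you install other versions.
PINNED = {"mmcv-full": "1.4.4", "mmsegmentation": "0.24.0"}


def check_pinned(pinned=PINNED):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # the package is not installed in this environment
        report[name] = (found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_pinned().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: installed={found} expected={PINNED[name]} [{status}]")
```

A `MISMATCH` line does not necessarily mean the code will fail, only that the environment differs from the versions listed here.
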
## Training
Launch distributed training with the shell script, passing the config file and the number of GPUs:
```bash
bash tools/dist_train.sh configs/segvit/segvit_vit-l_jax_640x640_160k_ade20k.py {num_gpus}
```
## Evaluation
Evaluate a checkpoint with the matching config, again passing the number of GPUs:
```bash
bash tools/dist_test.sh configs/segvit/segvit_vit-l_jax_640x640_160k_ade20k.py {path_to_ckpt} {num_gpus} --eval mIoU
```

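The `{path_to_ckpt}` argument expects a local file, while the checkpoint links in the Results table point at Hugging Face `blob` pages, which render an HTML view rather than the file itself. As a hedged sketch, the helper below rewrites such a link to the `resolve` form that the Hugging Face Hub uses for direct downloads. The function names are illustrative, not part of this repository, and `download_checkpoint` needs network access.

```python
import urllib.request


def to_download_url(blob_url):
    """Rewrite a Hugging Face '/blob/' page URL to its '/resolve/' download URL."""
    marker = "/blob/"
    if marker not in blob_url:
        raise ValueError(f"not a Hugging Face blob URL: {blob_url}")
    # Replace only the first occurrence so the file path itself is untouched.
    return blob_url.replace(marker, "/resolve/", 1)


def download_checkpoint(blob_url, destination):
    """Download the checkpoint behind blob_url to a local path (needs network)."""
    urllib.request.urlretrieve(to_download_url(blob_url), destination)
    return destination


url = to_download_url("https://huggingface.co/Akide/SegViTv1/blob/main/ade_51.3.pth")
```

The returned local path can then be passed as `{path_to_ckpt}`.
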
## Datasets
Please follow the [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) data preparation instructions to set up the datasets.

## Results
| Model backbone | Dataset | mIoU (%) | mIoU, MS (%) | GFLOPs | Checkpoint |
| --- | --- | --- | --- | --- | --- |
| ViT-Base | ADE20K | 51.3 | 53.0 | 120.9 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/ade_51.3.pth) |
| ViT-Large (Shrunk) | ADE20K | 53.9 | 55.1 | 373.5 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/ade_shrunk_53.9.pth) |
| ViT-Large | ADE20K | 54.6 | 55.2 | 637.9 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/ade_54.6.pth) |
| ViT-Large (Shrunk) | COCOStuff10K | 49.1 | 49.4 | 224.8 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/COCOstuff10k_shrunk_49.1.pth) |
| ViT-Large | COCOStuff10K | 49.9 | 50.3 | 383.9 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/COCOstuff_49.9.pth) |
| ViT-Large (Shrunk) | PASCAL-Context (59cls) | 62.3 | 63.7 | 186.9 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/PC59cls_62.3.pth) |
| ViT-Large | PASCAL-Context (59cls) | 64.1 | 65.3 | 321.6 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/PC59cls_64.1.pth) |

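The trade-off of the *Shrunk* structure can be read directly from the ViT-Large rows of the table. The short calculation below uses only the single-scale mIoU and GFLOPs figures listed there; the variable and function names are illustrative.

```python
# (single-scale mIoU, GFLOPs) for ViT-Large, copied from the Results table.
RESULTS = {
    "ADE20K": {"full": (54.6, 637.9), "shrunk": (53.9, 373.5)},
    "COCOStuff10K": {"full": (49.9, 383.9), "shrunk": (49.1, 224.8)},
    "PASCAL-Context": {"full": (64.1, 321.6), "shrunk": (62.3, 186.9)},
}


def shrunk_tradeoff(results):
    """Return {dataset: (fraction of GFLOPs saved, mIoU points lost)}."""
    tradeoff = {}
    for dataset, rows in results.items():
        full_miou, full_gflops = rows["full"]
        shrunk_miou, shrunk_gflops = rows["shrunk"]
        saving = 1.0 - shrunk_gflops / full_gflops
        tradeoff[dataset] = (saving, full_miou - shrunk_miou)
    return tradeoff


for dataset, (saving, drop) in shrunk_tradeoff(RESULTS).items():
    print(f"{dataset}: {saving:.1%} fewer GFLOPs, {drop:.1f} mIoU lower")
```

On these numbers the Shrunk variants use roughly 41% to 42% fewer GFLOPs than the full ViT-Large models, at a cost of 0.7 to 1.8 single-scale mIoU points.
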
## License
For academic use, this project is licensed under the 2-clause BSD License; see the LICENSE file for details. For commercial use, please contact the authors.

## Citation
```
@article{zhang2022segvit,
  title={SegViT: Semantic Segmentation with Plain Vision Transformers},
  author={Zhang, Bowen and Tian, Zhi and Tang, Quan and Chu, Xiangxiang and Wei, Xiaolin and Shen, Chunhua and Liu, Yifan},
  journal={NeurIPS},
  year={2022}
}

@article{zhang2023segvitv2,
  title={SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers},
  author={Zhang, Bowen and Liu, Liyang and Phan, Minh Hieu and Tian, Zhi and Shen, Chunhua and Liu, Yifan},
  journal={IJCV},
  year={2023}
}
```