Akide committed on
Commit
1cfdaf6
1 Parent(s): 80a9dba

Update README.md

  1. README.md +83 -0
README.md CHANGED
@@ -3,3 +3,86 @@ license: other
3
  license_name: adelaidet-non-commercial
4
  license_link: https://github.com/zbwxp/SegVit/blob/master/LICENSE
5
  ---
6
+
7
+ # Official PyTorch Implementation of SegViT [[code]](https://github.com/zbwxp/SegVit)
8
+
9
+ ### SegViT: Semantic Segmentation with Plain Vision Transformers
10
+
11
+ Zhang, Bowen and Tian, Zhi and Tang, Quan and Chu, Xiangxiang and Wei, Xiaolin and Shen, Chunhua and Liu, Yifan.
12
+
13
+ NeurIPS 2022. [[paper]](https://arxiv.org/abs/2210.05844)
14
+
15
+ ### SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers
16
+
17
+ Bowen Zhang, Liyang Liu, Minh Hieu Phan, Zhi Tian, Chunhua Shen and Yifan Liu.
18
+
19
+ IJCV 2023. [[paper]](https://arxiv.org/abs/2306.06289) [we are refactoring code for release ...]
20
+
21
+ This repository contains the official PyTorch implementation of the training and evaluation code, together with the pretrained models, for SegViT and its extended version, SegViT v2.
22
+
23
+ ## Highlights
24
+ * **Simple Decoder:** The Attention-to-Mask (ATM) decoder provides a simple segmentation head for plain Vision Transformers and is easy to extend to other downstream tasks.
25
+ * **Light Structure:** We propose the *Shrunk* structure, which can save up to **40%** of the computational cost in a structure with a ViT backbone.
26
+ * **Stronger Performance:** We achieve state-of-the-art mIoU of **55.2%** on ADE20K, **50.3%** on COCOStuff10K, and **65.3%** on PASCAL-Context, with the lowest computational cost among counterparts using a ViT backbone.
27
+ * **Scalability:** SegViT v2 employs a more powerful backbone (BEiT-V2) and obtains state-of-the-art mIoU of **58.2%** (MS) on ADE20K, **53.5%** (MS) on COCOStuff10K, and **67.14%** (MS) on PASCAL-Context, showcasing strong scalability.
28
+ * **Continual Learning:** We propose to adapt SegViT v2 for continual semantic segmentation, demonstrating nearly zero forgetting of previously learned knowledge.
29
+
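The *Shrunk* saving quoted above can be sanity-checked against the GFLOPs column of the Results table further down. The snippet below is only an arithmetic sketch: the numbers are copied from that table for the ViT-Large rows, and the helper name `relative_saving` is made up for illustration.

```python
# GFLOPs copied from the Results table: (ViT-Large, ViT-Large Shrunk).
gflops = {
    "ADE20K": (637.9, 373.5),
    "COCOStuff10K": (383.9, 224.8),
    "PASCAL-Context (59cls)": (321.6, 186.9),
}


def relative_saving(full, shrunk):
    """Fraction of compute saved by the Shrunk variant (hypothetical helper)."""
    return 1.0 - shrunk / full


savings = {name: relative_saving(full, shrunk) for name, (full, shrunk) in gflops.items()}

for name, saving in savings.items():
    print(f"{name}: {saving:.1%} fewer GFLOPs")
```

On all three datasets the reduction comes out at roughly 41 to 42%, which is in line with the figure stated in the highlight.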
30
+ As shown in the following figure, the similarity between the class query and the image features is transferred to the segmentation mask.
31
+
32
+ <img src="./resources/v2_figure_1.png">
33
+ <img src="./resources/teaser-01.png">
34
+ <img src="resources/atm_arch-1.png">
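To make the idea in the figures concrete, here is a minimal NumPy sketch of turning query-to-feature similarity into masks. It is not the repository's ATM module: it assumes a scaled dot product followed by a sigmoid, and it leaves out the learned projections, multi-head attention, and class prediction branch of the real decoder. The function name `similarity_to_masks` and all shapes are illustrative.

```python
import numpy as np


def similarity_to_masks(class_queries, image_features):
    """Map query/feature similarity to per-class masks (illustrative only).

    class_queries:  (num_classes, channels)
    image_features: (height, width, channels)
    returns:        (num_classes, height, width), values in (0, 1)
    """
    height, width, channels = image_features.shape
    keys = image_features.reshape(height * width, channels)
    # Scaled dot-product similarity between every class query and every location.
    similarity = class_queries @ keys.T / np.sqrt(channels)
    # A sigmoid (rather than a softmax over locations) gives one soft mask per class.
    masks = 1.0 / (1.0 + np.exp(-similarity))
    return masks.reshape(-1, height, width)


rng = np.random.default_rng(0)
queries = rng.normal(size=(3, 8))      # 3 hypothetical classes, 8 channels
features = rng.normal(size=(4, 5, 8))  # a 4x5 feature map
masks = similarity_to_masks(queries, features)
print(masks.shape)
```

Locations whose features align with a class query get mask values near 1 for that class, which is the sense in which the similarity map is "transferred" to the mask.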
35
+
36
+
37
+ ## Getting started
38
+
39
+ 1. Install the [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) library and some required packages.
40
+
41
+ ```bash
42
+ pip install mmcv-full==1.4.4 mmsegmentation==0.24.0
43
+ pip install scipy timm
44
+ ```
45
+ ## Training
46
+ ```bash
47
+ bash tools/dist_train.sh configs/segvit/segvit_vit-l_jax_640x640_160k_ade20k.py {num_gpus}
48
+ ```
49
+ ## Evaluation
50
+ ```bash
51
+ bash tools/dist_test.sh configs/segvit/segvit_vit-l_jax_640x640_160k_ade20k.py {path_to_ckpt} {num_gpus} --eval mIoU
52
+ ```
53
+
54
+ ## Datasets
55
+ Please follow the [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) data preparation instructions.
56
+
57
+ ## Results
58
+ | Model backbone | Dataset | mIoU | mIoU (MS) | GFLOPs | Checkpoint |
59
+ | --- | --- | --- | --- | --- | --- |
60
+ | ViT-Base | ADE20K | 51.3 | 53.0 | 120.9 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/ade_51.3.pth) |
61
+ | ViT-Large (Shrunk) | ADE20K | 53.9 | 55.1 | 373.5 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/ade_shrunk_53.9.pth) |
62
+ | ViT-Large | ADE20K | 54.6 | 55.2 | 637.9 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/ade_54.6.pth) |
63
+ | ViT-Large (Shrunk) | COCOStuff10K | 49.1 | 49.4 | 224.8 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/COCOstuff10k_shrunk_49.1.pth) |
64
+ | ViT-Large | COCOStuff10K | 49.9 | 50.3 | 383.9 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/COCOstuff_49.9.pth) |
65
+ | ViT-Large (Shrunk) | PASCAL-Context (59cls) | 62.3 | 63.7 | 186.9 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/PC59cls_62.3.pth) |
66
+ | ViT-Large | PASCAL-Context (59cls) | 64.1 | 65.3 | 321.6 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/PC59cls_64.1.pth) |
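For readers unfamiliar with the metric in the table, the following is a small sketch of how mean intersection-over-union can be computed from a confusion matrix. It is not the evaluation code used by this repository or by mmsegmentation, which accumulates statistics over the whole dataset; the function name `mean_iou` and the `ignore_index=255` default are assumptions made for the example.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in the prediction or the ground truth."""
    valid = target != ignore_index          # drop pixels marked as "ignore"
    pred, target = pred[valid], target[valid]
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    confusion = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    present = union > 0                     # skip classes absent from both maps
    return float((intersection[present] / union[present]).mean())


pred = np.array([[0, 0, 1], [1, 2, 2]])
target = np.array([[0, 1, 1], [1, 2, 255]])
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In this toy case the per-class IoUs are 1/2, 2/3 and 1, so the mean is 13/18.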
67
+
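The checkpoint links in the table point at the Hub's `blob` pages, which render HTML rather than serving the file. The sketch below rewrites such a link into the corresponding `resolve` URL, which the Hugging Face Hub uses for direct file downloads; the helper name `to_download_url` is hypothetical.

```python
def to_download_url(blob_url):
    """Turn a Hub file-page URL (/blob/) into a direct-download URL (/resolve/)."""
    return blob_url.replace("/blob/", "/resolve/", 1)


url = to_download_url("https://huggingface.co/Akide/SegViTv1/blob/main/ade_51.3.pth")
print(url)
```

The resulting URL can be passed to any downloader, and the saved file can then be used as `{path_to_ckpt}` in the evaluation command.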
68
+
69
+
70
+ ## License
71
+ For academic use, this project is licensed under the 2-clause BSD License - see the LICENSE file for details. For commercial use, please contact the authors.
72
+
73
+ ## Citation
74
+ ```
75
+ @article{zhang2022segvit,
76
+ title={SegViT: Semantic Segmentation with Plain Vision Transformers},
77
+ author={Zhang, Bowen and Tian, Zhi and Tang, Quan and Chu, Xiangxiang and Wei, Xiaolin and Shen, Chunhua and Liu, Yifan},
78
+ journal={NeurIPS},
79
+ year={2022}
80
+ }
81
+
82
+ @article{zhang2023segvitv2,
83
+ title={SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers},
84
+ author={Zhang, Bowen and Liu, Liyang and Phan, Minh Hieu and Tian, Zhi and Shen, Chunhua and Liu, Yifan},
85
+ journal={IJCV},
86
+ year={2023}
87
+ }
88
+ ```