Commit c2ddc7e by zongxiang (parent 7fe0374): Upload README.md (README.md, +153 -3)
<div align="center">

# Soldier-Officer Window self-Attention (SOWA)

<a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-ee4c2c?logo=pytorch&logoColor=white"></a>
<a href="https://pytorchlightning.ai/"><img alt="Lightning" src="https://img.shields.io/badge/-Lightning-792ee5?logo=pytorchlightning&logoColor=white"></a>
<a href="https://hydra.cc/"><img alt="Config: Hydra" src="https://img.shields.io/badge/Config-Hydra-89b8cd"></a>
<a href="https://github.com/ashleve/lightning-hydra-template"><img alt="Template" src="https://img.shields.io/badge/-Lightning--Hydra--Template-017F2F?style=flat&logo=github&labelColor=gray"></a><br>
[![Paper](http://img.shields.io/badge/paper-arxiv.2407.03634-B31B1B.svg)](https://arxiv.org/abs/2407.03634)

</div>
## Description

<div align="center">
<img src="https://github.com/huzongxiang/sowa/blob/resources/fig1.png" alt="concept" style="width: 50%;">
</div>

Visual anomaly detection is critical in industrial manufacturing, but traditional methods often rely on extensive normal datasets and custom models, which limits scalability. Recent advances in large-scale vision-language models have significantly improved zero-/few-shot anomaly detection. However, these approaches may not fully exploit hierarchical features and can miss nuanced details. We introduce a window self-attention mechanism based on the CLIP model, combined with learnable prompts, to process multi-level features within a Soldier-Officer Window self-Attention (SOWA) framework. Our method has been tested on five benchmark datasets, demonstrating superior performance by leading in 18 out of 20 metrics compared to existing state-of-the-art techniques.

![architecture](https://github.com/huzongxiang/sowa/blob/resources/fig2.png)
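The core idea, self-attention restricted to local windows of a feature map, can be sketched in a few lines of NumPy. This is a minimal single-head illustration without the learnable prompts, projections, or frozen CLIP backbone of the actual SOWA adapter; all names here are hypothetical.

```python
import numpy as np

def window_self_attention(x, window_size):
    """Single-head window self-attention on a feature map: tokens attend
    only to tokens in the same non-overlapping window. H and W must be
    divisible by window_size. No learned projections, illustration only."""
    H, W, C = x.shape
    ws = window_size
    out = np.empty_like(x)
    for i in range(0, H, ws):
        for j in range(0, W, ws):
            win = x[i:i + ws, j:j + ws].reshape(-1, C)    # (ws*ws, C) tokens
            scores = win @ win.T / np.sqrt(C)             # scaled dot-product
            scores -= scores.max(axis=-1, keepdims=True)  # softmax stability
            attn = np.exp(scores)
            attn /= attn.sum(axis=-1, keepdims=True)      # softmax over window
            out[i:i + ws, j:j + ws] = (attn @ win).reshape(ws, ws, C)
    return out

feats = np.random.rand(8, 8, 16)                 # toy (H, W, C) feature map
y = window_self_attention(feats, window_size=4)
print(y.shape)  # (8, 8, 16)
```

Because attention is confined to each window, the cost scales with the window size rather than the full feature-map resolution, which is what makes applying it at multiple hierarchy levels affordable.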
## Installation

#### Pip

```bash
# clone project
git clone https://github.com/huzongxiang/sowa
cd sowa

# [OPTIONAL] create conda environment
conda create -n sowa python=3.9
conda activate sowa

# install pytorch according to the official instructions
# https://pytorch.org/get-started/

# install requirements
pip install -r requirements.txt
```

#### Conda

```bash
# clone project
git clone https://github.com/huzongxiang/sowa
cd sowa

# create conda environment and install dependencies
conda env create -f environment.yaml -n sowa

# activate conda environment
conda activate sowa
```
## How to run

Train the model with the default configuration:

```bash
# train on CPU
python src/train.py trainer=cpu data=sowa_visa model=sowa_hfwa

# train on GPU
python src/train.py trainer=gpu data=sowa_visa model=sowa_hfwa
```
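Since training is driven by Hydra, any config field can also be overridden from the command line. A hedged example, assuming the lightning-hydra-template layout this project's badges reference; the exact keys depend on the files under `configs/` in this repository.

```bash
# override nested config values with Hydra dot syntax;
# trainer.max_epochs and seed are standard lightning-hydra-template keys
# and may differ in this repo's configs/ tree
python src/train.py trainer=gpu data=sowa_visa model=sowa_hfwa trainer.max_epochs=20 seed=42
```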
## Results

Comparison with few-shot (K=4) anomaly detection methods on the MVTec-AD, Visa, BTAD, DAGM, and DTD-Synthetic datasets (AC: image-level anomaly classification; AS: pixel-level anomaly segmentation).

| Metric | Dataset | WinCLIP | April-GAN | Ours |
|-----------|----------------|-------------|-------------|-------------|
| AC AUROC | MVTec-AD | 95.2±1.3 | 92.8±0.2 | 96.8±0.3 |
| | Visa | 87.3±1.8 | 92.6±0.4 | 92.9±0.2 |
| | BTAD | 87.0±0.2 | 92.1±0.2 | 94.8±0.2 |
| | DAGM | 93.8±0.2 | 96.2±1.1 | 98.9±0.3 |
| | DTD-Synthetic | 98.1±0.2 | 98.5±0.1 | 99.1±0.0 |
| AC AP | MVTec-AD | 97.3±0.6 | 96.3±0.1 | 98.3±0.3 |
| | Visa | 88.8±1.8 | 94.5±0.3 | 94.5±0.2 |
| | BTAD | 86.8±0.0 | 95.2±0.5 | 95.5±0.7 |
| | DAGM | 83.8±1.1 | 86.7±4.5 | 95.2±1.7 |
| | DTD-Synthetic | 99.1±0.1 | 99.4±0.0 | 99.6±0.0 |
| AS AUROC | MVTec-AD | 96.2±0.3 | 95.9±0.0 | 95.7±0.1 |
| | Visa | 97.2±0.2 | 96.2±0.0 | 97.1±0.0 |
| | BTAD | 95.8±0.0 | 94.4±0.1 | 97.1±0.0 |
| | DAGM | 93.8±0.1 | 88.9±0.4 | 96.9±0.0 |
| | DTD-Synthetic | 96.8±0.2 | 96.7±0.0 | 98.7±0.0 |
| AS AUPRO | MVTec-AD | 89.0±0.8 | 91.8±0.1 | 92.4±0.2 |
| | Visa | 87.6±0.9 | 90.2±0.1 | 91.4±0.0 |
| | BTAD | 66.6±0.2 | 78.2±0.1 | 81.2±0.2 |
| | DAGM | 82.4±0.3 | 77.8±0.9 | 94.4±0.1 |
| | DTD-Synthetic | 90.1±0.5 | 92.2±0.0 | 96.6±0.1 |

<!-- zero-width space -->
Performance comparison on the MVTec-AD and Visa datasets.

| Method | Source | MVTec-AD AC AUROC | MVTec-AD AS AUROC | MVTec-AD AS PRO | Visa AC AUROC | Visa AS AUROC | Visa AS PRO |
|---------------|-------------------------|-------------------|-------------------|-----------------|---------------|---------------|-------------|
| SPADE | arXiv 2020 | 84.8±2.5 | 92.7±0.3 | 87.0±0.5 | 81.7±3.4 | 96.6±0.3 | 87.3±0.8 |
| PaDiM | ICPR 2021 | 80.4±2.4 | 92.6±0.7 | 81.3±1.9 | 72.8±2.9 | 93.2±0.5 | 72.6±1.9 |
| PatchCore | CVPR 2022 | 88.8±2.6 | 94.3±0.5 | 84.3±1.6 | 85.3±2.1 | 96.8±0.3 | 84.9±1.4 |
| WinCLIP | CVPR 2023 | 95.2±1.3 | 96.2±0.3 | 89.0±0.8 | 87.3±1.8 | 97.2±0.2 | 87.6±0.9 |
| April-GAN | CVPR 2023 VAND workshop | 92.8±0.2 | 95.9±0.0 | 91.8±0.1 | 92.6±0.4 | 96.2±0.0 | 90.2±0.1 |
| PromptAD | CVPR 2024 | 96.6±0.9 | 96.5±0.2 | - | 89.1±1.7 | 97.4±0.3 | - |
| InCTRL | CVPR 2024 | 94.5±1.8 | - | - | 87.7±1.9 | - | - |
| SOWA | Ours | 96.8±0.3 | 95.7±0.1 | 92.4±0.2 | 92.9±0.2 | 97.1±0.0 | 91.4±0.0 |

<!-- zero-width space -->
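For reference, the image-level AUROC values above can be computed from anomaly scores with the rank (Mann-Whitney U) formulation. This is a generic sketch, not the repository's evaluation code:

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the rank statistic; exact when scores are distinct
    (tie-averaging omitted for brevity). labels: 1 = anomalous."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)  # 1-based ranks by score
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    # Mann-Whitney U of positives, normalized to [0, 1]
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])
print(auroc(y_score, y_true))  # 0.75
```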
Comparison with few-shot anomaly detection methods on the MVTec-AD, Visa, BTAD, DAGM, and DTD-Synthetic datasets.
<div align="center">
<img src="https://github.com/huzongxiang/sowa/blob/resources/fig5.png" alt="few-shot" style="width: 70%;">
</div>
## Visualization

Visualization results under the few-shot setting (K=4).
<div align="center">
<img src="https://github.com/huzongxiang/sowa/blob/resources/fig6.png" alt="visualization" style="width: 70%;">
</div>
## Mechanism

Hierarchical results on the MVTec-AD dataset: real model outputs illustrating how the different layers (H1 to H4) process distinct feature modes. Each row is a sample; the columns show the original image, segmentation mask, heatmap, the feature outputs from H1 to H4, and their fusion.
![mechanism](https://github.com/huzongxiang/sowa/blob/resources/fig7.png)
## Inference Speed

Inference performance comparison of different methods on a single NVIDIA RTX 3070 8GB GPU.
<div align="center">
<img src="https://github.com/huzongxiang/sowa/blob/resources/fig9.png" alt="speed" style="width: 80%;">
</div>
## Citation

Please cite the following paper if this work helps your project:

```bibtex
@article{hu2024sowa,
  title={SOWA: Adapting Hierarchical Frozen Window Self-Attention to Visual-Language Models for Better Anomaly Detection},
  author={Hu, Zongxiang and Zhang, Zhaosheng},
  journal={arXiv preprint arXiv:2407.03634},
  year={2024}
}
```