<div align="center">

# Soldier-Officer Window self-Attention (SOWA)

<a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-ee4c2c?logo=pytorch&logoColor=white"></a>
<a href="https://pytorchlightning.ai/"><img alt="Lightning" src="https://img.shields.io/badge/-Lightning-792ee5?logo=pytorchlightning&logoColor=white"></a>
<a href="https://hydra.cc/"><img alt="Config: Hydra" src="https://img.shields.io/badge/Config-Hydra-89b8cd"></a>
<a href="https://github.com/ashleve/lightning-hydra-template"><img alt="Template" src="https://img.shields.io/badge/-Lightning--Hydra--Template-017F2F?style=flat&logo=github&labelColor=gray"></a><br>
[![Paper](http://img.shields.io/badge/paper-arxiv.2407.03634-B31B1B.svg)](https://arxiv.org/abs/2407.03634)

</div>

## Description

<div align="center">
  <img src="https://github.com/huzongxiang/sowa/blob/resources/fig1.png" alt="concept" style="width: 50%;">
</div>

Visual anomaly detection is critical in industrial manufacturing, but traditional methods often rely on extensive normal datasets and custom models, limiting scalability. Recent advances in large-scale visual-language models have significantly improved zero-/few-shot anomaly detection. However, these approaches may not fully utilize hierarchical features, potentially missing nuanced details. We introduce a window self-attention mechanism based on the CLIP model, combined with learnable prompts, to process multi-level features within a Soldier-Officer Window self-Attention (SOWA) framework. Our method has been tested on five benchmark datasets, demonstrating superior performance by leading in 18 out of 20 metrics compared to existing state-of-the-art techniques.

![architecture](https://github.com/huzongxiang/sowa/blob/resources/fig2.png)
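The core idea of window self-attention, attending only within local windows of a feature map, can be illustrated with a minimal numpy sketch. This is a toy illustration only, not the actual SOWA implementation (which operates on frozen CLIP features with learned projections and prompt tuning):

```python
import numpy as np

def window_partition(x, ws):
    # Split an (H, W, C) feature map into non-overlapping ws x ws windows
    # -> (num_windows, ws*ws, C) token groups.
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(x, ws):
    # Scaled dot-product self-attention inside each window
    # (no learned Q/K/V projections, for clarity).
    win = window_partition(x, ws)                          # (nW, ws*ws, C)
    scale = win.shape[-1] ** -0.5
    attn = softmax(win @ win.transpose(0, 2, 1) * scale)   # (nW, N, N)
    return attn @ win                                      # (nW, N, C)

feat = np.random.rand(8, 8, 16)        # toy 8x8 feature map, 16 channels
out = window_self_attention(feat, ws=4)
print(out.shape)                       # (4, 16, 16): 4 windows of 16 tokens each
```

Restricting attention to windows keeps the cost linear in the number of windows rather than quadratic in the full token count, which is what makes applying it at multiple feature levels practical.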

## Installation

#### Pip

```bash
# clone project
git clone https://github.com/huzongxiang/sowa
cd sowa

# [OPTIONAL] create conda environment
conda create -n sowa python=3.9
conda activate sowa

# install pytorch according to instructions
# https://pytorch.org/get-started/

# install requirements
pip install -r requirements.txt
```

#### Conda

```bash
# clone project
git clone https://github.com/huzongxiang/sowa
cd sowa

# create conda environment and install dependencies
conda env create -f environment.yaml -n sowa

# activate conda environment
conda activate sowa
```

## How to run

Train model with default configuration

```bash
# train on CPU
python src/train.py trainer=cpu data=sowa_visa model=sowa_hfwa

# train on GPU
python src/train.py trainer=gpu data=sowa_visa model=sowa_hfwa
```
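Because configuration is managed by Hydra, any option can also be overridden from the command line. A sketch of typical overrides, following the Lightning-Hydra-Template conventions this project is built on (the exact parameter names here are assumptions and may differ from the actual config files):

```bash
# override a trainer setting and the random seed (names assumed from the template)
python src/train.py trainer=gpu data=sowa_visa model=sowa_hfwa trainer.max_epochs=20 seed=12345
```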

## Results

Comparison with few-shot (K=4) anomaly detection methods on the MVTec-AD, Visa, BTAD, DAGM, and DTD-Synthetic datasets.
| Metric    | Dataset        | WinCLIP     | April-GAN        | Ours        |
|-----------|----------------|-------------|-------------|-------------|
| AC AUROC  | MVTec-AD       | 95.2±1.3    | 92.8±0.2    | 96.8±0.3    |
|           | Visa           | 87.3±1.8    | 92.6±0.4    | 92.9±0.2    |
|           | BTAD           | 87.0±0.2    | 92.1±0.2    | 94.8±0.2    |
|           | DAGM           | 93.8±0.2    | 96.2±1.1    | 98.9±0.3    |
|           | DTD-Synthetic  | 98.1±0.2    | 98.5±0.1    | 99.1±0.0    |
| AC AP     | MVTec-AD       | 97.3±0.6    | 96.3±0.1    | 98.3±0.3    |
|           | Visa           | 88.8±1.8    | 94.5±0.3    | 94.5±0.2    |
|           | BTAD           | 86.8±0.0    | 95.2±0.5    | 95.5±0.7    |
|           | DAGM           | 83.8±1.1    | 86.7±4.5    | 95.2±1.7    |
|           | DTD-Synthetic  | 99.1±0.1    | 99.4±0.0    | 99.6±0.0    |
| AS AUROC  | MVTec-AD       | 96.2±0.3    | 95.9±0.0    | 95.7±0.1    |
|           | Visa           | 97.2±0.2    | 96.2±0.0    | 97.1±0.0    |
|           | BTAD           | 95.8±0.0    | 94.4±0.1    | 97.1±0.0    |
|           | DAGM           | 93.8±0.1    | 88.9±0.4    | 96.9±0.0    |
|           | DTD-Synthetic  | 96.8±0.2    | 96.7±0.0    | 98.7±0.0    |
| AS AUPRO  | MVTec-AD       | 89.0±0.8    | 91.8±0.1    | 92.4±0.2    |
|           | Visa           | 87.6±0.9    | 90.2±0.1    | 91.4±0.0    |
|           | BTAD           | 66.6±0.2    | 78.2±0.1    | 81.2±0.2    |
|           | DAGM           | 82.4±0.3    | 77.8±0.9    | 94.4±0.1    |
|           | DTD-Synthetic  | 90.1±0.5    | 92.2±0.0    | 96.6±0.1    | 


Performance Comparison on MVTec-AD and Visa Datasets. 
| Method        | Source                  | MVTec-AD AC AUROC | MVTec-AD AS AUROC | MVTec-AD AS PRO | Visa AC AUROC | Visa AS AUROC | Visa AS PRO |
|---------------|-------------------------|-------------------|-------------------|-----------------|---------------|---------------|-------------|
| SPADE         | arXiv 2020              | 84.8±2.5          | 92.7±0.3          | 87.0±0.5        | 81.7±3.4      | 96.6±0.3      | 87.3±0.8    |
| PaDiM         | ICPR 2021               | 80.4±2.4          | 92.6±0.7          | 81.3±1.9        | 72.8±2.9      | 93.2±0.5      | 72.6±1.9    |
| PatchCore     | CVPR 2022               | 88.8±2.6          | 94.3±0.5          | 84.3±1.6        | 85.3±2.1      | 96.8±0.3      | 84.9±1.4    |
| WinCLIP       | CVPR 2023               | 95.2±1.3          | 96.2±0.3          | 89.0±0.8        | 87.3±1.8      | 97.2±0.2      | 87.6±0.9    |
| April-GAN     | CVPR 2023 VAND workshop | 92.8±0.2          | 95.9±0.0          | 91.8±0.1        | 92.6±0.4      | 96.2±0.0      | 90.2±0.1    |
| PromptAD      | CVPR 2024               | 96.6±0.9          | 96.5±0.2          | -               | 89.1±1.7      | 97.4±0.3      | -           |
| InCTRL        | CVPR 2024               | 94.5±1.8          | -                 | -               | 87.7±1.9      | -             | -           |
| SOWA          | Ours                    | 96.8±0.3          | 95.7±0.1          | 92.4±0.2        | 92.9±0.2      | 97.1±0.0      | 91.4±0.0    | 



Comparison with few-shot anomaly detection methods on the MVTec-AD, Visa, BTAD, DAGM, and DTD-Synthetic datasets.
<div align="center">
  <img src="https://github.com/huzongxiang/sowa/blob/resources/fig5.png" alt="few-shot" style="width: 70%;">
</div>


## Visualization
Visualization results under the few-shot setting (K=4). 
<div align="center">
  <img src="https://github.com/huzongxiang/sowa/blob/resources/fig6.png" alt="concept" style="width: 70%;">
</div>


## Mechanism
Hierarchical results on the MVTec-AD dataset: real model outputs illustrating how the different layers (H1 to H4) respond to different feature modes. Each row is a sample; the columns show the original image, segmentation mask, heatmap, the feature outputs of H1 to H4, and their fusion.
![mechanism](https://github.com/huzongxiang/sowa/blob/resources/fig7.png)


## Inference Speed
Inference performance comparison of different methods on a single NVIDIA RTX3070 8GB GPU.
<div align="center">
  <img src="https://github.com/huzongxiang/sowa/blob/resources/fig9.png" alt="speed" style="width: 80%;">
</div>


## Citation
Please cite the following paper if this work helps your project: 
```bibtex
@article{hu2024sowa,
  title={SOWA: Adapting Hierarchical Frozen Window Self-Attention to Visual-Language Models for Better Anomaly Detection},
  author={Hu, Zongxiang and Zhang, Zhaosheng},
  journal={arXiv preprint arXiv:2407.03634},
  year={2024}
}
```