<div align="center">

# Soldier-Officer Window self-Attention (SOWA)

<a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-ee4c2c?logo=pytorch&logoColor=white"></a>
<a href="https://pytorchlightning.ai/"><img alt="Lightning" src="https://img.shields.io/badge/-Lightning-792ee5?logo=pytorchlightning&logoColor=white"></a>
<a href="https://hydra.cc/"><img alt="Config: Hydra" src="https://img.shields.io/badge/Config-Hydra-89b8cd"></a>
<a href="https://github.com/ashleve/lightning-hydra-template"><img alt="Template" src="https://img.shields.io/badge/-Lightning--Hydra--Template-017F2F?style=flat&logo=github&labelColor=gray"></a><br>
[![Paper](http://img.shields.io/badge/paper-arxiv.2407.03634-B31B1B.svg)](https://arxiv.org/abs/2407.03634)

</div>

## Description

<div align="center">
<img src="https://github.com/huzongxiang/sowa/blob/resources/fig1.png" alt="concept" style="width: 50%;">
</div>

Visual anomaly detection is critical in industrial manufacturing, but traditional methods often rely on extensive normal datasets and custom models, limiting scalability. Recent advancements in large-scale visual-language models have significantly improved zero-/few-shot anomaly detection; however, these approaches may not fully utilize hierarchical features and can miss nuanced details. We introduce a window self-attention mechanism based on the CLIP model, combined with learnable prompts, to process multi-level features within a Soldier-Officer Window self-Attention (SOWA) framework. Our method has been tested on five benchmark datasets, demonstrating superior performance by leading in 18 out of 20 metrics compared to existing state-of-the-art techniques.

![architecture](https://github.com/huzongxiang/sowa/blob/resources/fig2.png)
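
To make the windowed attention idea concrete, here is a minimal PyTorch sketch of self-attention restricted to non-overlapping windows, applied independently to each hierarchical feature level. The class name, window size, and feature shapes are illustrative assumptions, not the released implementation.

```python
# Minimal, illustrative sketch of window self-attention (not the released SOWA code).
import torch
import torch.nn as nn


class WindowSelfAttention(nn.Module):
    """Self-attention restricted to non-overlapping windows of patch tokens."""

    def __init__(self, dim: int, window: int, heads: int = 8):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C) grid of patch features from one hierarchical level
        B, H, W, C = x.shape
        w = self.window
        # partition the grid into non-overlapping w x w windows
        x = x.view(B, H // w, w, W // w, w, C).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(-1, w * w, C)        # (B * num_windows, w*w, C)
        x, _ = self.attn(x, x, x)          # attention only inside each window
        # merge the windows back into the (H, W) grid
        x = x.view(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5)
        return x.reshape(B, H, W, C)


# one block per hierarchical feature level (dummy shapes for illustration)
levels = [torch.randn(2, 16, 16, 768) for _ in range(4)]
blocks = [WindowSelfAttention(dim=768, window=4) for _ in levels]
outputs = [block(feat) for block, feat in zip(blocks, levels)]
print([tuple(o.shape) for o in outputs])   # four (2, 16, 16, 768) tensors
```

In SOWA the multi-level features come from a frozen CLIP image encoder and are combined with learnable prompts; the sketch above only illustrates the windowed attention pattern itself.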

## Installation

#### Pip

```bash
# clone project
git clone https://github.com/huzongxiang/sowa
cd sowa

# [OPTIONAL] create conda environment
conda create -n sowa python=3.9
conda activate sowa

# install pytorch according to instructions
# https://pytorch.org/get-started/

# install requirements
pip install -r requirements.txt
```

#### Conda

```bash
# clone project
git clone https://github.com/huzongxiang/sowa
cd sowa

# create conda environment and install dependencies
conda env create -f environment.yaml -n sowa

# activate conda environment
conda activate sowa
```

## How to run

Train the model with the default configuration:

```bash
# train on CPU
python src/train.py trainer=cpu data=sowa_visa model=sowa_hfwa

# train on GPU
python src/train.py trainer=gpu data=sowa_visa model=sowa_hfwa
```
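
The project follows the Lightning-Hydra template, so the `group=option` arguments above are ordinary Hydra overrides. As a hedged sketch (assuming the template's standard `configs/` folder with a `train.yaml` root config, which is not verified here), the same configuration can also be composed from Python to inspect it before launching a run:

```python
# Hypothetical sketch: compose the training config programmatically with Hydra.
# Assumes the lightning-hydra-template layout, i.e. a configs/ folder with train.yaml.
from hydra import compose, initialize
from omegaconf import OmegaConf

with initialize(version_base=None, config_path="configs"):
    cfg = compose(
        config_name="train",
        overrides=["trainer=gpu", "data=sowa_visa", "model=sowa_hfwa"],
    )

# print the composed trainer settings for inspection
print(OmegaConf.to_yaml(cfg.trainer))
```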

## Results

Comparison with few-shot (K=4) anomaly detection methods on the MVTec-AD, Visa, BTAD, DAGM, and DTD-Synthetic datasets. AC denotes image-level anomaly classification and AS denotes pixel-level anomaly segmentation.

| Metric   | Dataset       | WinCLIP  | April-GAN | Ours     |
|----------|---------------|----------|-----------|----------|
| AC AUROC | MVTec-AD      | 95.2±1.3 | 92.8±0.2  | 96.8±0.3 |
|          | Visa          | 87.3±1.8 | 92.6±0.4  | 92.9±0.2 |
|          | BTAD          | 87.0±0.2 | 92.1±0.2  | 94.8±0.2 |
|          | DAGM          | 93.8±0.2 | 96.2±1.1  | 98.9±0.3 |
|          | DTD-Synthetic | 98.1±0.2 | 98.5±0.1  | 99.1±0.0 |
| AC AP    | MVTec-AD      | 97.3±0.6 | 96.3±0.1  | 98.3±0.3 |
|          | Visa          | 88.8±1.8 | 94.5±0.3  | 94.5±0.2 |
|          | BTAD          | 86.8±0.0 | 95.2±0.5  | 95.5±0.7 |
|          | DAGM          | 83.8±1.1 | 86.7±4.5  | 95.2±1.7 |
|          | DTD-Synthetic | 99.1±0.1 | 99.4±0.0  | 99.6±0.0 |
| AS AUROC | MVTec-AD      | 96.2±0.3 | 95.9±0.0  | 95.7±0.1 |
|          | Visa          | 97.2±0.2 | 96.2±0.0  | 97.1±0.0 |
|          | BTAD          | 95.8±0.0 | 94.4±0.1  | 97.1±0.0 |
|          | DAGM          | 93.8±0.1 | 88.9±0.4  | 96.9±0.0 |
|          | DTD-Synthetic | 96.8±0.2 | 96.7±0.0  | 98.7±0.0 |
| AS AUPRO | MVTec-AD      | 89.0±0.8 | 91.8±0.1  | 92.4±0.2 |
|          | Visa          | 87.6±0.9 | 90.2±0.1  | 91.4±0.0 |
|          | BTAD          | 66.6±0.2 | 78.2±0.1  | 81.2±0.2 |
|          | DAGM          | 82.4±0.3 | 77.8±0.9  | 94.4±0.1 |
|          | DTD-Synthetic | 90.1±0.5 | 92.2±0.0  | 96.6±0.1 |
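
For reference, AC scores are computed from one anomaly score per image and AS scores from per-pixel anomaly maps. The sketch below shows how AUROC and AP are typically obtained with scikit-learn on dummy data; it is not the authors' evaluation code, and AUPRO (per-region overlap) is omitted because it additionally requires connected-component analysis of the ground-truth regions.

```python
# Illustrative metric computation on dummy data (not the official evaluation script).
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)

# image-level ground truth (1 = anomalous) and predicted anomaly scores
image_labels = np.array([0, 0, 1, 1])
image_scores = np.array([0.12, 0.30, 0.71, 0.88])

# pixel-level ground-truth masks and predicted anomaly maps for 4 images
pixel_labels = rng.integers(0, 2, size=(4, 256, 256))
pixel_scores = rng.random(size=(4, 256, 256))

ac_auroc = roc_auc_score(image_labels, image_scores)                   # AC AUROC
ac_ap = average_precision_score(image_labels, image_scores)            # AC AP
as_auroc = roc_auc_score(pixel_labels.ravel(), pixel_scores.ravel())   # AS AUROC

print(f"AC AUROC {ac_auroc:.3f} | AC AP {ac_ap:.3f} | AS AUROC {as_auroc:.3f}")
```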

<!-- spacer -->

Performance comparison on the MVTec-AD and Visa datasets.

| Method    | Source                  | MVTec-AD AC AUROC | MVTec-AD AS AUROC | MVTec-AD AS AUPRO | Visa AC AUROC | Visa AS AUROC | Visa AS AUPRO |
|-----------|-------------------------|-------------------|-------------------|-------------------|---------------|---------------|---------------|
| SPADE     | arXiv 2020              | 84.8±2.5          | 92.7±0.3          | 87.0±0.5          | 81.7±3.4      | 96.6±0.3      | 87.3±0.8      |
| PaDiM     | ICPR 2021               | 80.4±2.4          | 92.6±0.7          | 81.3±1.9          | 72.8±2.9      | 93.2±0.5      | 72.6±1.9      |
| PatchCore | CVPR 2022               | 88.8±2.6          | 94.3±0.5          | 84.3±1.6          | 85.3±2.1      | 96.8±0.3      | 84.9±1.4      |
| WinCLIP   | CVPR 2023               | 95.2±1.3          | 96.2±0.3          | 89.0±0.8          | 87.3±1.8      | 97.2±0.2      | 87.6±0.9      |
| April-GAN | CVPR 2023 VAND workshop | 92.8±0.2          | 95.9±0.0          | 91.8±0.1          | 92.6±0.4      | 96.2±0.0      | 90.2±0.1      |
| PromptAD  | CVPR 2024               | 96.6±0.9          | 96.5±0.2          | -                 | 89.1±1.7      | 97.4±0.3      | -             |
| InCTRL    | CVPR 2024               | 94.5±1.8          | -                 | -                 | 87.7±1.9      | -             | -             |
| SOWA      | Ours                    | 96.8±0.3          | 95.7±0.1          | 92.4±0.2          | 92.9±0.2      | 97.1±0.0      | 91.4±0.0      |

<!-- spacer -->

Comparison with few-shot anomaly detection methods on the MVTec-AD, Visa, BTAD, DAGM, and DTD-Synthetic datasets.
<div align="center">
<img src="https://github.com/huzongxiang/sowa/blob/resources/fig5.png" alt="few-shot" style="width: 70%;">
</div>

## Visualization
Visualization results under the few-shot setting (K=4).
<div align="center">
<img src="https://github.com/huzongxiang/sowa/blob/resources/fig6.png" alt="concept" style="width: 70%;">
</div>

## Mechanism
Hierarchical results on the MVTec-AD dataset. The images show real model outputs and illustrate how the different levels (H1 to H4) respond to different feature modes. Each row is a separate sample; the columns show the original image, the segmentation mask, the heatmap, the feature outputs of H1 to H4, and their fusion.
![mechanism](https://github.com/huzongxiang/sowa/blob/resources/fig7.png)
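
As a rough illustration of the fusion column, the sketch below merges per-level anomaly maps with a plain average; the map shapes and the averaging rule are assumptions for illustration, and the released model may combine the levels differently.

```python
# Illustrative fusion of per-level anomaly maps (H1..H4) into a single map.
# A plain average is used here; the actual fusion in SOWA may differ.
import torch

level_maps = [torch.rand(1, 1, 240, 240) for _ in range(4)]   # dummy H1..H4 maps
fused = torch.stack(level_maps, dim=0).mean(dim=0)            # (1, 1, 240, 240)
print(fused.shape)
```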

## Inference Speed
Inference performance comparison of different methods on a single NVIDIA RTX3070 8GB GPU.
<div align="center">
<img src="https://github.com/huzongxiang/sowa/blob/resources/fig9.png" alt="speed" style="width: 80%;">
</div>

## Citation
Please cite the following paper if this work helps your project:
```bibtex
@article{hu2024sowa,
  title={SOWA: Adapting Hierarchical Frozen Window Self-Attention to Visual-Language Models for Better Anomaly Detection},
  author={Hu, Zongxiang and Zhang, Zhaosheng},
  journal={arXiv preprint arXiv:2407.03634},
  year={2024}
}
```