metrics:
- accuracy
pipeline_tag: image-classification
---
# WaveMix
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wavemix-lite-a-resource-efficient-neural/image-classification-on-emnist-balanced)](https://paperswithcode.com/sota/image-classification-on-emnist-balanced?p=wavemix-lite-a-resource-efficient-neural)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wavemix-lite-a-resource-efficient-neural/image-classification-on-emnist-byclass)](https://paperswithcode.com/sota/image-classification-on-emnist-byclass?p=wavemix-lite-a-resource-efficient-neural)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wavemix-lite-a-resource-efficient-neural/image-classification-on-emnist-bymerge)](https://paperswithcode.com/sota/image-classification-on-emnist-bymerge?p=wavemix-lite-a-resource-efficient-neural)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wavemix-lite-a-resource-efficient-neural/image-classification-on-emnist-digits)](https://paperswithcode.com/sota/image-classification-on-emnist-digits?p=wavemix-lite-a-resource-efficient-neural)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wavemix-lite-a-resource-efficient-neural/image-classification-on-emnist-letters)](https://paperswithcode.com/sota/image-classification-on-emnist-letters?p=wavemix-lite-a-resource-efficient-neural)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wavemix-lite-a-resource-efficient-neural/image-classification-on-inat2021-mini)](https://paperswithcode.com/sota/image-classification-on-inat2021-mini?p=wavemix-lite-a-resource-efficient-neural)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wavemix-lite-a-resource-efficient-neural/scene-classification-on-places365-standard)](https://paperswithcode.com/sota/scene-classification-on-places365-standard?p=wavemix-lite-a-resource-efficient-neural)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wavemix-lite-a-resource-efficient-neural/semantic-segmentation-on-cityscapes-val)](https://paperswithcode.com/sota/semantic-segmentation-on-cityscapes-val?p=wavemix-lite-a-resource-efficient-neural)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wavemix-lite-a-resource-efficient-neural/image-classification-on-caltech-256)](https://paperswithcode.com/sota/image-classification-on-caltech-256?p=wavemix-lite-a-resource-efficient-neural)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wavemix-lite-a-resource-efficient-neural/image-classification-on-places365-standard)](https://paperswithcode.com/sota/image-classification-on-places365-standard?p=wavemix-lite-a-resource-efficient-neural)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wavemix-lite-a-resource-efficient-neural/image-classification-on-svhn)](https://paperswithcode.com/sota/image-classification-on-svhn?p=wavemix-lite-a-resource-efficient-neural)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wavemix-lite-a-resource-efficient-neural/image-classification-on-tiny-imagenet-1)](https://paperswithcode.com/sota/image-classification-on-tiny-imagenet-1?p=wavemix-lite-a-resource-efficient-neural)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wavemix-lite-a-resource-efficient-neural/image-classification-on-fashion-mnist)](https://paperswithcode.com/sota/image-classification-on-fashion-mnist?p=wavemix-lite-a-resource-efficient-neural)

## Resource-efficient Token Mixing for Images using 2D Discrete Wavelet Transform

### WaveMix Architecture
![WaveMix architecture](https://user-images.githubusercontent.com/15833382/226090639-b4571494-7d2d-4bcb-81e3-127916339dfe.png)

### WaveMix-Lite
![WaveMix-Lite architecture](https://user-images.githubusercontent.com/15833382/226090664-d844e4f1-854a-43b3-8106-78307f187fe8.png)

We propose WaveMix, a novel neural architecture for computer vision that is resource-efficient yet generalizable and scalable. WaveMix networks achieve comparable or better accuracy than state-of-the-art convolutional neural networks, vision transformers, and token mixers on several tasks. They establish new benchmarks for segmentation on Cityscapes and for classification on Places-365, five EMNIST datasets, and iNAT-mini. Remarkably, WaveMix architectures need fewer parameters than the previous state of the art to reach these benchmarks. Moreover, when controlled for the number of parameters, WaveMix requires less GPU RAM, which translates into savings in time, cost, and energy. To achieve these gains, WaveMix blocks use a multi-level two-dimensional discrete wavelet transform (2D-DWT), which has the following advantages:

1. It reorganizes spatial information based on three strong image priors: scale-invariance, shift-invariance, and sparseness of edges.
2. It does so losslessly, without adding parameters.
3. It reduces the spatial size of feature maps, which reduces the memory and time required for forward and backward passes.
4. It expands the receptive field faster than convolutions do.

The whole architecture is a stack of self-similar, resolution-preserving WaveMix blocks, which allows architectural flexibility for various tasks and levels of resource availability.
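
The lossless reorganization and the reduction in spatial size described above can be seen in a minimal sketch of a single-level 2D DWT. The sketch assumes the orthonormal Haar wavelet and works on plain Python lists rather than PyTorch tensors. `haar_dwt2` and `haar_idwt2` are illustrative names and are not functions from the `wavemix` package. Each 2x2 pixel block becomes one approximation coefficient and three detail coefficients. As a result, each of the four subbands has half the height and width of the input, and the inverse still recovers the input exactly.

```python
def haar_dwt2(img):
    """Single-level 2D Haar DWT of an H x W list of lists (H and W even).

    Returns the four subbands (ll, lh, hl, hh), each of size H/2 x W/2.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            r, s = i // 2, j // 2
            ll[r][s] = (a + b + c + d) / 2  # approximation (scaled local mean)
            lh[r][s] = (a - b + c - d) / 2  # left-right difference
            hl[r][s] = (a + b - c - d) / 2  # top-bottom difference
            hh[r][s] = (a - b - c + d) / 2  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2: rebuilds the H x W input from its subbands."""
    h2, w2 = len(ll), len(ll[0])
    out = [[0.0] * (2 * w2) for _ in range(2 * h2)]
    for r in range(h2):
        for s in range(w2):
            p, q, u, v = ll[r][s], lh[r][s], hl[r][s], hh[r][s]
            out[2 * r][2 * s] = (p + q + u + v) / 2
            out[2 * r][2 * s + 1] = (p - q + u - v) / 2
            out[2 * r + 1][2 * s] = (p + q - u - v) / 2
            out[2 * r + 1][2 * s + 1] = (p - q - u + v) / 2
    return out


img = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # 4x4 linear ramp
bands = haar_dwt2(img)
print([(len(b), len(b[0])) for b in bands])  # [(2, 2), (2, 2), (2, 2), (2, 2)]
print(bands[3])                               # [[0.0, 0.0], [0.0, 0.0]]
print(haar_idwt2(*bands) == img)              # True (lossless)
```

Because this Haar transform is orthonormal, the sum of squared values is also unchanged (1240 for this image). Stacking the four subbands along the channel axis therefore turns a C x H x W map into a 4C x H/2 x W/2 map without discarding information. For the smooth ramp, the diagonal subband is all zeros, which is a small instance of the sparseness-of-edges prior.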

| Task | Dataset | Metric | Value |
|-----------------------|-------------|----------|--------|
| Semantic Segmentation | Cityscapes | Single-scale mIoU | 82.70% (SOTA) |
| Image Classification | ImageNet-1k | Accuracy | 74.93% |

### Parameter Efficiency
| Task | Model | Parameters |
|------------------------------|-------------------------------------------------|------------|
| 99% accuracy on MNIST | WaveMix-Lite-8/10 | 3,566 |
| 90% accuracy on Fashion-MNIST | WaveMix-Lite-8/5 | 7,156 |
| 80% accuracy on CIFAR-10 | WaveMix-Lite-32/7 | 37,058 |
| 90% accuracy on CIFAR-10 | WaveMix-Lite-64/6 | 520,106 |

The high parameter efficiency is obtained by replacing transposed convolution (deconvolution) layers with upsampling.
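
The sketch below gives a rough sense of the savings. It counts the learnable parameters in a stride-2 transposed convolution that doubles spatial resolution, using the weight layout of PyTorch's `nn.ConvTranspose2d` (`in_channels x out_channels x kH x kW` weights plus one bias per output channel). It then compares that count with upsampling, which has no learnable parameters. The 144-channel width, 4x4 kernel, and depth of 16 are assumed values chosen for illustration and are not figures taken from the WaveMix source.

```python
def conv_transpose2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters in a 2D transposed convolution (groups=1, square kernel)."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def upsample_params():
    """Nearest-neighbour or bilinear upsampling has no learnable parameters."""
    return 0


# Assumed illustrative values, not taken from the WaveMix source.
channels, kernel, depth = 144, 4, 16

per_layer = conv_transpose2d_params(channels, channels, kernel)
print(per_layer)                                # 331920
print(depth * (per_layer - upsample_params()))  # 5310720 parameters avoided
```

Under these assumptions, each replaced layer saves about 0.33M parameters, or roughly 5.3M across 16 blocks. The actual savings depend on the channel widths and kernel sizes each model uses.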

This is an implementation of the models described in the following papers: [OpenReview paper](https://openreview.net/forum?id=tBoSm4hUWV), [arXiv paper 1](https://arxiv.org/abs/2203.03689), [arXiv paper 2](https://arxiv.org/abs/2205.14375).

## Install

```bash
$ pip install wavemix
```

## Usage
### Semantic Segmentation

```python
import torch
from wavemix.SemSegment import WaveMix

model = WaveMix(
    num_classes=20,
    depth=16,
    mult=2,
    ff_channel=256,
    final_dim=256,
    dropout=0.5,
    level=4,
    stride=2,
)

img = torch.randn(1, 3, 256, 256)  # batch of one RGB image

preds = model(img)  # per-pixel class scores: (1, 20, 256, 256)
```

### Image Classification

```python
import torch
from wavemix.classification import WaveMix

model = WaveMix(
    num_classes=1000,
    depth=16,
    mult=2,
    ff_channel=192,
    final_dim=192,
    dropout=0.5,
    level=3,
    patch_size=4,
)

img = torch.randn(1, 3, 256, 256)  # batch of one RGB image

preds = model(img)  # class scores: (1, 1000)
```

### Single Image Super-resolution

```python
import torch
from wavemix.sisr import WaveMix

model = WaveMix(
    depth=4,
    mult=2,
    ff_channel=144,
    final_dim=144,
    dropout=0.5,
    level=1,
)

img = torch.randn(1, 3, 256, 256)  # low-resolution input
out = model(img)  # 2x upscaled output: (1, 3, 512, 512)
```
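
### To use a single Waveblock

The keyword arguments below are an assumption based on the model-level parameters documented in the Parameters section. Check the signature of `Level1Waveblock` in your installed version.

```python
import torch
from wavemix import Level1Waveblock

# Assumed keyword arguments; check the signature in your installed version.
block = Level1Waveblock(mult=2, ff_channel=144, final_dim=144, dropout=0.5)
out = block(torch.randn(1, 144, 64, 64))  # expected shape: (1, 144, 64, 64)
```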

## Parameters

- `num_classes`: int.
  Number of classes to classify or segment.
- `depth`: int.
  Number of WaveMix blocks.
- `mult`: int.
  Channel expansion factor in the MLP (feed-forward) layer.
- `ff_channel`: int.
  Number of output channels from the MLP (feed-forward) layer.
- `final_dim`: int.
  Number of channels output by the initial convolutional layers, that is, the channel dimension of the tensor fed to the WaveBlocks.
- `dropout`: float in `[0, 1]`, default `0.`.
  Dropout rate.
- `level`: int.
  Number of levels of the 2D wavelet transform used in the WaveBlocks. Levels 1 to 4 are currently supported.
- `stride`: int.
  Stride used in the initial convolutional layers to reduce the input resolution before it is fed to the WaveBlocks.
- `initial_conv`: str.
  Chooses between a strided convolution and a patchifying convolution for the initial conv layer. Used for classification. Either `'pachify'` or `'strided'`.
- `patch_size`: int.
  Size of each non-overlapping patch when a patchifying convolution is used. Should be a multiple of 4.

#### Cite the following papers
```bibtex
@misc{p2022wavemix,
  title={WaveMix: Multi-Resolution Token Mixing for Images},
  author={Pranav Jeevan P and Amit Sethi},
  year={2022},
  url={https://openreview.net/forum?id=tBoSm4hUWV}
}

@misc{jeevan2022wavemix,
  title={WaveMix: Resource-efficient Token Mixing for Images},
  author={Pranav Jeevan and Amit Sethi},
  year={2022},
  eprint={2203.03689},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{jeevan2023wavemix,
  title={WaveMix: A Resource-efficient Neural Network for Image Analysis},
  author={Pranav Jeevan and Kavitha Viswanathan and Anandu A S and Amit Sethi},
  year={2023},
  eprint={2205.14375},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```