ms57rd committed on
Commit
e53b39b
1 Parent(s): f7b9425

Upload 3 files

The Res-VMamba weights from the paper https://arxiv.org/abs/2402.15761, trained on CNFOOD-241-Chen.

Files changed (3)
  1. ckpt_epoch_166.pth +3 -0
  2. config.json +99 -0
  3. log_rank0.txt +1233 -0
ckpt_epoch_166.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:457b615d41c79698d8f2eafbe51959d8c1b5d53187605765d5f79558639c1ac3
+ size 711402283
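The three added lines for ckpt_epoch_166.pth are a Git LFS pointer, not the checkpoint itself: they record the spec version, the SHA-256 of the real file, and its size in bytes. As a minimal sketch of how a downloaded checkpoint could be checked against that pointer, the helper names `parse_lfs_pointer` and `matches_pointer` below are hypothetical and not part of this repository or of any Git LFS tooling.

```python
import hashlib
from pathlib import Path

# Pointer contents copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:457b615d41c79698d8f2eafbe51959d8c1b5d53187605765d5f79558639c1ac3
size 711402283
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a pointer file into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and hash the pointer records."""
    file_path = Path(path)
    if file_path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with file_path.open("rb") as handle:
        # Hash in chunks so a ~700 MB checkpoint is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("ckpt_epoch_166.pth", pointer)` on a local copy (the path is an assumption about where the file was saved) would then indicate whether the download is complete and unmodified.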
config.json ADDED
@@ -0,0 +1,99 @@
+ AMP_ENABLE: true
+ AMP_OPT_LEVEL: ''
+ AUG:
+   AUTO_AUGMENT: rand-m9-mstd0.5-inc1
+   COLOR_JITTER: 0.4
+   CUTMIX: 1.0
+   CUTMIX_MINMAX: null
+   MIXUP: 0.8
+   MIXUP_MODE: batch
+   MIXUP_PROB: 1.0
+   MIXUP_SWITCH_PROB: 0.5
+   RECOUNT: 1
+   REMODE: pixel
+   REPROB: 0.25
+ BASE:
+ - ''
+ DATA:
+   BATCH_SIZE: 128
+   CACHE_MODE: part
+   DATASET: imagenet
+   DATA_PATH: /home/public_3T/food_data/CNFOOD-241
+   IMG_SIZE: 224
+   INTERPOLATION: bicubic
+   MASK_PATCH_SIZE: 32
+   MASK_RATIO: 0.6
+   NUM_WORKERS: 8
+   PIN_MEMORY: true
+   ZIP_MODE: false
+ ENABLE_AMP: false
+ EVAL_MODE: false
+ FUSED_LAYERNORM: false
+ FUSED_WINDOW_PROCESS: false
+ LOCAL_RANK: 0
+ MODEL:
+   DROP_PATH_RATE: 0.3
+   DROP_RATE: 0.0
+   LABEL_SMOOTHING: 0.1
+   MMCKPT: false
+   NAME: vssm_small
+   NUM_CLASSES: 241
+   PRETRAINED: ./res_vmamba_cnf241_result_2/vssm_small/default/ckpt_epoch_12.pth
+   RESUME: ''
+   TYPE: vssm
+   VSSM:
+     DEPTHS:
+     - 2
+     - 2
+     - 27
+     - 2
+     DOWNSAMPLE: v1
+     DT_RANK: auto
+     D_STATE: 16
+     EMBED_DIM: 96
+     IN_CHANS: 3
+     MLP_RATIO: 0.0
+     PATCH_NORM: true
+     PATCH_SIZE: 4
+     SHARED_SSM: false
+     SOFTMAX: false
+     SSM_RATIO: 2.0
+ OUTPUT: ./res_vmamba_cnf241_result_best/vssm_small/default
+ PRINT_FREQ: 10
+ SAVE_FREQ: 1
+ SEED: 0
+ TAG: default
+ TEST:
+   CROP: true
+   SEQUENTIAL: false
+   SHUFFLE: false
+ THROUGHPUT_MODE: false
+ TRAIN:
+   ACCUMULATION_STEPS: 1
+   AUTO_RESUME: true
+   BASE_LR: 0.000125
+   CLIP_GRAD: 5.0
+   EPOCHS: 300
+   LAYER_DECAY: 1.0
+   LR_SCHEDULER:
+     DECAY_EPOCHS: 30
+     DECAY_RATE: 0.1
+     GAMMA: 0.1
+     MULTISTEPS: []
+     NAME: cosine
+     WARMUP_PREFIX: true
+   MIN_LR: 1.25e-06
+   MOE:
+     SAVE_MASTER: false
+   OPTIMIZER:
+     BETAS:
+     - 0.9
+     - 0.999
+     EPS: 1.0e-08
+     MOMENTUM: 0.9
+     NAME: adamw
+   START_EPOCH: 0
+   USE_CHECKPOINT: false
+   WARMUP_EPOCHS: 20
+   WARMUP_LR: 1.25e-07
+   WEIGHT_DECAY: 0.05
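The training keys in config.json (NAME: cosine, WARMUP_PREFIX: true, WARMUP_EPOCHS: 20, EPOCHS: 300, plus the three learning-rate values) describe a linear warmup followed by cosine decay. The sketch below is only a per-epoch approximation of that shape, assuming warmup runs from WARMUP_LR to BASE_LR and the cosine phase spans the remaining 280 epochs down to MIN_LR. The function name `lr_at_epoch` is hypothetical, and the actual training script may step the schedule per iteration rather than per epoch.

```python
import math

# Values copied from config.json in this commit.
BASE_LR = 0.000125
WARMUP_LR = 1.25e-07
MIN_LR = 1.25e-06
EPOCHS = 300
WARMUP_EPOCHS = 20


def lr_at_epoch(epoch):
    """Approximate learning rate at a (possibly fractional) epoch index."""
    if epoch < WARMUP_EPOCHS:
        # Linear ramp from WARMUP_LR up to BASE_LR.
        return WARMUP_LR + (BASE_LR - WARMUP_LR) * epoch / WARMUP_EPOCHS
    # With a warmup prefix, the cosine phase is measured from the end of warmup.
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 10, 20, 166, 300):
    print(epoch, f"{lr_at_epoch(epoch):.3e}")
```

Under these assumptions, epoch 166 (the checkpoint uploaded here) falls a little past the midpoint of the cosine phase, so its learning rate is below the midpoint between BASE_LR and MIN_LR.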
log_rank0.txt ADDED
@@ -0,0 +1,1233 @@
+ [2024-02-22 17:55:19 vssm_small] (main.py 401): INFO Full config saved to ./res_vmamba_cnf241_result_best/vssm_small/default/config.json
+ [2024-02-22 17:55:19 vssm_small] (main.py 404): INFO AMP_ENABLE: true
+ AMP_OPT_LEVEL: ''
+ AUG:
+   AUTO_AUGMENT: rand-m9-mstd0.5-inc1
+   COLOR_JITTER: 0.4
+   CUTMIX: 1.0
+   CUTMIX_MINMAX: null
+   MIXUP: 0.8
+   MIXUP_MODE: batch
+   MIXUP_PROB: 1.0
+   MIXUP_SWITCH_PROB: 0.5
+   RECOUNT: 1
+   REMODE: pixel
+   REPROB: 0.25
+ BASE:
+ - ''
+ DATA:
+   BATCH_SIZE: 128
+   CACHE_MODE: part
+   DATASET: imagenet
+   DATA_PATH: /home/public_3T/food_data/CNFOOD-241
+   IMG_SIZE: 224
+   INTERPOLATION: bicubic
+   MASK_PATCH_SIZE: 32
+   MASK_RATIO: 0.6
+   NUM_WORKERS: 8
+   PIN_MEMORY: true
+   ZIP_MODE: false
+ ENABLE_AMP: false
+ EVAL_MODE: false
+ FUSED_LAYERNORM: false
+ FUSED_WINDOW_PROCESS: false
+ LOCAL_RANK: 0
+ MODEL:
+   DROP_PATH_RATE: 0.3
+   DROP_RATE: 0.0
+   LABEL_SMOOTHING: 0.1
+   MMCKPT: false
+   NAME: vssm_small
+   NUM_CLASSES: 241
+   PRETRAINED: ./res_vmamba_cnf241_result_2/vssm_small/default/ckpt_epoch_12.pth
+   RESUME: ''
+   TYPE: vssm
+   VSSM:
+     DEPTHS:
+     - 2
+     - 2
+     - 27
+     - 2
+     DOWNSAMPLE: v1
+     DT_RANK: auto
+     D_STATE: 16
+     EMBED_DIM: 96
+     IN_CHANS: 3
+     MLP_RATIO: 0.0
+     PATCH_NORM: true
+     PATCH_SIZE: 4
+     SHARED_SSM: false
+     SOFTMAX: false
+     SSM_RATIO: 2.0
+ OUTPUT: ./res_vmamba_cnf241_result_best/vssm_small/default
+ PRINT_FREQ: 10
+ SAVE_FREQ: 1
+ SEED: 0
+ TAG: default
+ TEST:
+   CROP: true
+   SEQUENTIAL: false
+   SHUFFLE: false
+ THROUGHPUT_MODE: false
+ TRAIN:
+   ACCUMULATION_STEPS: 1
+   AUTO_RESUME: true
+   BASE_LR: 0.000125
+   CLIP_GRAD: 5.0
+   EPOCHS: 300
+   LAYER_DECAY: 1.0
+   LR_SCHEDULER:
+     DECAY_EPOCHS: 30
+     DECAY_RATE: 0.1
+     GAMMA: 0.1
+     MULTISTEPS: []
+     NAME: cosine
+     WARMUP_PREFIX: true
+   MIN_LR: 1.25e-06
+   MOE:
+     SAVE_MASTER: false
+   OPTIMIZER:
+     BETAS:
+     - 0.9
+     - 0.999
+     EPS: 1.0e-08
+     MOMENTUM: 0.9
+     NAME: adamw
+   START_EPOCH: 0
+   USE_CHECKPOINT: false
+   WARMUP_EPOCHS: 20
+   WARMUP_LR: 1.25e-07
+   WEIGHT_DECAY: 0.05
+
+ [2024-02-22 17:55:19 vssm_small] (main.py 405): INFO {"cfg": "configs/vssm/vssm_small_224.yaml", "opts": null, "batch_size": 128, "data_path": "/home/public_3T/food_data/CNFOOD-241", "zip": false, "cache_mode": "part", "pretrained": "./res_vmamba_cnf241_result_2/vssm_small/default/ckpt_epoch_12.pth", "resume": null, "accumulation_steps": null, "use_checkpoint": false, "disable_amp": false, "amp_opt_level": null, "output": "./res_vmamba_cnf241_result_best", "tag": null, "eval": false, "throughput": false, "local_rank": 0, "fused_layernorm": false, "optim": null, "model_ema": true, "model_ema_decay": 0.9999, "model_ema_force_cpu": false}
+ [2024-02-22 17:55:20 vssm_small] (main.py 112): INFO Creating model:vssm/vssm_small
+ [2024-02-22 17:55:20 vssm_small] (main.py 118): INFO VSSM(
+   (patch_embed): Sequential(
+     (0): Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4))
+     (1): Permute()
+     (2): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
+   )
+   (layers): ModuleList(
+     (0): Sequential(
+       (blocks): Sequential(
+         (0): VSSBlock(
+           (norm): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=96, out_features=384, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192)
+             (out_proj): Linear(in_features=192, out_features=96, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.0)
+         )
+         (1): VSSBlock(
+           (norm): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=96, out_features=384, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192)
+             (out_proj): Linear(in_features=192, out_features=96, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.00937500037252903)
+         )
+       )
+       (downsample): PatchMerging2D(
+         (reduction): Linear(in_features=384, out_features=192, bias=False)
+         (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+       )
+     )
+     (1): Sequential(
+       (blocks): Sequential(
+         (0): VSSBlock(
+           (norm): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=192, out_features=768, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384)
+             (out_proj): Linear(in_features=384, out_features=192, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.01875000074505806)
+         )
+         (1): VSSBlock(
+           (norm): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=192, out_features=768, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384)
+             (out_proj): Linear(in_features=384, out_features=192, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.02812500111758709)
+         )
+       )
+       (downsample): PatchMerging2D(
+         (reduction): Linear(in_features=768, out_features=384, bias=False)
+         (norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+       )
+     )
+     (2): Sequential(
+       (blocks): Sequential(
+         (0): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.03750000149011612)
+         )
+         (1): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.046875)
+         )
+         (2): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.05625000223517418)
+         )
+         (3): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.06562500447034836)
+         )
+         (4): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.07500000298023224)
+         )
+         (5): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.08437500149011612)
+         )
+         (6): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.09375)
+         )
+         (7): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.10312500596046448)
+         )
+         (8): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.11250000447034836)
+         )
+         (9): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.12187500298023224)
+         )
+         (10): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.13125000894069672)
+         )
+         (11): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.140625)
+         )
+         (12): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.15000000596046448)
+         )
+         (13): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.15937501192092896)
+         )
+         (14): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.16875000298023224)
+         )
+         (15): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.17812500894069672)
+         )
+         (16): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.1875)
+         )
+         (17): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.19687500596046448)
+         )
+         (18): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.20625001192092896)
+         )
+         (19): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.21562501788139343)
+         )
+         (20): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.22500000894069672)
+         )
+         (21): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.2343750149011612)
+         )
+         (22): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.24375000596046448)
+         )
+         (23): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.25312501192092896)
+         )
+         (24): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.26250001788139343)
+         )
+         (25): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.2718750238418579)
+         )
+         (26): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.28125)
+         )
+       )
+       (downsample): PatchMerging2D(
+         (reduction): Linear(in_features=1536, out_features=768, bias=False)
+         (norm): LayerNorm((1536,), eps=1e-05, elementwise_affine=True)
+       )
+     )
+     (3): Sequential(
+       (blocks): Sequential(
+         (0): VSSBlock(
+           (norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((1536,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=768, out_features=3072, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(1536, 1536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1536)
+             (out_proj): Linear(in_features=1536, out_features=768, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.2906250059604645)
+         )
+         (1): VSSBlock(
+           (norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((1536,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=768, out_features=3072, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(1536, 1536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1536)
+             (out_proj): Linear(in_features=1536, out_features=768, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.30000001192092896)
+         )
+       )
+       (downsample): Identity()
+     )
+   )
+   (classifier): Sequential(
+     (norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+     (permute): Permute()
+     (avgpool): AdaptiveAvgPool2d(output_size=1)
+     (flatten): Flatten(start_dim=1, end_dim=-1)
+     (head): Linear(in_features=768, out_features=1000, bias=True)
+   )
+ )
545
+ [2024-02-22 17:55:20 vssm_small] (main.py 120): INFO number of params: 44417416
546
+ [2024-02-22 17:55:22 vssm_small] (main.py 123): INFO number of GFLOPs: 11.231522784
547
+ [2024-02-22 17:55:22 vssm_small] (main.py 167): INFO auto resuming from ./res_vmamba_cnf241_result_best/vssm_small/default/ckpt_epoch_166.pth
548
+ [2024-02-22 17:55:22 vssm_small] (utils.py 18): INFO ==============> Resuming form ./res_vmamba_cnf241_result_best/vssm_small/default/ckpt_epoch_166.pth....................
549
+ [2024-02-22 17:55:23 vssm_small] (utils.py 27): INFO resuming model: <All keys matched successfully>
550
+ [2024-02-22 17:55:23 vssm_small] (utils.py 34): INFO resuming model_ema: <All keys matched successfully>
551
+ [2024-02-22 17:55:24 vssm_small] (utils.py 48): INFO => loaded successfully './res_vmamba_cnf241_result_best/vssm_small/default/ckpt_epoch_166.pth' (epoch 166)
552
+ [2024-02-22 17:55:34 vssm_small] (main.py 324): INFO Test: [0/402] Time 10.582 (10.582) Loss 0.4614 (0.4614) Acc@1 89.844 (89.844) Acc@5 97.656 (97.656) Mem 7155MB
553
+ [2024-02-22 17:55:39 vssm_small] (main.py 324): INFO Test: [10/402] Time 0.483 (1.401) Loss 1.1680 (0.9504) Acc@1 76.562 (80.114) Acc@5 92.188 (94.105) Mem 7155MB
554
+ [2024-02-22 17:55:44 vssm_small] (main.py 324): INFO Test: [20/402] Time 0.482 (0.964) Loss 1.6357 (0.9252) Acc@1 52.344 (78.720) Acc@5 92.188 (95.164) Mem 7155MB
555
+ [2024-02-22 17:55:49 vssm_small] (main.py 324): INFO Test: [30/402] Time 0.483 (0.809) Loss 0.9429 (0.9577) Acc@1 76.562 (77.848) Acc@5 98.438 (95.237) Mem 7155MB
556
+ [2024-02-22 17:55:53 vssm_small] (main.py 324): INFO Test: [40/402] Time 0.483 (0.729) Loss 1.0166 (0.9836) Acc@1 72.656 (76.791) Acc@5 96.875 (95.560) Mem 7155MB
557
+ [2024-02-22 17:55:58 vssm_small] (main.py 324): INFO Test: [50/402] Time 0.483 (0.681) Loss 0.6353 (0.9501) Acc@1 83.594 (77.788) Acc@5 97.656 (95.527) Mem 7155MB
558
+ [2024-02-22 17:56:03 vssm_small] (main.py 324): INFO Test: [60/402] Time 0.482 (0.648) Loss 0.7671 (0.9817) Acc@1 80.469 (77.011) Acc@5 96.875 (95.197) Mem 7155MB
559
+ [2024-02-22 17:56:08 vssm_small] (main.py 324): INFO Test: [70/402] Time 0.482 (0.625) Loss 0.8667 (0.9259) Acc@1 77.344 (78.444) Acc@5 96.094 (95.478) Mem 7155MB
560
+ [2024-02-22 17:56:13 vssm_small] (main.py 324): INFO Test: [80/402] Time 0.483 (0.607) Loss 0.4990 (0.9246) Acc@1 88.281 (78.511) Acc@5 99.219 (95.515) Mem 7155MB
561
+ [2024-02-22 17:56:18 vssm_small] (main.py 324): INFO Test: [90/402] Time 0.483 (0.594) Loss 0.3621 (0.8928) Acc@1 92.188 (79.327) Acc@5 100.000 (95.639) Mem 7155MB
562
+ [2024-02-22 17:56:22 vssm_small] (main.py 324): INFO Test: [100/402] Time 0.482 (0.583) Loss 1.1924 (0.9116) Acc@1 78.125 (79.038) Acc@5 90.625 (95.359) Mem 7155MB
563
+ [2024-02-22 17:56:27 vssm_small] (main.py 324): INFO Test: [110/402] Time 0.483 (0.574) Loss 0.4204 (0.9173) Acc@1 85.938 (78.899) Acc@5 99.219 (95.341) Mem 7155MB
564
+ [2024-02-22 17:56:32 vssm_small] (main.py 324): INFO Test: [120/402] Time 0.483 (0.566) Loss 2.3027 (0.9299) Acc@1 46.094 (78.622) Acc@5 83.594 (95.274) Mem 7155MB
565
+ [2024-02-22 17:56:37 vssm_small] (main.py 324): INFO Test: [130/402] Time 0.482 (0.560) Loss 0.4097 (0.9233) Acc@1 91.406 (78.715) Acc@5 99.219 (95.366) Mem 7155MB
566
+ [2024-02-22 17:56:42 vssm_small] (main.py 324): INFO Test: [140/402] Time 0.483 (0.554) Loss 2.3652 (0.9762) Acc@1 47.656 (77.543) Acc@5 75.781 (94.891) Mem 7155MB
567
+ [2024-02-22 17:56:47 vssm_small] (main.py 324): INFO Test: [150/402] Time 0.482 (0.549) Loss 0.9092 (0.9862) Acc@1 78.906 (77.266) Acc@5 93.750 (94.868) Mem 7155MB
568
+ [2024-02-22 17:56:51 vssm_small] (main.py 324): INFO Test: [160/402] Time 0.482 (0.545) Loss 1.4434 (0.9815) Acc@1 61.719 (77.300) Acc@5 93.750 (94.978) Mem 7155MB
569
+ [2024-02-22 17:56:56 vssm_small] (main.py 324): INFO Test: [170/402] Time 0.483 (0.542) Loss 0.6587 (0.9758) Acc@1 89.062 (77.421) Acc@5 94.531 (95.006) Mem 7155MB
570
+ [2024-02-22 17:57:01 vssm_small] (main.py 324): INFO Test: [180/402] Time 0.482 (0.538) Loss 1.8477 (0.9902) Acc@1 53.125 (77.175) Acc@5 93.750 (94.864) Mem 7155MB
571
+ [2024-02-22 17:57:06 vssm_small] (main.py 324): INFO Test: [190/402] Time 0.483 (0.535) Loss 0.4958 (0.9845) Acc@1 89.844 (77.356) Acc@5 96.875 (94.818) Mem 7155MB
572
+ [2024-02-22 17:57:11 vssm_small] (main.py 324): INFO Test: [200/402] Time 0.482 (0.533) Loss 0.3074 (0.9707) Acc@1 95.312 (77.697) Acc@5 97.656 (94.916) Mem 7155MB
573
+ [2024-02-22 17:57:16 vssm_small] (main.py 324): INFO Test: [210/402] Time 0.483 (0.530) Loss 0.5928 (0.9563) Acc@1 83.594 (78.058) Acc@5 99.219 (95.016) Mem 7155MB
+ [2024-02-22 17:57:20 vssm_small] (main.py 324): INFO Test: [220/402] Time 0.482 (0.528) Loss 1.1055 (0.9381) Acc@1 76.562 (78.450) Acc@5 92.969 (95.178) Mem 7155MB
+ [2024-02-22 17:57:25 vssm_small] (main.py 324): INFO Test: [230/402] Time 0.482 (0.526) Loss 2.1230 (0.9502) Acc@1 60.156 (78.436) Acc@5 78.906 (94.913) Mem 7155MB
+ [2024-02-22 17:57:30 vssm_small] (main.py 324): INFO Test: [240/402] Time 0.483 (0.524) Loss 1.1201 (0.9431) Acc@1 67.188 (78.618) Acc@5 98.438 (94.972) Mem 7155MB
+ [2024-02-22 17:57:35 vssm_small] (main.py 324): INFO Test: [250/402] Time 0.483 (0.523) Loss 1.8711 (0.9650) Acc@1 53.906 (78.019) Acc@5 95.312 (94.933) Mem 7155MB
+ [2024-02-22 17:57:40 vssm_small] (main.py 324): INFO Test: [260/402] Time 0.482 (0.521) Loss 0.9282 (0.9637) Acc@1 76.562 (78.023) Acc@5 99.219 (94.983) Mem 7155MB
+ [2024-02-22 17:57:44 vssm_small] (main.py 324): INFO Test: [270/402] Time 0.483 (0.520) Loss 1.1191 (0.9527) Acc@1 68.750 (78.269) Acc@5 94.531 (95.056) Mem 7155MB
+ [2024-02-22 17:57:49 vssm_small] (main.py 324): INFO Test: [280/402] Time 0.483 (0.519) Loss 2.3047 (0.9652) Acc@1 19.531 (77.755) Acc@5 92.188 (95.032) Mem 7155MB
+ [2024-02-22 17:57:54 vssm_small] (main.py 324): INFO Test: [290/402] Time 0.482 (0.517) Loss 0.8774 (0.9767) Acc@1 80.469 (77.489) Acc@5 93.750 (94.912) Mem 7155MB
+ [2024-02-22 17:57:59 vssm_small] (main.py 324): INFO Test: [300/402] Time 0.483 (0.516) Loss 1.0645 (0.9802) Acc@1 80.469 (77.512) Acc@5 92.188 (94.817) Mem 7155MB
+ [2024-02-22 17:58:04 vssm_small] (main.py 324): INFO Test: [310/402] Time 0.483 (0.515) Loss 1.0410 (0.9764) Acc@1 79.688 (77.462) Acc@5 96.094 (94.903) Mem 7155MB
+ [2024-02-22 17:58:09 vssm_small] (main.py 324): INFO Test: [320/402] Time 0.483 (0.514) Loss 1.0859 (0.9653) Acc@1 76.562 (77.743) Acc@5 89.844 (94.960) Mem 7155MB
+ [2024-02-22 17:58:13 vssm_small] (main.py 324): INFO Test: [330/402] Time 0.483 (0.513) Loss 1.0596 (0.9657) Acc@1 73.438 (77.714) Acc@5 95.312 (95.001) Mem 7155MB
+ [2024-02-22 17:58:18 vssm_small] (main.py 324): INFO Test: [340/402] Time 0.482 (0.512) Loss 0.3967 (0.9663) Acc@1 90.625 (77.699) Acc@5 100.000 (95.028) Mem 7155MB
+ [2024-02-22 17:58:23 vssm_small] (main.py 324): INFO Test: [350/402] Time 0.483 (0.511) Loss 1.2148 (0.9637) Acc@1 68.750 (77.773) Acc@5 96.875 (95.050) Mem 7155MB
+ [2024-02-22 17:58:28 vssm_small] (main.py 324): INFO Test: [360/402] Time 0.483 (0.511) Loss 0.9941 (0.9685) Acc@1 79.688 (77.571) Acc@5 95.312 (95.074) Mem 7155MB
+ [2024-02-22 17:58:33 vssm_small] (main.py 324): INFO Test: [370/402] Time 0.482 (0.510) Loss 0.9004 (0.9689) Acc@1 83.594 (77.552) Acc@5 94.531 (95.081) Mem 7155MB
+ [2024-02-22 17:58:38 vssm_small] (main.py 324): INFO Test: [380/402] Time 0.482 (0.509) Loss 0.7358 (0.9634) Acc@1 82.812 (77.690) Acc@5 97.656 (95.114) Mem 7155MB
+ [2024-02-22 17:58:42 vssm_small] (main.py 324): INFO Test: [390/402] Time 0.482 (0.508) Loss 1.0068 (0.9605) Acc@1 77.344 (77.807) Acc@5 94.531 (95.113) Mem 7155MB
+ [2024-02-22 17:58:47 vssm_small] (main.py 324): INFO Test: [400/402] Time 0.482 (0.508) Loss 0.2834 (0.9484) Acc@1 95.312 (78.141) Acc@5 99.219 (95.184) Mem 7155MB
+ [2024-02-22 17:58:48 vssm_small] (main.py 331): INFO * Acc@1 78.150 Acc@5 95.186
+ [2024-02-22 17:58:48 vssm_small] (main.py 174): INFO Accuracy of the network on the 51354 test images: 78.1%
+ [2024-02-22 17:58:57 vssm_small] (main.py 324): INFO Test: [0/402] Time 8.919 (8.919) Loss 0.4504 (0.4504) Acc@1 91.406 (91.406) Acc@5 98.438 (98.438) Mem 7155MB
+ [2024-02-22 17:59:02 vssm_small] (main.py 324): INFO Test: [10/402] Time 0.483 (1.250) Loss 0.9731 (0.8429) Acc@1 83.594 (82.599) Acc@5 91.406 (94.957) Mem 7155MB
+ [2024-02-22 17:59:07 vssm_small] (main.py 324): INFO Test: [20/402] Time 0.483 (0.884) Loss 1.3154 (0.8200) Acc@1 60.938 (81.510) Acc@5 94.531 (95.573) Mem 7155MB
+ [2024-02-22 17:59:12 vssm_small] (main.py 324): INFO Test: [30/402] Time 0.483 (0.755) Loss 0.8833 (0.8867) Acc@1 76.562 (79.410) Acc@5 96.875 (95.514) Mem 7155MB
+ [2024-02-22 17:59:16 vssm_small] (main.py 324): INFO Test: [40/402] Time 0.482 (0.688) Loss 0.8809 (0.8790) Acc@1 74.219 (79.002) Acc@5 98.438 (95.922) Mem 7155MB
+ [2024-02-22 17:59:21 vssm_small] (main.py 324): INFO Test: [50/402] Time 0.483 (0.648) Loss 0.5254 (0.8631) Acc@1 89.062 (79.611) Acc@5 97.656 (95.956) Mem 7155MB
+ [2024-02-22 17:59:26 vssm_small] (main.py 324): INFO Test: [60/402] Time 0.483 (0.621) Loss 0.6147 (0.8904) Acc@1 85.156 (78.855) Acc@5 97.656 (95.671) Mem 7155MB
+ [2024-02-22 17:59:31 vssm_small] (main.py 324): INFO Test: [70/402] Time 0.483 (0.601) Loss 1.0029 (0.8448) Acc@1 77.344 (80.095) Acc@5 96.875 (96.006) Mem 7155MB
+ [2024-02-22 17:59:36 vssm_small] (main.py 324): INFO Test: [80/402] Time 0.483 (0.587) Loss 0.5259 (0.8449) Acc@1 85.156 (80.102) Acc@5 98.438 (96.007) Mem 7155MB
+ [2024-02-22 17:59:41 vssm_small] (main.py 324): INFO Test: [90/402] Time 0.483 (0.575) Loss 0.2947 (0.8155) Acc@1 94.531 (80.872) Acc@5 100.000 (96.162) Mem 7155MB
+ [2024-02-22 17:59:45 vssm_small] (main.py 324): INFO Test: [100/402] Time 0.483 (0.566) Loss 1.2002 (0.8335) Acc@1 76.562 (80.554) Acc@5 92.188 (95.978) Mem 7155MB
+ [2024-02-22 17:59:50 vssm_small] (main.py 324): INFO Test: [110/402] Time 0.483 (0.559) Loss 0.4329 (0.8417) Acc@1 86.719 (80.342) Acc@5 100.000 (95.967) Mem 7155MB
+ [2024-02-22 17:59:55 vssm_small] (main.py 324): INFO Test: [120/402] Time 0.483 (0.552) Loss 2.2422 (0.8554) Acc@1 47.656 (80.139) Acc@5 84.375 (95.874) Mem 7155MB
+ [2024-02-22 18:00:00 vssm_small] (main.py 324): INFO Test: [130/402] Time 0.483 (0.547) Loss 0.4048 (0.8500) Acc@1 90.625 (80.200) Acc@5 99.219 (95.974) Mem 7155MB
+ [2024-02-22 18:00:05 vssm_small] (main.py 324): INFO Test: [140/402] Time 0.483 (0.543) Loss 2.1191 (0.9016) Acc@1 52.344 (79.039) Acc@5 83.594 (95.495) Mem 7155MB
+ [2024-02-22 18:00:09 vssm_small] (main.py 324): INFO Test: [150/402] Time 0.483 (0.539) Loss 0.8765 (0.9130) Acc@1 78.906 (78.715) Acc@5 94.531 (95.442) Mem 7155MB
+ [2024-02-22 18:00:14 vssm_small] (main.py 324): INFO Test: [160/402] Time 0.483 (0.535) Loss 1.3135 (0.9056) Acc@1 67.969 (78.872) Acc@5 95.312 (95.541) Mem 7155MB
+ [2024-02-22 18:00:19 vssm_small] (main.py 324): INFO Test: [170/402] Time 0.483 (0.532) Loss 0.5923 (0.8953) Acc@1 90.625 (79.094) Acc@5 95.312 (95.591) Mem 7155MB
+ [2024-02-22 18:00:24 vssm_small] (main.py 324): INFO Test: [180/402] Time 0.483 (0.529) Loss 1.8027 (0.9146) Acc@1 53.125 (78.699) Acc@5 94.531 (95.395) Mem 7155MB
+ [2024-02-22 18:00:29 vssm_small] (main.py 324): INFO Test: [190/402] Time 0.483 (0.527) Loss 0.4436 (0.9099) Acc@1 91.406 (78.865) Acc@5 97.656 (95.357) Mem 7155MB
+ [2024-02-22 18:00:34 vssm_small] (main.py 324): INFO Test: [200/402] Time 0.483 (0.525) Loss 0.2937 (0.8963) Acc@1 96.875 (79.190) Acc@5 98.438 (95.464) Mem 7155MB
+ [2024-02-22 18:00:38 vssm_small] (main.py 324): INFO Test: [210/402] Time 0.483 (0.523) Loss 0.5981 (0.8853) Acc@1 84.375 (79.465) Acc@5 98.438 (95.542) Mem 7155MB
+ [2024-02-22 18:00:43 vssm_small] (main.py 324): INFO Test: [220/402] Time 0.483 (0.521) Loss 1.0889 (0.8694) Acc@1 77.344 (79.811) Acc@5 92.969 (95.680) Mem 7155MB
+ [2024-02-22 18:00:48 vssm_small] (main.py 324): INFO Test: [230/402] Time 0.483 (0.519) Loss 1.9727 (0.8842) Acc@1 60.156 (79.708) Acc@5 78.906 (95.394) Mem 7155MB
+ [2024-02-22 18:00:53 vssm_small] (main.py 324): INFO Test: [240/402] Time 0.482 (0.518) Loss 1.2422 (0.8778) Acc@1 60.156 (79.853) Acc@5 98.438 (95.445) Mem 7155MB
+ [2024-02-22 18:00:58 vssm_small] (main.py 324): INFO Test: [250/402] Time 0.483 (0.516) Loss 1.4551 (0.8951) Acc@1 60.938 (79.358) Acc@5 95.312 (95.412) Mem 7155MB
+ [2024-02-22 18:01:03 vssm_small] (main.py 324): INFO Test: [260/402] Time 0.483 (0.515) Loss 0.8667 (0.8933) Acc@1 78.906 (79.331) Acc@5 98.438 (95.489) Mem 7155MB
+ [2024-02-22 18:01:07 vssm_small] (main.py 324): INFO Test: [270/402] Time 0.482 (0.514) Loss 0.9072 (0.8828) Acc@1 77.344 (79.578) Acc@5 96.094 (95.543) Mem 7155MB
+ [2024-02-22 18:01:12 vssm_small] (main.py 324): INFO Test: [280/402] Time 0.483 (0.513) Loss 2.3594 (0.8950) Acc@1 19.531 (79.101) Acc@5 92.188 (95.527) Mem 7155MB
+ [2024-02-22 18:01:17 vssm_small] (main.py 324): INFO Test: [290/402] Time 0.483 (0.512) Loss 0.8384 (0.9058) Acc@1 82.031 (78.845) Acc@5 95.312 (95.455) Mem 7155MB
+ [2024-02-22 18:01:22 vssm_small] (main.py 324): INFO Test: [300/402] Time 0.482 (0.511) Loss 0.9658 (0.9067) Acc@1 81.250 (78.914) Acc@5 93.750 (95.396) Mem 7155MB
+ [2024-02-22 18:01:27 vssm_small] (main.py 324): INFO Test: [310/402] Time 0.483 (0.510) Loss 1.0488 (0.9032) Acc@1 81.250 (78.861) Acc@5 96.094 (95.481) Mem 7155MB
+ [2024-02-22 18:01:32 vssm_small] (main.py 324): INFO Test: [320/402] Time 0.483 (0.509) Loss 0.8892 (0.8902) Acc@1 82.031 (79.193) Acc@5 92.188 (95.536) Mem 7155MB
+ [2024-02-22 18:01:36 vssm_small] (main.py 324): INFO Test: [330/402] Time 0.483 (0.508) Loss 0.8677 (0.8919) Acc@1 79.688 (79.145) Acc@5 97.656 (95.567) Mem 7155MB
+ [2024-02-22 18:01:41 vssm_small] (main.py 324): INFO Test: [340/402] Time 0.483 (0.507) Loss 0.3433 (0.8911) Acc@1 89.844 (79.138) Acc@5 100.000 (95.597) Mem 7155MB
+ [2024-02-22 18:01:46 vssm_small] (main.py 324): INFO Test: [350/402] Time 0.483 (0.507) Loss 0.8315 (0.8892) Acc@1 78.906 (79.200) Acc@5 98.438 (95.620) Mem 7155MB
+ [2024-02-22 18:01:51 vssm_small] (main.py 324): INFO Test: [360/402] Time 0.483 (0.506) Loss 0.9419 (0.8932) Acc@1 78.125 (79.006) Acc@5 96.094 (95.654) Mem 7155MB
+ [2024-02-22 18:01:56 vssm_small] (main.py 324): INFO Test: [370/402] Time 0.483 (0.505) Loss 0.8735 (0.8931) Acc@1 82.031 (78.997) Acc@5 94.531 (95.652) Mem 7155MB
+ [2024-02-22 18:02:01 vssm_small] (main.py 324): INFO Test: [380/402] Time 0.483 (0.505) Loss 0.7627 (0.8883) Acc@1 82.031 (79.138) Acc@5 97.656 (95.682) Mem 7155MB
+ [2024-02-22 18:02:05 vssm_small] (main.py 324): INFO Test: [390/402] Time 0.483 (0.504) Loss 0.9995 (0.8870) Acc@1 78.125 (79.218) Acc@5 93.750 (95.666) Mem 7155MB
+ [2024-02-22 18:02:10 vssm_small] (main.py 324): INFO Test: [400/402] Time 0.482 (0.504) Loss 0.2445 (0.8753) Acc@1 96.094 (79.534) Acc@5 99.219 (95.722) Mem 7155MB
+ [2024-02-22 18:02:11 vssm_small] (main.py 331): INFO * Acc@1 79.544 Acc@5 95.724
+ [2024-02-22 18:02:11 vssm_small] (main.py 177): INFO Accuracy of the network ema on the 51354 test images: 79.5%
+ [2024-02-22 18:02:11 vssm_small] (main.py 196): INFO Start training
+ [2024-02-22 18:02:22 vssm_small] (main.py 274): INFO Train: [167/300][0/933] eta 2:59:28 lr 0.000058 wd 0.0500 time 11.5413 (11.5413) data time 8.5294 (8.5294) loss 3.2837 (3.2837) grad_norm 7.2747 (7.2747) loss_scale 32768.0000 (32768.0000) mem 50097MB
+ [2024-02-22 18:02:38 vssm_small] (main.py 274): INFO Train: [167/300][10/933] eta 0:38:04 lr 0.000058 wd 0.0500 time 1.5679 (2.4753) data time 0.0005 (0.7759) loss 2.0028 (3.0696) grad_norm 5.6496 (6.8319) loss_scale 32768.0000 (32768.0000) mem 50285MB
+ [2024-02-22 18:03:56 vssm_small] (main.py 401): INFO Full config saved to ./res_vmamba_cnf241_result_best/vssm_small/default/config.json
+ [2024-02-22 18:03:56 vssm_small] (main.py 404): INFO AMP_ENABLE: true
+ AMP_OPT_LEVEL: ''
+ AUG:
+   AUTO_AUGMENT: rand-m9-mstd0.5-inc1
+   COLOR_JITTER: 0.4
+   CUTMIX: 1.0
+   CUTMIX_MINMAX: null
+   MIXUP: 0.8
+   MIXUP_MODE: batch
+   MIXUP_PROB: 1.0
+   MIXUP_SWITCH_PROB: 0.5
+   RECOUNT: 1
+   REMODE: pixel
+   REPROB: 0.25
+ BASE:
+ - ''
+ DATA:
+   BATCH_SIZE: 128
+   CACHE_MODE: part
+   DATASET: imagenet
+   DATA_PATH: /home/public_3T/food_data/CNFOOD-241
+   IMG_SIZE: 224
+   INTERPOLATION: bicubic
+   MASK_PATCH_SIZE: 32
+   MASK_RATIO: 0.6
+   NUM_WORKERS: 8
+   PIN_MEMORY: true
+   ZIP_MODE: false
+ ENABLE_AMP: false
+ EVAL_MODE: false
+ FUSED_LAYERNORM: false
+ FUSED_WINDOW_PROCESS: false
+ LOCAL_RANK: 0
+ MODEL:
+   DROP_PATH_RATE: 0.3
+   DROP_RATE: 0.0
+   LABEL_SMOOTHING: 0.1
+   MMCKPT: false
+   NAME: vssm_small
+   NUM_CLASSES: 241
+   PRETRAINED: ./res_vmamba_cnf241_result_2/vssm_small/default/ckpt_epoch_12.pth
+   RESUME: ''
+   TYPE: vssm
+   VSSM:
+     DEPTHS:
+     - 2
+     - 2
+     - 27
+     - 2
+     DOWNSAMPLE: v1
+     DT_RANK: auto
+     D_STATE: 16
+     EMBED_DIM: 96
+     IN_CHANS: 3
+     MLP_RATIO: 0.0
+     PATCH_NORM: true
+     PATCH_SIZE: 4
+     SHARED_SSM: false
+     SOFTMAX: false
+     SSM_RATIO: 2.0
+ OUTPUT: ./res_vmamba_cnf241_result_best/vssm_small/default
+ PRINT_FREQ: 10
+ SAVE_FREQ: 1
+ SEED: 0
+ TAG: default
+ TEST:
+   CROP: true
+   SEQUENTIAL: false
+   SHUFFLE: false
+ THROUGHPUT_MODE: false
+ TRAIN:
+   ACCUMULATION_STEPS: 1
+   AUTO_RESUME: true
+   BASE_LR: 0.000125
+   CLIP_GRAD: 5.0
+   EPOCHS: 300
+   LAYER_DECAY: 1.0
+   LR_SCHEDULER:
+     DECAY_EPOCHS: 30
+     DECAY_RATE: 0.1
+     GAMMA: 0.1
+     MULTISTEPS: []
+     NAME: cosine
+     WARMUP_PREFIX: true
+   MIN_LR: 1.25e-06
+   MOE:
+     SAVE_MASTER: false
+   OPTIMIZER:
+     BETAS:
+     - 0.9
+     - 0.999
+     EPS: 1.0e-08
+     MOMENTUM: 0.9
+     NAME: adamw
+   START_EPOCH: 0
+   USE_CHECKPOINT: false
+   WARMUP_EPOCHS: 20
+   WARMUP_LR: 1.25e-07
+   WEIGHT_DECAY: 0.05
+
+ [2024-02-22 18:03:56 vssm_small] (main.py 405): INFO {"cfg": "configs/vssm/vssm_small_224.yaml", "opts": null, "batch_size": 128, "data_path": "/home/public_3T/food_data/CNFOOD-241", "zip": false, "cache_mode": "part", "pretrained": "./res_vmamba_cnf241_result_2/vssm_small/default/ckpt_epoch_12.pth", "resume": null, "accumulation_steps": null, "use_checkpoint": false, "disable_amp": false, "amp_opt_level": null, "output": "./res_vmamba_cnf241_result_best", "tag": null, "eval": false, "throughput": false, "local_rank": 0, "fused_layernorm": false, "optim": null, "model_ema": true, "model_ema_decay": 0.9999, "model_ema_force_cpu": false}
+ [2024-02-22 18:03:56 vssm_small] (main.py 112): INFO Creating model:vssm/vssm_small
+ [2024-02-22 18:03:57 vssm_small] (main.py 118): INFO VSSM(
+   (patch_embed): Sequential(
+     (0): Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4))
+     (1): Permute()
+     (2): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
+   )
+   (layers): ModuleList(
+     (0): Sequential(
+       (blocks): Sequential(
+         (0): VSSBlock(
+           (norm): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=96, out_features=384, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192)
+             (out_proj): Linear(in_features=192, out_features=96, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.0)
+         )
+         (1): VSSBlock(
+           (norm): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=96, out_features=384, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192)
+             (out_proj): Linear(in_features=192, out_features=96, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.00937500037252903)
+         )
+       )
+       (downsample): PatchMerging2D(
+         (reduction): Linear(in_features=384, out_features=192, bias=False)
+         (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+       )
+     )
+     (1): Sequential(
+       (blocks): Sequential(
+         (0): VSSBlock(
+           (norm): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=192, out_features=768, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384)
+             (out_proj): Linear(in_features=384, out_features=192, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.01875000074505806)
+         )
+         (1): VSSBlock(
+           (norm): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=192, out_features=768, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384)
+             (out_proj): Linear(in_features=384, out_features=192, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.02812500111758709)
+         )
+       )
+       (downsample): PatchMerging2D(
+         (reduction): Linear(in_features=768, out_features=384, bias=False)
+         (norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+       )
+     )
+     (2): Sequential(
+       (blocks): Sequential(
+         (0): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.03750000149011612)
+         )
+         (1): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.046875)
+         )
+         (2): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.05625000223517418)
+         )
+         (3): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.06562500447034836)
+         )
+         (4): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.07500000298023224)
+         )
+         (5): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.08437500149011612)
+         )
+         (6): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.09375)
+         )
+         (7): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.10312500596046448)
+         )
+         (8): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.11250000447034836)
+         )
+         (9): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.12187500298023224)
+         )
+         (10): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.13125000894069672)
+         )
+         (11): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.140625)
+         )
+         (12): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.15000000596046448)
+         )
+         (13): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.15937501192092896)
+         )
+         (14): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.16875000298023224)
+         )
+         (15): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.17812500894069672)
+         )
+         (16): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.1875)
+         )
+         (17): VSSBlock(
+           (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
+           (op): SS2D(
+             (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+             (in_proj): Linear(in_features=384, out_features=1536, bias=False)
+             (act): SiLU()
+             (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
+             (out_proj): Linear(in_features=768, out_features=384, bias=False)
+             (dropout): Identity()
+           )
+           (drop_path): timm.DropPath(0.19687500596046448)
+         )
+ (18): VSSBlock(
1034
+ (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
1035
+ (op): SS2D(
1036
+ (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
1037
+ (in_proj): Linear(in_features=384, out_features=1536, bias=False)
1038
+ (act): SiLU()
1039
+ (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
1040
+ (out_proj): Linear(in_features=768, out_features=384, bias=False)
1041
+ (dropout): Identity()
1042
+ )
1043
+ (drop_path): timm.DropPath(0.20625001192092896)
1044
+ )
1045
+ (19): VSSBlock(
1046
+ (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
1047
+ (op): SS2D(
1048
+ (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
1049
+ (in_proj): Linear(in_features=384, out_features=1536, bias=False)
1050
+ (act): SiLU()
1051
+ (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
1052
+ (out_proj): Linear(in_features=768, out_features=384, bias=False)
1053
+ (dropout): Identity()
1054
+ )
1055
+ (drop_path): timm.DropPath(0.21562501788139343)
1056
+ )
1057
+ (20): VSSBlock(
1058
+ (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
1059
+ (op): SS2D(
1060
+ (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
1061
+ (in_proj): Linear(in_features=384, out_features=1536, bias=False)
1062
+ (act): SiLU()
1063
+ (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
1064
+ (out_proj): Linear(in_features=768, out_features=384, bias=False)
1065
+ (dropout): Identity()
1066
+ )
1067
+ (drop_path): timm.DropPath(0.22500000894069672)
1068
+ )
1069
+ (21): VSSBlock(
1070
+ (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
1071
+ (op): SS2D(
1072
+ (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
1073
+ (in_proj): Linear(in_features=384, out_features=1536, bias=False)
1074
+ (act): SiLU()
1075
+ (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
1076
+ (out_proj): Linear(in_features=768, out_features=384, bias=False)
1077
+ (dropout): Identity()
1078
+ )
1079
+ (drop_path): timm.DropPath(0.2343750149011612)
1080
+ )
1081
+ (22): VSSBlock(
1082
+ (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
1083
+ (op): SS2D(
1084
+ (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
1085
+ (in_proj): Linear(in_features=384, out_features=1536, bias=False)
1086
+ (act): SiLU()
1087
+ (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
1088
+ (out_proj): Linear(in_features=768, out_features=384, bias=False)
1089
+ (dropout): Identity()
1090
+ )
1091
+ (drop_path): timm.DropPath(0.24375000596046448)
1092
+ )
1093
+ (23): VSSBlock(
1094
+ (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
1095
+ (op): SS2D(
1096
+ (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
1097
+ (in_proj): Linear(in_features=384, out_features=1536, bias=False)
1098
+ (act): SiLU()
1099
+ (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
1100
+ (out_proj): Linear(in_features=768, out_features=384, bias=False)
1101
+ (dropout): Identity()
1102
+ )
1103
+ (drop_path): timm.DropPath(0.25312501192092896)
1104
+ )
1105
+ (24): VSSBlock(
1106
+ (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
1107
+ (op): SS2D(
1108
+ (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
1109
+ (in_proj): Linear(in_features=384, out_features=1536, bias=False)
1110
+ (act): SiLU()
1111
+ (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
1112
+ (out_proj): Linear(in_features=768, out_features=384, bias=False)
1113
+ (dropout): Identity()
1114
+ )
1115
+ (drop_path): timm.DropPath(0.26250001788139343)
1116
+ )
1117
+ (25): VSSBlock(
1118
+ (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
1119
+ (op): SS2D(
1120
+ (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
1121
+ (in_proj): Linear(in_features=384, out_features=1536, bias=False)
1122
+ (act): SiLU()
1123
+ (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
1124
+ (out_proj): Linear(in_features=768, out_features=384, bias=False)
1125
+ (dropout): Identity()
1126
+ )
1127
+ (drop_path): timm.DropPath(0.2718750238418579)
1128
+ )
1129
+ (26): VSSBlock(
1130
+ (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
1131
+ (op): SS2D(
1132
+ (out_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
1133
+ (in_proj): Linear(in_features=384, out_features=1536, bias=False)
1134
+ (act): SiLU()
1135
+ (conv2d): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
1136
+ (out_proj): Linear(in_features=768, out_features=384, bias=False)
1137
+ (dropout): Identity()
1138
+ )
1139
+ (drop_path): timm.DropPath(0.28125)
1140
+ )
1141
+ )
1142
+ (downsample): PatchMerging2D(
1143
+ (reduction): Linear(in_features=1536, out_features=768, bias=False)
1144
+ (norm): LayerNorm((1536,), eps=1e-05, elementwise_affine=True)
+ )
+ )
+ (3): Sequential(
+ (blocks): Sequential(
+ (0): VSSBlock(
+ (norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (op): SS2D(
+ (out_norm): LayerNorm((1536,), eps=1e-05, elementwise_affine=True)
+ (in_proj): Linear(in_features=768, out_features=3072, bias=False)
+ (act): SiLU()
+ (conv2d): Conv2d(1536, 1536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1536)
+ (out_proj): Linear(in_features=1536, out_features=768, bias=False)
+ (dropout): Identity()
+ )
+ (drop_path): timm.DropPath(0.2906250059604645)
+ )
+ (1): VSSBlock(
+ (norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (op): SS2D(
+ (out_norm): LayerNorm((1536,), eps=1e-05, elementwise_affine=True)
+ (in_proj): Linear(in_features=768, out_features=3072, bias=False)
+ (act): SiLU()
+ (conv2d): Conv2d(1536, 1536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1536)
+ (out_proj): Linear(in_features=1536, out_features=768, bias=False)
+ (dropout): Identity()
+ )
+ (drop_path): timm.DropPath(0.30000001192092896)
+ )
+ )
+ (downsample): Identity()
+ )
+ )
+ (classifier): Sequential(
+ (norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (permute): Permute()
+ (avgpool): AdaptiveAvgPool2d(output_size=1)
+ (flatten): Flatten(start_dim=1, end_dim=-1)
+ (head): Linear(in_features=768, out_features=1000, bias=True)
+ )
+ )
+ [2024-02-22 18:03:57 vssm_small] (main.py 120): INFO number of params: 44417416
+ [2024-02-22 18:03:58 vssm_small] (main.py 123): INFO number of GFLOPs: 11.231522784
+ [2024-02-22 18:03:58 vssm_small] (main.py 167): INFO auto resuming from ./res_vmamba_cnf241_result_best/vssm_small/default/ckpt_epoch_166.pth
+ [2024-02-22 18:03:58 vssm_small] (utils.py 18): INFO ==============> Resuming form ./res_vmamba_cnf241_result_best/vssm_small/default/ckpt_epoch_166.pth....................
+ [2024-02-22 18:04:00 vssm_small] (utils.py 27): INFO resuming model: <All keys matched successfully>
+ [2024-02-22 18:04:00 vssm_small] (utils.py 34): INFO resuming model_ema: <All keys matched successfully>
+ [2024-02-22 18:04:00 vssm_small] (utils.py 48): INFO => loaded successfully './res_vmamba_cnf241_result_best/vssm_small/default/ckpt_epoch_166.pth' (epoch 166)
+ [2024-02-22 18:04:11 vssm_small] (main.py 324): INFO Test: [0/164] Time 10.625 (10.625) Loss 0.3557 (0.3557) Acc@1 92.188 (92.188) Acc@5 99.219 (99.219) Mem 7155MB
+ [2024-02-22 18:04:16 vssm_small] (main.py 324): INFO Test: [10/164] Time 0.483 (1.405) Loss 0.9014 (0.9043) Acc@1 75.000 (78.622) Acc@5 96.094 (95.099) Mem 7155MB
+ [2024-02-22 18:04:21 vssm_small] (main.py 324): INFO Test: [20/164] Time 0.483 (0.966) Loss 1.3975 (1.0178) Acc@1 73.438 (75.818) Acc@5 89.844 (95.238) Mem 7155MB
+ [2024-02-22 18:04:25 vssm_small] (main.py 324): INFO Test: [30/164] Time 0.483 (0.810) Loss 0.5830 (0.9851) Acc@1 86.719 (77.344) Acc@5 96.875 (94.960) Mem 7155MB
+ [2024-02-22 18:04:30 vssm_small] (main.py 324): INFO Test: [40/164] Time 0.483 (0.730) Loss 0.9902 (0.9247) Acc@1 77.344 (78.925) Acc@5 95.312 (95.332) Mem 7155MB
+ [2024-02-22 18:04:35 vssm_small] (main.py 324): INFO Test: [50/164] Time 0.483 (0.682) Loss 1.3584 (0.9845) Acc@1 69.531 (77.681) Acc@5 92.969 (94.838) Mem 7155MB
+ [2024-02-22 18:04:40 vssm_small] (main.py 324): INFO Test: [60/164] Time 0.483 (0.649) Loss 1.1582 (1.0527) Acc@1 70.312 (75.999) Acc@5 96.875 (94.237) Mem 7155MB
+ [2024-02-22 18:04:45 vssm_small] (main.py 324): INFO Test: [70/164] Time 0.483 (0.626) Loss 0.4968 (1.0371) Acc@1 89.844 (76.320) Acc@5 96.875 (94.311) Mem 7155MB
+ [2024-02-22 18:04:50 vssm_small] (main.py 324): INFO Test: [80/164] Time 0.483 (0.608) Loss 0.4280 (1.0560) Acc@1 91.406 (75.965) Acc@5 98.438 (94.088) Mem 7155MB
+ [2024-02-22 18:04:54 vssm_small] (main.py 324): INFO Test: [90/164] Time 0.483 (0.594) Loss 1.0479 (1.0186) Acc@1 71.875 (76.829) Acc@5 99.219 (94.420) Mem 7155MB
+ [2024-02-22 18:04:59 vssm_small] (main.py 324): INFO Test: [100/164] Time 0.483 (0.583) Loss 0.5444 (1.0171) Acc@1 82.812 (77.158) Acc@5 100.000 (94.307) Mem 7155MB
+ [2024-02-22 18:05:04 vssm_small] (main.py 324): INFO Test: [110/164] Time 0.483 (0.574) Loss 1.3740 (1.0362) Acc@1 67.188 (76.464) Acc@5 96.875 (94.348) Mem 7155MB
+ [2024-02-22 18:05:09 vssm_small] (main.py 324): INFO Test: [120/164] Time 0.483 (0.567) Loss 2.1602 (1.0386) Acc@1 33.594 (76.220) Acc@5 89.844 (94.441) Mem 7155MB
+ [2024-02-22 18:05:14 vssm_small] (main.py 324): INFO Test: [130/164] Time 0.483 (0.560) Loss 1.2930 (1.0532) Acc@1 57.812 (75.889) Acc@5 98.438 (94.281) Mem 7155MB
+ [2024-02-22 18:05:19 vssm_small] (main.py 324): INFO Test: [140/164] Time 0.483 (0.555) Loss 0.7490 (1.0376) Acc@1 81.250 (76.141) Acc@5 96.094 (94.437) Mem 7155MB
+ [2024-02-22 18:05:23 vssm_small] (main.py 324): INFO Test: [150/164] Time 0.482 (0.550) Loss 1.1650 (1.0309) Acc@1 69.531 (76.293) Acc@5 98.438 (94.521) Mem 7155MB
+ [2024-02-22 18:05:28 vssm_small] (main.py 324): INFO Test: [160/164] Time 0.483 (0.546) Loss 0.5903 (1.0296) Acc@1 89.844 (76.305) Acc@5 95.312 (94.580) Mem 7155MB
+ [2024-02-22 18:05:31 vssm_small] (main.py 331): INFO * Acc@1 76.541 Acc@5 94.638
+ [2024-02-22 18:05:31 vssm_small] (main.py 174): INFO Accuracy of the network on the 20943 test images: 76.5%
+ [2024-02-22 18:05:39 vssm_small] (main.py 324): INFO Test: [0/164] Time 8.835 (8.835) Loss 0.4526 (0.4526) Acc@1 89.844 (89.844) Acc@5 99.219 (99.219) Mem 7155MB
+ [2024-02-22 18:05:44 vssm_small] (main.py 324): INFO Test: [10/164] Time 0.482 (1.242) Loss 1.1172 (0.8497) Acc@1 67.969 (79.830) Acc@5 96.094 (95.739) Mem 7155MB
+ [2024-02-22 18:05:49 vssm_small] (main.py 324): INFO Test: [20/164] Time 0.483 (0.880) Loss 1.3506 (0.9275) Acc@1 72.656 (77.567) Acc@5 92.188 (96.168) Mem 7155MB
+ [2024-02-22 18:05:54 vssm_small] (main.py 324): INFO Test: [30/164] Time 0.483 (0.752) Loss 0.6631 (0.9005) Acc@1 84.375 (79.133) Acc@5 96.875 (95.640) Mem 7155MB
+ [2024-02-22 18:05:59 vssm_small] (main.py 324): INFO Test: [40/164] Time 0.483 (0.686) Loss 0.8730 (0.8447) Acc@1 78.906 (80.640) Acc@5 96.094 (95.941) Mem 7155MB
+ [2024-02-22 18:06:03 vssm_small] (main.py 324): INFO Test: [50/164] Time 0.483 (0.646) Loss 1.4102 (0.9097) Acc@1 66.406 (79.350) Acc@5 92.969 (95.343) Mem 7155MB
+ [2024-02-22 18:06:08 vssm_small] (main.py 324): INFO Test: [60/164] Time 0.482 (0.620) Loss 1.1191 (0.9768) Acc@1 67.969 (77.818) Acc@5 96.875 (94.762) Mem 7155MB
+ [2024-02-22 18:06:13 vssm_small] (main.py 324): INFO Test: [70/164] Time 0.482 (0.600) Loss 0.4170 (0.9548) Acc@1 90.625 (78.191) Acc@5 96.094 (94.971) Mem 7155MB
+ [2024-02-22 18:06:18 vssm_small] (main.py 324): INFO Test: [80/164] Time 0.482 (0.586) Loss 0.4082 (0.9766) Acc@1 91.406 (77.778) Acc@5 98.438 (94.676) Mem 7155MB
+ [2024-02-22 18:06:23 vssm_small] (main.py 324): INFO Test: [90/164] Time 0.482 (0.574) Loss 1.0576 (0.9459) Acc@1 72.656 (78.546) Acc@5 99.219 (94.943) Mem 7155MB
+ [2024-02-22 18:06:28 vssm_small] (main.py 324): INFO Test: [100/164] Time 0.483 (0.565) Loss 0.5508 (0.9468) Acc@1 84.375 (78.860) Acc@5 100.000 (94.825) Mem 7155MB
+ [2024-02-22 18:06:32 vssm_small] (main.py 324): INFO Test: [110/164] Time 0.483 (0.558) Loss 1.1367 (0.9615) Acc@1 68.750 (78.202) Acc@5 97.656 (94.869) Mem 7155MB
+ [2024-02-22 18:06:37 vssm_small] (main.py 324): INFO Test: [120/164] Time 0.482 (0.552) Loss 2.1855 (0.9641) Acc@1 29.688 (77.893) Acc@5 91.406 (94.990) Mem 7155MB
+ [2024-02-22 18:06:42 vssm_small] (main.py 324): INFO Test: [130/164] Time 0.483 (0.546) Loss 1.2090 (0.9734) Acc@1 60.156 (77.642) Acc@5 99.219 (94.931) Mem 7155MB
+ [2024-02-22 18:06:47 vssm_small] (main.py 324): INFO Test: [140/164] Time 0.483 (0.542) Loss 0.6606 (0.9576) Acc@1 82.812 (77.876) Acc@5 99.219 (95.107) Mem 7155MB
+ [2024-02-22 18:06:52 vssm_small] (main.py 324): INFO Test: [150/164] Time 0.482 (0.538) Loss 0.9053 (0.9501) Acc@1 77.344 (78.084) Acc@5 98.438 (95.183) Mem 7155MB
+ [2024-02-22 18:06:57 vssm_small] (main.py 324): INFO Test: [160/164] Time 0.482 (0.534) Loss 0.5884 (0.9481) Acc@1 86.719 (78.023) Acc@5 96.875 (95.259) Mem 7155MB
+ [2024-02-22 18:06:58 vssm_small] (main.py 331): INFO * Acc@1 78.260 Acc@5 95.306
+ [2024-02-22 18:06:58 vssm_small] (main.py 177): INFO Accuracy of the network ema on the 20943 test images: 78.3%
+ [2024-02-22 18:06:58 vssm_small] (main.py 196): INFO Start training
+ [2024-02-22 18:07:10 vssm_small] (main.py 274): INFO Train: [167/300][0/933] eta 3:02:27 lr 0.000058 wd 0.0500 time 11.7339 (11.7339) data time 8.6883 (8.6883) loss 3.2837 (3.2837) grad_norm 7.2753 (7.2753) loss_scale 32768.0000 (32768.0000) mem 50097MB
+ [2024-02-22 18:07:26 vssm_small] (main.py 274): INFO Train: [167/300][10/933] eta 0:38:21 lr 0.000058 wd 0.0500 time 1.5683 (2.4935) data time 0.0006 (0.7903) loss 2.0028 (3.0696) grad_norm 5.6539 (6.8326) loss_scale 32768.0000 (32768.0000) mem 50285MB
+ [2024-02-22 18:07:41 vssm_small] (main.py 274): INFO Train: [167/300][20/933] eta 0:31:15 lr 0.000058 wd 0.0500 time 1.5680 (2.0546) data time 0.0006 (0.4143) loss 2.7450 (2.9151) grad_norm 4.8270 (6.2856) loss_scale 32768.0000 (32768.0000) mem 50285MB