guanxiongsun committed
Commit
8c50f70
1 Parent(s): fa9254a
README.md ADDED
@@ -0,0 +1,183 @@
# Video Feature Enhancement with PyTorch

[![License](https://img.shields.io/badge/license-BSD-blue.svg)](LICENSE)

This repo contains the code for the following papers:
[MAMBA](https://arxiv.org/abs/2401.09923), STPN, TDViT, and EOVOD.

Additionally, we provide archive files of two widely used datasets, ImageNet VID and GOT-10k. The official download links of these datasets are no longer accessible or have been deleted. We hope these resources can help future research.

## Progress

- [x] [MAMBA](https://arxiv.org/abs/2401.09923)
- [ ] STPN
- [ ] TDViT
- [ ] EOVOD

## Main Results

| Model | Backbone | AP50 | AP (fast) | AP (med) | AP (slow) | Link |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| FasterRCNN | ResNet-101 | 76.7 | 52.3 | 74.1 | 84.9 | [model](https://drive.google.com/file/d/1W17f9GC60rHU47lUeOEfU--Ra-LTw3Tq/view?usp=sharing), [reference](https://github.com/Scalsol/mega.pytorch/tree/master?tab=readme-ov-file#main-results) |
| SELSA | ResNet-101 | 81.5 | -- | -- | -- | [model](https://download.openmmlab.com/mmtracking/vid/selsa/selsa_faster_rcnn_r101_dc5_1x_imagenetvid/selsa_faster_rcnn_r101_dc5_1x_imagenetvid_20201218_172724-aa961bcc.pth), [reference](https://github.com/open-mmlab/mmtracking/tree/master/configs/vid/selsa) |
| MEGA | ResNet-101 | 82.9 | 62.7 | 81.6 | 89.4 | [model](https://drive.google.com/file/d/1ZnAdFafF1vW9Lnpw-RPF1AD_csw61lBY/view?usp=sharing), [reference](https://github.com/Scalsol/mega.pytorch/tree/master) |
| **MAMBA** | ResNet-101 | 83.8 | 65.3 | 83.8 | 89.5 | [model](), [paper](https://arxiv.org/abs/2401.09923) |

## Installation

The code has been tested with the following environment:

### Tested environment

- python 3.8
- pytorch 1.10.1
- cuda 11.3
- mmcv-full 1.3.17

### Option 1: Step-by-step installation

```bash
conda create --name vfe -y python=3.8
conda activate vfe

# install PyTorch with CUDA support
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge

# install mmcv-full 1.3.17
pip install mmcv-full==1.3.17 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.10/index.html

# install other requirements
pip install -r requirements.txt

# install mmpycocotools
pip install mmpycocotools
```

See [here](https://github.com/open-mmlab/mmcv#installation) for the MMCV versions compatible with each PyTorch and CUDA combination.

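
As a quick sanity check after installation, the snippet below (not part of this repo, just an illustrative sketch) prints the installed versions and confirms that CUDA and the compiled mmcv ops are usable:

```python
# Quick environment sanity check (illustrative; not part of this repo).
import torch
import mmcv

print("torch:", torch.__version__)        # expect 1.10.1
print("cuda available:", torch.cuda.is_available())
print("mmcv-full:", mmcv.__version__)     # expect 1.3.17

# mmcv-full ships compiled C++/CUDA ops; importing one verifies the wheel
# matches the installed PyTorch/CUDA combination.
from mmcv.ops import nms  # raises an error on a version mismatch
print("mmcv ops import OK:", callable(nms))
```
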
## Data preparation

### Download Datasets

The original download links of the ImageNet VID dataset are either broken or unavailable. Here, we provide new links for the future reference of the community. Please download the ILSVRC2015 DET and ILSVRC2015 VID datasets from [here]().

**Note:** The links expire every 30 days. If they have expired, please contact me to renew them.

After that, we recommend symlinking the datasets to `data/`, so that the path structure is as follows (a short sketch for setting this up is given below):

    ./data/ILSVRC/
    ./data/ILSVRC/Annotations/DET
    ./data/ILSVRC/Annotations/VID
    ./data/ILSVRC/Data/DET
    ./data/ILSVRC/Data/VID
    ./data/ILSVRC/ImageSets

**Note**: The list txt files under the `ImageSets` folder can be obtained from
[here](https://github.com/msracver/Flow-Guided-Feature-Aggregation/tree/master/data/ILSVRC2015/ImageSets).

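
For reference, here is a minimal sketch of creating the symlink and checking that the expected sub-folders are in place; `/path/to/ILSVRC` is a placeholder for wherever you extracted the archives:

```python
# Minimal sketch: link the extracted ILSVRC folder into ./data and verify
# the layout expected by the configs. '/path/to/ILSVRC' is a placeholder.
from pathlib import Path

extracted = Path("/path/to/ILSVRC")   # where you unpacked the archives
link = Path("data/ILSVRC")

link.parent.mkdir(parents=True, exist_ok=True)
if not link.exists():
    link.symlink_to(extracted.resolve(), target_is_directory=True)

expected = [
    "Annotations/DET", "Annotations/VID",
    "Data/DET", "Data/VID", "ImageSets",
]
for sub in expected:
    status = "ok" if (link / sub).is_dir() else "MISSING"
    print(f"{link / sub}: {status}")
```
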
### Convert Annotations

We use [CocoVID](mmdet/datasets/parsers/coco_video_parser.py) to maintain all datasets in this codebase, so you need to convert the official annotations to this style. We provide conversion scripts; the usage is as follows:

```bash
# ImageNet DET
python ./tools/convert_datasets/ilsvrc/imagenet2coco_det.py -i ./data/ILSVRC -o ./data/ILSVRC/annotations

# ImageNet VID
python ./tools/convert_datasets/ilsvrc/imagenet2coco_vid.py -i ./data/ILSVRC -o ./data/ILSVRC/annotations
```

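
After conversion you can sanity-check the generated files. The sketch below assumes the CocoVID-style JSON keeps COCO's `images`/`annotations`/`categories` lists and adds a `videos` list (the usual CocoVID layout); adjust the keys if your output differs:

```python
# Sketch: inspect a converted annotation file. Key names follow the usual
# CocoVID layout (COCO plus a 'videos' list); verify against your output.
import json

with open("data/ILSVRC/annotations/imagenet_vid_train.json") as f:
    ann = json.load(f)

print("top-level keys:", sorted(ann.keys()))
print("videos:", len(ann.get("videos", [])))
print("images:", len(ann["images"]))
print("annotations:", len(ann["annotations"]))
print("first categories:", [c["name"] for c in ann["categories"]][:5])
```
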
## Usage

### Inference

This section shows how to test existing models on the supported datasets.
The following testing environments are supported:

- single GPU
- single node with multiple GPUs

During testing, all tasks share the same API, and only `samples_per_gpu = 1` is supported.

You can use the following commands for testing:

```shell
# single-gpu testing
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]

# multi-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${GPU_NUM} [--checkpoint ${CHECKPOINT_FILE}] [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]
```

Optional arguments:

- `CHECKPOINT_FILE`: Filename of the checkpoint. For some MOT methods you do not need to pass it on the command line; instead, specify the checkpoint in the config.
- `RESULT_FILE`: Filename of the output results in pickle format. If not specified, the results will not be saved to a file (see the sketch after this list for inspecting a saved file).
- `EVAL_METRICS`: Items to be evaluated on the results. Allowed values depend on the dataset, e.g., `bbox` is available for ImageNet VID, `track` is available for LaSOT, and both `bbox` and `track` are suitable for MOT17.
- `--cfg-options`: If specified, the given key-value pairs are merged into the config file.
- `--eval-options`: If specified, the given key-value pairs are passed as kwargs to the `dataset.evaluate()` function; it is only used for evaluation.
- `--format-only`: If specified, the results are formatted to the official format.

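
If you saved a `results.pkl`, it can be inspected offline. The sketch below assumes the common mmdetection bbox result layout (one entry per image, one `(N, 5)` array of `[x1, y1, x2, y2, score]` per class); treat it as illustrative rather than a guaranteed schema:

```python
# Sketch: peek into a saved result file. In mmdet-style codebases, bbox
# results are typically a list (per image) of per-class (N, 5) arrays.
import mmcv

results = mmcv.load("results.pkl")  # mmcv.load handles .pkl/.json/.yaml
print("num images:", len(results))

first = results[0]
print("num classes:", len(first))
for cls_id, dets in enumerate(first):
    if len(dets):
        print(f"class {cls_id}: {len(dets)} boxes, "
              f"top score {dets[:, 4].max():.3f}")
```
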
#### Examples of testing a VID model

Assume that you have already downloaded the checkpoints to the directory `work_dirs/`.

1. Test MAMBA on ImageNet VID, and evaluate the bbox mAP.

   ```shell
   python tools/test.py configs/vid/mamba/mamba_r101_dc5_6x.py \
       --checkpoint work_dirs/mamba_r101_dc5_6x/epoch_6_model.pth \
       --out results.pkl \
       --eval bbox
   ```

2. Test MAMBA with 8 GPUs on ImageNet VID, and evaluate the bbox mAP.

   ```shell
   ./tools/dist_test.sh configs/vid/mamba/mamba_r101_dc5_6x.py 8 \
       --checkpoint work_dirs/mamba_r101_dc5_6x/epoch_6_model.pth \
       --out results.pkl \
       --eval bbox
   ```

### Training

#### Training on a single GPU

```shell
python tools/train.py ${CONFIG_FILE} [optional arguments]
```

During training, log files and checkpoints are saved to the working directory, which is specified by `work_dir` in the config file or via the CLI argument `--work-dir`.

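
Checkpoints written by this kind of mmcv-based runner normally store a `meta` dict (epoch, iteration, versions) alongside `state_dict`. A hedged sketch for peeking at one (key names may differ for stripped checkpoints):

```python
# Sketch: inspect a saved checkpoint. mmcv-style checkpoints usually hold
# 'meta' and 'state_dict' entries; key names may vary between versions.
import torch

ckpt = torch.load("work_dirs/mamba_r101_dc5_6x/epoch_6_model.pth",
                  map_location="cpu")
print("keys:", list(ckpt.keys()))
meta = ckpt.get("meta", {})
print("epoch:", meta.get("epoch"), "iter:", meta.get("iter"))
print("num parameter tensors:", len(ckpt.get("state_dict", {})))
```
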
#### Training on multiple GPUs

We provide `tools/dist_train.sh` to launch training on multiple GPUs.
The basic usage is as follows.

```shell
bash ./tools/dist_train.sh \
    ${CONFIG_FILE} \
    ${GPU_NUM} \
    [optional arguments]
```

#### Examples of training a VID model

1. Train MAMBA on ImageNet VID and ImageNet DET with a single GPU, then evaluate the bbox mAP at the last epoch.

   ```shell
   python tools/train.py configs/vid/mamba/mamba_r101_dc5_6x.py
   ```

2. Train MAMBA on ImageNet VID and ImageNet DET with 8 GPUs, then evaluate the bbox mAP at the last epoch.

   ```shell
   ./tools/dist_train.sh configs/vid/mamba/mamba_r101_dc5_6x.py 8
   ```

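
Training typically also writes a `*.log.json` next to the text log, with one JSON record per logging step. As a rough, hedged example of skimming the loss curve from it, assuming the usual `mode`/`epoch`/`iter`/`loss` fields:

```python
# Sketch: skim the per-iteration JSON log produced during training.
# Field names follow the common mmcv TextLoggerHook output; adjust if needed.
import json

log_file = "work_dirs/mamba_r101_dc5_6x/20240129_190545.log.json"
with open(log_file) as f:
    records = [json.loads(line) for line in f if line.strip()]

train_recs = [r for r in records if r.get("mode") == "train"]
for r in train_recs[::50]:  # every 50th logged step
    print(f"epoch {r['epoch']} iter {r['iter']}: loss {r['loss']:.4f}")
```
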
## Reference

The codebase is built on top of two popular open-source repos,
[mmdetection](https://github.com/open-mmlab/mmdetection) and [mmtracking](https://github.com/open-mmlab/mmtracking), in [PyTorch](https://pytorch.org/).
work_dirs/.DS_Store ADDED
Binary file (6.15 kB)
 
work_dirs/mamba_r101_dc5_6x/20240129_190545.log ADDED
The diff for this file is too large to render.
 
work_dirs/mamba_r101_dc5_6x/20240129_190545.log.json ADDED
The diff for this file is too large to render.
 
work_dirs/mamba_r101_dc5_6x/epoch_6_model.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d652da944645d3f2f5cfcb1a88b2d29df8f02d9269e5fa4f940bf32bd44a0881
size 359129121
work_dirs/mamba_r101_dc5_6x/mamba_r101_dc5_6x.py ADDED
@@ -0,0 +1,298 @@
model = dict(
    detector=dict(
        type='FasterRCNN',
        backbone=dict(
            type='ResNet',
            depth=101,
            num_stages=4,
            out_indices=(3, ),
            strides=(1, 2, 2, 1),
            dilations=(1, 1, 1, 2),
            frozen_stages=1,
            norm_cfg=dict(type='BN', requires_grad=True),
            norm_eval=True,
            style='pytorch',
            init_cfg=dict(
                type='Pretrained', checkpoint='torchvision://resnet101')),
        neck=dict(
            type='ChannelMapper',
            in_channels=[2048],
            out_channels=512,
            kernel_size=3),
        rpn_head=dict(
            type='RPNHead',
            in_channels=512,
            feat_channels=512,
            anchor_generator=dict(
                type='AnchorGenerator',
                scales=[4, 8, 16, 32],
                ratios=[0.5, 1.0, 2.0],
                strides=[16]),
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[1.0, 1.0, 1.0, 1.0]),
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
            loss_bbox=dict(
                type='SmoothL1Loss', beta=0.1111111111111111,
                loss_weight=1.0)),
        roi_head=dict(
            type='MambaRoIHead',
            bbox_roi_extractor=dict(
                type='SingleRoIExtractor',
                roi_layer=dict(
                    type='RoIAlign', output_size=7, sampling_ratio=2),
                out_channels=512,
                featmap_strides=[16]),
            bbox_head=dict(
                type='MambaBBoxHead',
                in_channels=512,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=30,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.2, 0.2, 0.2, 0.2]),
                reg_class_agnostic=False,
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0),
                num_shared_fcs=2,
                topk=75,
                aggregator=dict(
                    type='MambaAggregator',
                    in_channels=1024,
                    num_attention_blocks=16))),
        train_cfg=dict(
            rpn=dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,
                    neg_iou_thr=0.3,
                    min_pos_iou=0.3,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=256,
                    pos_fraction=0.5,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=False),
                allowed_border=0,
                pos_weight=-1,
                debug=False),
            rpn_proposal=dict(
                nms_pre=6000,
                max_per_img=600,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            rcnn=dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.5,
                    neg_iou_thr=0.5,
                    min_pos_iou=0.5,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=256,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False)),
        test_cfg=dict(
            rpn=dict(
                nms_pre=6000,
                max_per_img=300,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            rcnn=dict(
                score_thr=0.0001,
                nms=dict(type='nms', iou_threshold=0.5),
                max_per_img=100))),
    type='MAMBA')
dataset_type = 'ImagenetVIDDataset'
data_root = 'data/ILSVRC/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadMultiImagesFromFile'),
    dict(type='SeqLoadAnnotations', with_bbox=True, with_track=True),
    dict(type='SeqResize', img_scale=(1000, 600), keep_ratio=True),
    dict(type='SeqRandomFlip', share_params=True, flip_ratio=0.5),
    dict(
        type='SeqNormalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='SeqPad', size_divisor=16),
    dict(
        type='VideoCollect',
        keys=['img', 'gt_bboxes', 'gt_labels', 'gt_instance_ids']),
    dict(type='ConcatVideoReferences'),
    dict(type='SeqDefaultFormatBundle', ref_prefix='ref')
]
test_pipeline = [
    dict(type='LoadMultiImagesFromFile'),
    dict(type='SeqResize', img_scale=(1000, 600), keep_ratio=True),
    dict(type='SeqRandomFlip', share_params=True, flip_ratio=0.0),
    dict(
        type='SeqNormalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='SeqPad', size_divisor=16),
    dict(
        type='VideoCollect',
        keys=['img'],
        meta_keys=('num_left_ref_imgs', 'frame_stride')),
    dict(type='ConcatVideoReferences'),
    dict(type='MultiImagesToTensor', ref_prefix='ref'),
    dict(type='ToList')
]
data = dict(
    samples_per_gpu=1,
    workers_per_gpu=4,
    train=[
        dict(
            type='ImagenetVIDDataset',
            ann_file='data/ILSVRC/annotations/imagenet_vid_train.json',
            img_prefix='data/ILSVRC/Data/VID',
            ref_img_sampler=dict(
                num_ref_imgs=2,
                frame_range=1000,
                filter_key_img=True,
                method='bilateral_uniform'),
            pipeline=[
                dict(type='LoadMultiImagesFromFile'),
                dict(
                    type='SeqLoadAnnotations', with_bbox=True,
                    with_track=True),
                dict(type='SeqResize', img_scale=(1000, 600), keep_ratio=True),
                dict(type='SeqRandomFlip', share_params=True, flip_ratio=0.5),
                dict(
                    type='SeqNormalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='SeqPad', size_divisor=16),
                dict(
                    type='VideoCollect',
                    keys=['img', 'gt_bboxes', 'gt_labels', 'gt_instance_ids']),
                dict(type='ConcatVideoReferences'),
                dict(type='SeqDefaultFormatBundle', ref_prefix='ref')
            ]),
        dict(
            type='ImagenetVIDDataset',
            load_as_video=False,
            ann_file='data/ILSVRC/annotations/imagenet_det_30plus1cls.json',
            img_prefix='data/ILSVRC/Data/DET',
            ref_img_sampler=dict(
                num_ref_imgs=2,
                frame_range=0,
                filter_key_img=False,
                method='bilateral_uniform'),
            pipeline=[
                dict(type='LoadMultiImagesFromFile'),
                dict(
                    type='SeqLoadAnnotations', with_bbox=True,
                    with_track=True),
                dict(type='SeqResize', img_scale=(1000, 600), keep_ratio=True),
                dict(type='SeqRandomFlip', share_params=True, flip_ratio=0.5),
                dict(
                    type='SeqNormalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='SeqPad', size_divisor=16),
                dict(
                    type='VideoCollect',
                    keys=['img', 'gt_bboxes', 'gt_labels', 'gt_instance_ids']),
                dict(type='ConcatVideoReferences'),
                dict(type='SeqDefaultFormatBundle', ref_prefix='ref')
            ])
    ],
    val=dict(
        type='ImagenetVIDDataset',
        ann_file='data/ILSVRC/annotations/imagenet_vid_val.json',
        img_prefix='data/ILSVRC/Data/VID',
        ref_img_sampler=dict(
            num_ref_imgs=14,
            frame_range=[-7, 7],
            stride=1,
            method='test_with_adaptive_stride'),
        pipeline=[
            dict(type='LoadMultiImagesFromFile'),
            dict(type='SeqResize', img_scale=(1000, 600), keep_ratio=True),
            dict(type='SeqRandomFlip', share_params=True, flip_ratio=0.0),
            dict(
                type='SeqNormalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='SeqPad', size_divisor=16),
            dict(
                type='VideoCollect',
                keys=['img'],
                meta_keys=('num_left_ref_imgs', 'frame_stride')),
            dict(type='ConcatVideoReferences'),
            dict(type='MultiImagesToTensor', ref_prefix='ref'),
            dict(type='ToList')
        ],
        test_mode=True,
        shuffle_video_frames=True),
    test=dict(
        type='ImagenetVIDDataset',
        ann_file='data/ILSVRC/annotations/imagenet_vid_val.json',
        img_prefix='data/ILSVRC/Data/VID',
        ref_img_sampler=dict(
            num_ref_imgs=14,
            frame_range=[-7, 7],
            stride=1,
            method='test_with_adaptive_stride'),
        pipeline=[
            dict(type='LoadMultiImagesFromFile'),
            dict(type='SeqResize', img_scale=(1000, 600), keep_ratio=True),
            dict(type='SeqRandomFlip', share_params=True, flip_ratio=0.0),
            dict(
                type='SeqNormalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='SeqPad', size_divisor=16),
            dict(
                type='VideoCollect',
                keys=['img'],
                meta_keys=('num_left_ref_imgs', 'frame_stride')),
            dict(type='ConcatVideoReferences'),
            dict(type='MultiImagesToTensor', ref_prefix='ref'),
            dict(type='ToList')
        ],
        test_mode=True,
        shuffle_video_frames=True))
checkpoint_config = dict(interval=3)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = 'work_dirs/mamba_r101_dc5_6x/epoch_3.pth'
workflow = [('train', 1)]
optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.3333333333333333,
    step=[4])
runner = dict(type='EpochBasedRunner', max_epochs=6)
is_video_model = True
total_epochs = 6
evaluation = dict(metric=['bbox'], vid_style=True, interval=1)
work_dir = './work_dirs/mamba_r101_dc5_6x'
gpu_ids = range(0, 8)
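
For completeness, a dumped config like the one above can be loaded and tweaked programmatically before launching a run. This is a minimal sketch using mmcv's `Config` API (mmcv-full 1.x); the override values are placeholders, not recommended settings:

```python
# Sketch: load the dumped config and override a few fields in Python.
# Values shown here are illustrative only.
from mmcv import Config

cfg = Config.fromfile("work_dirs/mamba_r101_dc5_6x/mamba_r101_dc5_6x.py")

print(cfg.model.type)            # 'MAMBA'
print(cfg.data.samples_per_gpu)  # 1

cfg.optimizer.lr = 0.0005        # e.g. halve the learning rate
cfg.work_dir = "./work_dirs/mamba_r101_dc5_6x_lr5e-4"

cfg.dump("work_dirs/mamba_r101_dc5_6x_lr5e-4.py")  # write the edited config
```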