# Learn about Configs
We use Python files as configs and incorporate modular and inheritance design into our config system, which is convenient for conducting various experiments.
You can find all the provided configs under `$MMAction2/configs`. If you wish to inspect a config file,
you may run `python tools/analysis_tools/print_config.py /PATH/TO/CONFIG` to see the complete config.
<!-- TOC -->
- [Learn about Configs](#learn-about-configs)
- [Modify config through script arguments](#modify-config-through-script-arguments)
- [Config File Structure](#config-file-structure)
- [Config File Naming Convention](#config-file-naming-convention)
- [Config System for Action Recognition](#config-system-for-action-recognition)
- [Config System for Spatio-Temporal Action Detection](#config-system-for-spatio-temporal-action-detection)
- [Config System for Action Localization](#config-system-for-action-localization)
<!-- TOC -->
## Modify config through script arguments
When submitting jobs using `tools/train.py` or `tools/test.py`, you may specify `--cfg-options` to modify the config in place.
- Update config keys of dict.
The config options can be specified following the order of the dict keys in the original config.
For example, `--cfg-options model.backbone.norm_eval=False` changes all BN modules in the model backbone to `train` mode.
- Update keys inside a list of configs.
Some config dicts are composed as a list in your config. For example, the training pipeline `train_pipeline` is normally a list,
e.g. `[dict(type='SampleFrames'), ...]`. If you want to change `'SampleFrames'` to `'DenseSampleFrames'` in the pipeline,
you may specify `--cfg-options train_pipeline.0.type=DenseSampleFrames`.
- Update values of list/tuples.
Some config values are lists or tuples. For example, the config file normally sets `model.data_preprocessor.mean=[123.675, 116.28, 103.53]`. If you want to
change this key, you may specify `--cfg-options model.data_preprocessor.mean="[128,128,128]"`. Note that the quotation marks `"` are necessary to support list/tuple data types. How these options are merged is sketched right after this list.
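Under the hood, all three kinds of updates are merged into the loaded config by MMEngine. A minimal sketch of that mechanism with `mmengine.Config` (assuming `mmengine` is installed; the config path is illustrative and may differ in your checkout):

```python
from mmengine.config import Config

# Load a config file; the path is illustrative.
cfg = Config.fromfile('configs/recognition/tsn/'
                      'tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb.py')

# --cfg-options key=value pairs are applied as a dict of dotted keys:
# plain parts address nested dict keys, integer parts index into lists.
cfg.merge_from_dict({
    'model.backbone.norm_eval': False,                # key inside a nested dict
    'train_pipeline.0.type': 'DenseSampleFrames',     # element 0 of a list
    'model.data_preprocessor.mean': [128, 128, 128],  # a list value
})
print(cfg.model.backbone.norm_eval)  # False
```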
## Config File Structure
There are 3 basic component types under `configs/_base_`: models, schedules, and default_runtime.
Many methods can be easily constructed with one of each, such as TSN, I3D, SlowOnly, etc.
The configs that are composed of components from `_base_` are called _primitive_.
For all configs under the same folder, it is recommended to have only **one** _primitive_ config. All other configs should inherit from the _primitive_ config. In this way, the maximum inheritance level is 3.
For easy understanding, we recommend that contributors inherit from existing methods.
For example, if some modification is made based on TSN, users may first inherit the basic TSN structure by specifying `_base_ = ../tsn/tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb.py`, then modify the necessary fields in the config file.
If you are building an entirely new method that does not share the structure with any of the existing methods, you may create a folder under `configs/TASK`.
Please refer to [mmengine](https://mmengine.readthedocs.io/en/latest/tutorials/config.html) for detailed documentation.
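As a minimal sketch of the inheritance pattern described above (the file name and overridden values below are hypothetical), a derived config only restates what changes:

```python
# A hypothetical derived config, e.g. configs/recognition/tsn/my_tsn_variant.py
_base_ = ['tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb.py']

# Dict fields are merged recursively with the inherited config,
# so only the keys written here are overridden.
model = dict(cls_head=dict(dropout_ratio=0.5))  # change one nested key
train_cfg = dict(max_epochs=50)  # shorten the training schedule
```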
## Config File Naming Convention
We follow the style below to name config files. Contributors are advised to follow the same style. The config file names are divided into several parts. Logically, different parts are concatenated by underscores `'_'`, and settings in the same part are concatenated by dashes `'-'`.
```
{algorithm info}_{module info}_{training info}_{data info}.py
```
`{xxx}` is a required field and `[yyy]` is optional.
- `{algorithm info}`:
- `{model}`: model type, e.g. `tsn`, `i3d`, `swin`, `vit`, etc.
- `[model setting]`: specific setting for some models, e.g. `base`, `p16`, `w877`, etc.
- `{module info}`:
- `[pretrained info]`: pretrained information, e.g. `kinetics400-pretrained`, `in1k-pre`, etc.
- `{backbone}`: backbone type, e.g. `r50` (ResNet-50), etc.
- `[backbone setting]`: specific setting for some backbones, e.g. `nl-dot-product`, `bnfrozen`, `nopool`, etc.
- `{training info}`:
- `{gpu x batch_per_gpu}`: GPUs and samples per GPU, e.g. `8xb32` means 8 GPUs with 32 samples per GPU.
- `{pipeline setting}`: frame sample setting, e.g. `dense`, `{clip_len}x{frame_interval}x{num_clips}`, `u48`, etc.
- `{schedule}`: training schedule, e.g. `coslr-20e`.
- `{data info}`:
- `{dataset}`: dataset name, e.g. `kinetics400`, `mmit`, etc.
- `{modality}`: data modality, e.g. `rgb`, `flow`, `keypoint-2d`, etc.
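Putting these together, the config name referenced earlier, `tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb.py`, decomposes as: `tsn` (algorithm), `imagenet-pretrained` (pretrained info), `r50` (backbone), `8xb32` (8 GPUs with 32 samples per GPU), `1x1x3` (`{clip_len}x{frame_interval}x{num_clips}`), `100e` (100-epoch schedule), `kinetics400` (dataset), and `rgb` (modality).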
### Config System for Action Recognition
We incorporate modular design into our config system,
which is convenient for conducting various experiments.
- An Example of TSN
To help users get a basic idea of a complete config structure and the modules in an action recognition system,
we make brief comments on the config of TSN below.
For more detailed usage and the alternatives for each parameter in each module, please refer to the [API documentation](https://mmaction2.readthedocs.io/en/latest/api.html).
```python
# model settings
model = dict( # Config of the model
type='Recognizer2D', # Class name of the recognizer
backbone=dict( # Dict for backbone
type='ResNet', # Name of the backbone
pretrained='torchvision://resnet50', # The url/site of the pretrained model
depth=50, # Depth of ResNet model
norm_eval=False), # Whether to set BN layers to eval mode when training
cls_head=dict( # Dict for classification head
type='TSNHead', # Name of classification head
num_classes=400, # Number of classes to be classified.
in_channels=2048, # The input channels of classification head.
spatial_type='avg', # Type of pooling in spatial dimension
consensus=dict(type='AvgConsensus', dim=1), # Config of consensus module
dropout_ratio=0.4, # Probability in dropout layer
init_std=0.01, # Std value for linear layer initialization
average_clips='prob'), # Method to average multiple clip results
data_preprocessor=dict( # Dict for data preprocessor
type='ActionDataPreprocessor', # Name of data preprocessor
mean=[123.675, 116.28, 103.53], # Mean values of different channels to normalize
std=[58.395, 57.12, 57.375], # Std values of different channels to normalize
format_shape='NCHW'), # Final image shape format
# model training and testing settings
train_cfg=None, # Config of training hyperparameters for TSN
test_cfg=None) # Config of testing hyperparameters for TSN
# dataset settings
dataset_type = 'RawframeDataset' # Type of dataset for training, validation and testing
data_root = 'data/kinetics400/rawframes_train/' # Root path to data for training
data_root_val = 'data/kinetics400/rawframes_val/' # Root path to data for validation and testing
ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' # Path to the annotation file for training
ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' # Path to the annotation file for validation
ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' # Path to the annotation file for testing
train_pipeline = [ # Training data processing pipeline
dict( # Config of SampleFrames
type='SampleFrames', # Sample frames pipeline, sampling frames from video
clip_len=1, # Frames of each sampled output clip
frame_interval=1, # Temporal interval of adjacent sampled frames
num_clips=3), # Number of clips to be sampled
dict( # Config of RawFrameDecode
type='RawFrameDecode'), # Load and decode frames pipeline, picking raw frames with given indices
dict( # Config of Resize
type='Resize', # Resize pipeline
scale=(-1, 256)), # The scale to resize images
dict( # Config of MultiScaleCrop
type='MultiScaleCrop', # Multi scale crop pipeline, cropping images with a list of randomly selected scales
input_size=224, # Input size of the network
scales=(1, 0.875, 0.75, 0.66), # Scales of width and height to be selected
random_crop=False, # Whether to randomly sample cropping bbox
max_wh_scale_gap=1), # Maximum gap of w and h scale levels
dict( # Config of Resize
type='Resize', # Resize pipeline
scale=(224, 224), # The scale to resize images
keep_ratio=False), # Whether to keep the aspect ratio when resizing
dict( # Config of Flip
type='Flip', # Flip pipeline
flip_ratio=0.5), # Probability of performing a flip
dict( # Config of FormatShape
type='FormatShape', # Format shape pipeline, formatting the final image shape to the given input_format
input_format='NCHW'), # Final image shape format
dict(type='PackActionInputs') # Config of PackActionInputs
]
val_pipeline = [ # Validation data processing pipeline
dict( # Config of SampleFrames
type='SampleFrames', # Sample frames pipeline, sampling frames from video
clip_len=1, # Frames of each sampled output clip
frame_interval=1, # Temporal interval of adjacent sampled frames
num_clips=3, # Number of clips to be sampled
test_mode=True), # Whether to set test mode in sampling
dict( # Config of RawFrameDecode
type='RawFrameDecode'), # Load and decode frames pipeline, picking raw frames with given indices
dict( # Config of Resize
type='Resize', # Resize pipeline
scale=(-1, 256)), # The scale to resize images
dict( # Config of CenterCrop
type='CenterCrop', # Center crop pipeline, cropping the center area from images
crop_size=224), # The size to crop images
dict( # Config of Flip
type='Flip', # Flip pipeline
flip_ratio=0), # Probability of performing a flip
dict( # Config of FormatShape
type='FormatShape', # Format shape pipeline, formatting the final image shape to the given input_format
input_format='NCHW'), # Final image shape format
dict(type='PackActionInputs') # Config of PackActionInputs
]
test_pipeline = [ # Testing data processing pipeline
dict( # Config of SampleFrames
type='SampleFrames', # Sample frames pipeline, sampling frames from video
clip_len=1, # Frames of each sampled output clip
frame_interval=1, # Temporal interval of adjacent sampled frames
num_clips=25, # Number of clips to be sampled
test_mode=True), # Whether to set test mode in sampling
dict( # Config of RawFrameDecode
type='RawFrameDecode'), # Load and decode frames pipeline, picking raw frames with given indices
dict( # Config of Resize
type='Resize', # Resize pipeline
scale=(-1, 256)), # The scale to resize images
dict( # Config of TenCrop
type='TenCrop', # Ten crop pipeline, cropping ten areas from images
crop_size=224), # The size to crop images
dict( # Config of Flip
type='Flip', # Flip pipeline
flip_ratio=0), # Probability of performing a flip
dict( # Config of FormatShape
type='FormatShape', # Format shape pipeline, formatting the final image shape to the given input_format
input_format='NCHW'), # Final image shape format
dict(type='PackActionInputs') # Config of PackActionInputs
]
train_dataloader = dict( # Config of train dataloader
batch_size=32, # Batch size of each single GPU during training
num_workers=8, # Workers to pre-fetch data for each single GPU during training
persistent_workers=True, # If `True`, the dataloader will not shut down the worker processes after an epoch end, which can accelerate training speed
sampler=dict(
type='DefaultSampler', # DefaultSampler which supports both distributed and non-distributed training. Refer to https://github.com/open-mmlab/mmengine/blob/main/mmengine/dataset/sampler.py
shuffle=True), # Randomly shuffle the training data in each epoch
dataset=dict( # Config of train dataset
type=dataset_type,
ann_file=ann_file_train, # Path of annotation file
data_prefix=dict(img=data_root), # Prefix of frame path
pipeline=train_pipeline))
val_dataloader = dict( # Config of validation dataloader
batch_size=1, # Batch size of each single GPU during validation
num_workers=8, # Workers to pre-fetch data for each single GPU during validation
persistent_workers=True, # If `True`, the dataloader will not shut down the worker processes after an epoch end
sampler=dict(
type='DefaultSampler',
shuffle=False), # Do not shuffle during validation and testing
dataset=dict( # Config of validation dataset
type=dataset_type,
ann_file=ann_file_val, # Path of annotation file
data_prefix=dict(img=data_root_val), # Prefix of frame path
pipeline=val_pipeline,
test_mode=True))
test_dataloader = dict( # Config of test dataloader
batch_size=32, # Batch size of each single GPU during testing
num_workers=8, # Workers to pre-fetch data for each single GPU during testing
persistent_workers=True, # If `True`, the dataloader will not shut down the worker processes after an epoch end
sampler=dict(
type='DefaultSampler',
shuffle=False), # Do not shuffle during validation and testing
dataset=dict( # Config of test dataset
type=dataset_type,
ann_file=ann_file_val, # Path of annotation file
data_prefix=dict(img=data_root_val), # Prefix of frame path
pipeline=test_pipeline,
test_mode=True))
# evaluation settings
val_evaluator = dict(type='AccMetric') # Config of validation evaluator
test_evaluator = val_evaluator # Config of testing evaluator
train_cfg = dict( # Config of training loop
type='EpochBasedTrainLoop', # Name of training loop
max_epochs=100, # Total training epochs
val_begin=1, # The epoch that begins validating
val_interval=1) # Validation interval
val_cfg = dict( # Config of validation loop
type='ValLoop') # Name of validation loop
test_cfg = dict( # Config of testing loop
type='TestLoop') # Name of testing loop
# learning policy
param_scheduler = [ # Parameter scheduler for updating optimizer parameters; supports dict or list
dict(type='MultiStepLR', # Decays the learning rate once the number of epoch reaches one of the milestones
begin=0, # Step at which to start updating the learning rate
end=100, # Step at which to stop updating the learning rate
by_epoch=True, # Whether the scheduled learning rate is updated by epochs
milestones=[40, 80], # Steps to decay the learning rate
gamma=0.1)] # Multiplicative factor of learning rate decay
# optimizer
optim_wrapper = dict( # Config of optimizer wrapper
type='OptimWrapper', # Name of optimizer wrapper, switch to AmpOptimWrapper to enable mixed precision training
optimizer=dict( # Config of optimizer. Support all kinds of optimizers in PyTorch. Refer to https://pytorch.org/docs/stable/optim.html#algorithms
type='SGD', # Name of optimizer
lr=0.01, # Learning rate
momentum=0.9, # Momentum factor
weight_decay=0.0001), # Weight decay
clip_grad=dict(max_norm=40, norm_type=2)) # Config of gradient clip
# runtime settings
default_scope = 'mmaction' # The default registry scope to find modules. Refer to https://mmengine.readthedocs.io/en/latest/tutorials/registry.html
default_hooks = dict( # Hooks to execute default actions like updating model parameters and saving checkpoints.
runtime_info=dict(type='RuntimeInfoHook'), # The hook to update runtime information in the message hub
timer=dict(type='IterTimerHook'), # The hook to record the time spent during each iteration
logger=dict(
type='LoggerHook', # The logger used to record logs during the training/validation/testing phase
interval=20, # Interval to print the log
ignore_last=False), # Ignore the log of the last iterations in each epoch
param_scheduler=dict(type='ParamSchedulerHook'), # The hook to update some hyper-parameters in the optimizer
checkpoint=dict(
type='CheckpointHook', # The hook to save checkpoints periodically
interval=3, # The saving period
save_best='auto', # Specified metric to measure the best checkpoint during evaluation
max_keep_ckpts=3), # The maximum number of checkpoints to keep
sampler_seed=dict(type='DistSamplerSeedHook'), # Set the seed for the data-loading sampler in distributed training
sync_buffers=dict(type='SyncBuffersHook')) # Synchronize model buffers at the end of each epoch
env_cfg = dict( # Dict for setting environment
cudnn_benchmark=False, # Whether to enable cudnn benchmark
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0), # Parameters to set up multiprocessing
dist_cfg=dict(backend='nccl')) # Parameters to set up the distributed environment; the port can also be set
log_processor = dict(
type='LogProcessor', # Log processor used to format log information
window_size=20, # Default smooth interval
by_epoch=True) # Whether to format logs with epoch type
vis_backends = [ # List of visualization backends
dict(type='LocalVisBackend')] # Local visualization backend
visualizer = dict( # Config of visualizer
type='ActionVisualizer', # Name of visualizer
vis_backends=vis_backends)
log_level = 'INFO' # The level of logging
load_from = None # Load model checkpoint as a pre-trained model from a given path. This will not resume training.
resume = False # Whether to resume from the checkpoint defined in `load_from`. If `load_from` is None, it will resume the latest checkpoint in the `work_dir`.
```
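Once written, a config like the one above can be loaded and inspected programmatically. A minimal sketch with `mmengine.Config` (assuming `mmengine` is installed; the path is illustrative and may differ in your checkout):

```python
from mmengine.config import Config

# Load the TSN config shown above; the path is illustrative.
cfg = Config.fromfile('configs/recognition/tsn/'
                      'tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb.py')

# Every top-level variable of the config file becomes an attribute.
print(cfg.model.cls_head.num_classes)   # 400
print(cfg.train_dataloader.batch_size)  # 32

# Dump the fully merged config, which is essentially what
# tools/analysis_tools/print_config.py prints.
print(cfg.pretty_text)
```

This is also a convenient way to check what a config picks up from its `_base_` files.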
### Config System for Spatio-Temporal Action Detection
We incorporate modular design into our config system, which is convenient for conducting various experiments.
- An Example of FastRCNN
To help users get a basic idea of a complete config structure and the modules in a spatio-temporal action detection system,
we make brief comments on the config of FastRCNN below.
For more detailed usage and the alternatives for each parameter in each module, please refer to the [API documentation](https://mmaction2.readthedocs.io/en/latest/api.html).
```python
# model setting
model = dict( # Config of the model
type='FastRCNN', # Class name of the detector
_scope_='mmdet', # The registry scope of this config; modules are built from the mmdet registry
backbone=dict( # Dict for backbone
type='ResNet3dSlowOnly', # Name of the backbone
depth=50, # Depth of ResNet model
pretrained=None, # The url/site of the pretrained model
pretrained2d=False, # Whether the pretrained model is 2D
lateral=False, # Whether the backbone has lateral connections
num_stages=4, # Stages of ResNet model
conv1_kernel=(1, 7, 7), # Conv1 kernel size
conv1_stride_t=1, # Conv1 temporal stride
pool1_stride_t=1, # Pool1 temporal stride
spatial_strides=(1, 2, 2, 1)), # The spatial stride for each ResNet stage
roi_head=dict( # Dict for roi_head
type='AVARoIHead', # Name of the roi_head
bbox_roi_extractor=dict( # Dict for bbox_roi_extractor
type='SingleRoIExtractor3D', # Name of the bbox_roi_extractor
roi_layer_type='RoIAlign', # Type of the RoI op
output_size=8, # Output feature size of the RoI op
with_temporal_pool=True), # Whether the temporal dim is pooled
bbox_head=dict( # Dict for bbox_head
type='BBoxHeadAVA', # Name of the bbox_head
in_channels=2048, # Number of channels of the input feature
num_classes=81, # Number of action classes + 1
multilabel=True, # Whether the dataset is multi-label
dropout_ratio=0.5), # The dropout ratio used
data_preprocessor=dict( # Dict for data preprocessor
type='ActionDataPreprocessor', # Name of data preprocessor
mean=[123.675, 116.28, 103.53], # Mean values of different channels to normalize
std=[58.395, 57.12, 57.375], # Std values of different channels to normalize
format_shape='NCHW')), # Final image shape format
# model training and testing settings
train_cfg=dict( # Training config of FastRCNN
rcnn=dict( # Dict for rcnn training config
assigner=dict( # Dict for assigner
type='MaxIoUAssignerAVA', # Name of the assigner
pos_iou_thr=0.9, # IoU threshold for positive examples, > pos_iou_thr -> positive
neg_iou_thr=0.9, # IoU threshold for negative examples, < neg_iou_thr -> negative
min_pos_iou=0.9), # Minimum acceptable IoU for positive examples
sampler=dict( # Dict for the sampler
type='RandomSampler', # Name of the sampler
num=32, # Batch size of the sampler
pos_fraction=1, # Positive bbox fraction of the sampler
neg_pos_ub=-1, # Upper bound of the ratio of num negative to num positive
add_gt_as_proposals=True), # Add gt bboxes as proposals
pos_weight=1.0)), # Loss weight of positive examples
test_cfg=dict(rcnn=None)) # Testing config of FastRCNN
# dataset settings
dataset_type = 'AVADataset' # Type of dataset for training, validation and testing
data_root = 'data/ava/rawframes' # Root path to data
anno_root = 'data/ava/annotations' # Root path to annotations
ann_file_train = f'{anno_root}/ava_train_v2.1.csv' # Path to the annotation file for training
ann_file_val = f'{anno_root}/ava_val_v2.1.csv' # Path to the annotation file for validation
exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' # Path to the exclude annotation file for training
exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' # Path to the exclude annotation file for validation
label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt' # Path to the label file
proposal_file_train = f'{anno_root}/ava_dense_proposals_train.FAIR.recall_93.9.pkl' # Path to the human detection proposals for training examples
proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' # Path to the human detection proposals for validation examples
train_pipeline = [ # Training data processing pipeline
dict( # Config of SampleFrames
type='AVASampleFrames', # Sample frames pipeline, sampling frames from video
clip_len=4, # Frames of each sampled output clip
frame_interval=16), # Temporal interval of adjacent sampled frames
dict( # Config of RawFrameDecode
type='RawFrameDecode'), # Load and decode frames pipeline, picking raw frames with given indices
dict( # Config of RandomRescale
type='RandomRescale', # Randomly rescale the short edge within a given range
scale_range=(256, 320)), # The short-edge size range of RandomRescale
dict( # Config of RandomCrop
type='RandomCrop', # Randomly crop a patch with the given size
size=256), # The size of the cropped patch
dict( # Config of Flip
type='Flip', # Flip pipeline
flip_ratio=0.5), # Probability of performing a flip
dict( # Config of FormatShape
type='FormatShape', # Format shape pipeline, formatting the final image shape to the given input_format
input_format='NCTHW', # Final image shape format
collapse=True), # Collapse the dim N if N == 1
dict(type='PackActionInputs') # Pack input data
]
val_pipeline = [ # Validation data processing pipeline
dict( # Config of SampleFrames
type='AVASampleFrames', # Sample frames pipeline, sampling frames from video
clip_len=4, # Frames of each sampled output clip
frame_interval=16), # Temporal interval of adjacent sampled frames
dict( # Config of RawFrameDecode
type='RawFrameDecode'), # Load and decode frames pipeline, picking raw frames with given indices
dict( # Config of Resize
type='Resize', # Resize pipeline
scale=(-1, 256)), # The scale to resize images
dict( # Config of FormatShape
type='FormatShape', # Format shape pipeline, formatting the final image shape to the given input_format
input_format='NCTHW', # Final image shape format
collapse=True), # Collapse the dim N if N == 1
dict(type='PackActionInputs') # Pack input data
]
train_dataloader = dict( # Config of train dataloader
batch_size=32, # Batch size of each single GPU during training
num_workers=8, # Workers to pre-fetch data for each single GPU during training
persistent_workers=True, # If `True`, the dataloader will not shut down the worker processes after an epoch end, which can accelerate training speed
sampler=dict(
type='DefaultSampler', # DefaultSampler which supports both distributed and non-distributed training. Refer to https://github.com/open-mmlab/mmengine/blob/main/mmengine/dataset/sampler.py
shuffle=True), # Randomly shuffle the training data in each epoch
dataset=dict( # Config of train dataset
type=dataset_type,
ann_file=ann_file_train, # Path of annotation file
exclude_file=exclude_file_train, # Path of exclude annotation file
label_file=label_file, # Path of label file
data_prefix=dict(img=data_root), # Prefix of frame path
proposal_file=proposal_file_train, # Path of human detection proposals
pipeline=train_pipeline))
val_dataloader = dict( # Config of validation dataloader
batch_size=1, # Batch size of each single GPU during evaluation
num_workers=8, # Workers to pre-fetch data for each single GPU during evaluation
persistent_workers=True, # If `True`, the dataloader will not shut down the worker processes after an epoch end
sampler=dict(
type='DefaultSampler',
shuffle=False), # Do not shuffle during validation and testing
dataset=dict( # Config of validation dataset
type=dataset_type,
ann_file=ann_file_val, # Path of annotation file
exclude_file=exclude_file_val, # Path of exclude annotation file
label_file=label_file, # Path of label file
data_prefix=dict(img=data_root), # Prefix of frame path
proposal_file=proposal_file_val, # Path of human detection proposals
pipeline=val_pipeline,
test_mode=True))
test_dataloader = val_dataloader # Config of testing dataloader
# evaluation settings
val_evaluator = dict( # Config of validation evaluator
type='AVAMetric',
ann_file=ann_file_val,
label_file=label_file,
exclude_file=exclude_file_val)
test_evaluator = val_evaluator # Config of testing evaluator
train_cfg = dict( # Config of training loop
type='EpochBasedTrainLoop', # Name of training loop
max_epochs=20, # Total training epochs
val_begin=1, # The epoch that begins validating
val_interval=1) # Validation interval
val_cfg = dict( # Config of validation loop
type='ValLoop') # Name of validation loop
test_cfg = dict( # Config of testing loop
type='TestLoop') # Name of testing loop
# learning policy
param_scheduler = [ # Parameter scheduler for updating optimizer parameters; supports dict or list
dict(type='LinearLR', # Linearly changes the learning rate of each parameter group by a small multiplicative factor (a linear warm-up here)
start_factor=0.1, # The factor the learning rate is multiplied by in the first epoch
by_epoch=True, # Whether the scheduled learning rate is updated by epochs
begin=0, # Step at which to start updating the learning rate
end=5), # Step at which to stop updating the learning rate
dict(type='MultiStepLR', # Decays the learning rate once the number of epoch reaches one of the milestones
begin=0, # Step at which to start updating the learning rate
end=20, # Step at which to stop updating the learning rate
by_epoch=True, # Whether the scheduled learning rate is updated by epochs
milestones=[10, 15], # Steps to decay the learning rate
gamma=0.1)] # Multiplicative factor of learning rate decay
# optimizer
optim_wrapper = dict( # Config of optimizer wrapper
type='OptimWrapper', # Name of optimizer wrapper, switch to AmpOptimWrapper to enable mixed precision training
optimizer=dict( # Config of optimizer. Support all kinds of optimizers in PyTorch. Refer to https://pytorch.org/docs/stable/optim.html#algorithms
type='SGD', # Name of optimizer
lr=0.2, # Learning rate
momentum=0.9, # Momentum factor
weight_decay=0.0001), # Weight decay
clip_grad=dict(max_norm=40, norm_type=2)) # Config of gradient clip
# runtime settings
default_scope = 'mmaction' # The default registry scope to find modules. Refer to https://mmengine.readthedocs.io/en/latest/tutorials/registry.html
default_hooks = dict( # Hooks to execute default actions like updating model parameters and saving checkpoints.
runtime_info=dict(type='RuntimeInfoHook'), # The hook to update runtime information in the message hub
timer=dict(type='IterTimerHook'), # The hook to record the time spent during each iteration
logger=dict(
type='LoggerHook', # The logger used to record logs during the training/validation/testing phase
interval=20, # Interval to print the log
ignore_last=False), # Ignore the log of the last iterations in each epoch
param_scheduler=dict(type='ParamSchedulerHook'), # The hook to update some hyper-parameters in the optimizer
checkpoint=dict(
type='CheckpointHook', # The hook to save checkpoints periodically
interval=3, # The saving period
save_best='auto', # Specified metric to measure the best checkpoint during evaluation
max_keep_ckpts=3), # The maximum number of checkpoints to keep
sampler_seed=dict(type='DistSamplerSeedHook'), # Set the seed for the data-loading sampler in distributed training
sync_buffers=dict(type='SyncBuffersHook')) # Synchronize model buffers at the end of each epoch
env_cfg = dict( # Dict for setting environment
cudnn_benchmark=False, # Whether to enable cudnn benchmark
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0), # Parameters to set up multiprocessing
dist_cfg=dict(backend='nccl')) # Parameters to set up the distributed environment; the port can also be set
log_processor = dict(
type='LogProcessor', # Log processor used to format log information
window_size=20, # Default smooth interval
by_epoch=True) # Whether to format logs with epoch type
vis_backends = [ # List of visualization backends
dict(type='LocalVisBackend')] # Local visualization backend
visualizer = dict( # Config of visualizer
type='ActionVisualizer', # Name of visualizer
vis_backends=vis_backends)
log_level = 'INFO' # The level of logging
load_from = ('https://download.openmmlab.com/mmaction/v1.0/recognition/slowonly/'
'slowonly_imagenet-pretrained-r50_8xb16-4x16x1-steplr-150e_kinetics400-rgb/'
'slowonly_imagenet-pretrained-r50_8xb16-4x16x1-steplr-150e_kinetics400-rgb_20220901-e7b65fad.pth') # Load model checkpoint as a pre-trained model from a given path. This will not resume training.
resume = False # Whether to resume from the checkpoint defined in `load_from`. If `load_from` is None, it will resume the latest checkpoint in the `work_dir`.
```
### Config System for Action Localization
We incorporate modular design into our config system,
which is convenient for conducting various experiments.
- An Example of BMN
To help users get a basic idea of a complete config structure and the modules in an action localization system,
we make brief comments on the config of BMN below.
For more detailed usage and the alternatives for each parameter in each module, please refer to the [API documentation](https://mmaction2.readthedocs.io/en/latest/api.html).
```python
# model settings
model = dict( # Config of the model
type='BMN', # Class name of the localizer
temporal_dim=100, # Total frames selected for each video
boundary_ratio=0.5, # Ratio for determining video boundaries
num_samples=32, # Number of samples for each proposal
num_samples_per_bin=3, # Number of bin samples for each sample
feat_dim=400, # Dimension of feature
soft_nms_alpha=0.4, # Soft NMS alpha
soft_nms_low_threshold=0.5, # Soft NMS low threshold
soft_nms_high_threshold=0.9, # Soft NMS high threshold
post_process_top_k=100) # Top k proposals in post process
# dataset settings
dataset_type = 'ActivityNetDataset' # Type of dataset for training, validation and testing
data_root = 'data/activitynet_feature_cuhk/csv_mean_100/' # Root path to data for training
data_root_val = 'data/activitynet_feature_cuhk/csv_mean_100/' # Root path to data for validation and testing
ann_file_train = 'data/ActivityNet/anet_anno_train.json' # Path to the annotation file for training
ann_file_val = 'data/ActivityNet/anet_anno_val.json' # Path to the annotation file for validation
ann_file_test = 'data/ActivityNet/anet_anno_test.json' # Path to the annotation file for testing
train_pipeline = [ # Training data processing pipeline
dict(type='LoadLocalizationFeature'), # Load localization feature pipeline
dict(type='GenerateLocalizationLabels'), # Generate localization labels pipeline
dict(
type='PackLocalizationInputs', # Pack localization data
keys=('gt_bbox', ), # Keys of input
meta_keys=('video_name', ))] # Meta keys of input
val_pipeline = [ # Validation data processing pipeline
dict(type='LoadLocalizationFeature'), # Load localization feature pipeline
dict(type='GenerateLocalizationLabels'), # Generate localization labels pipeline
dict(
type='PackLocalizationInputs', # Pack localization data
keys=('gt_bbox', ), # Keys of input
meta_keys=('video_name', 'duration_second', 'duration_frame',
'annotations', 'feature_frame'))] # Meta keys of input
test_pipeline = [ # Testing data processing pipeline
dict(type='LoadLocalizationFeature'), # Load localization feature pipeline
dict(
type='PackLocalizationInputs', # Pack localization data
keys=('gt_bbox', ), # Keys of input
meta_keys=('video_name', 'duration_second', 'duration_frame',
'annotations', 'feature_frame'))] # Meta keys of input
train_dataloader = dict( # Config of train dataloader
batch_size=8, # Batch size of each single GPU during training
num_workers=8, # Workers to pre-fetch data for each single GPU during training
persistent_workers=True, # If `True`, the dataloader will not shut down the worker processes after an epoch end, which can accelerate training speed
sampler=dict(
type='DefaultSampler', # DefaultSampler which supports both distributed and non-distributed training. Refer to https://github.com/open-mmlab/mmengine/blob/main/mmengine/dataset/sampler.py
shuffle=True), # Randomly shuffle the training data in each epoch
dataset=dict( # Config of train dataset
type=dataset_type,
ann_file=ann_file_train, # Path of annotation file
data_prefix=dict(video=data_root), # Prefix of video path
pipeline=train_pipeline))
val_dataloader = dict( # Config of validation dataloader
batch_size=1, # Batch size of each single GPU during evaluation
num_workers=8, # Workers to pre-fetch data for each single GPU during evaluation
persistent_workers=True, # If `True`, the dataloader will not shut down the worker processes after an epoch end
sampler=dict(
type='DefaultSampler',
shuffle=False), # Do not shuffle during validation and testing
dataset=dict( # Config of validation dataset
type=dataset_type,
ann_file=ann_file_val, # Path of annotation file
data_prefix=dict(video=data_root_val), # Prefix of video path
pipeline=val_pipeline,
test_mode=True))
test_dataloader = dict( # Config of test dataloader
batch_size=1, # Batch size of each single GPU during testing
num_workers=8, # Workers to pre-fetch data for each single GPU during testing
persistent_workers=True, # If `True`, the dataloader will not shut down the worker processes after an epoch end
sampler=dict(
type='DefaultSampler',
shuffle=False), # Do not shuffle during validation and testing
dataset=dict( # Config of test dataset
type=dataset_type,
ann_file=ann_file_val, # Path of annotation file
data_prefix=dict(video=data_root_val), # Prefix of video path
pipeline=test_pipeline,
test_mode=True))
# evaluation settings
work_dir = './work_dirs/bmn_400x100_2x8_9e_activitynet_feature/' # Directory to save the model checkpoints and logs for the current experiments
val_evaluator = dict(
type='ANetMetric',
metric_type='AR@AN',
dump_config=dict( # Config of localization output
out=f'{work_dir}/results.json', # Path to the output file
output_format='json')) # File format of the output file
test_evaluator = val_evaluator # Set test_evaluator as val_evaluator
max_epochs = 9 # Total epochs to train the model
train_cfg = dict( # Config of training loop
type='EpochBasedTrainLoop', # Name of training loop
max_epochs=max_epochs, # Total training epochs
val_begin=1, # The epoch that begins validating
val_interval=1) # Validation interval
val_cfg = dict( # Config of validation loop
type='ValLoop') # Name of validation loop
test_cfg = dict( # Config of testing loop
type='TestLoop') # Name of testing loop
# learning policy
param_scheduler = [ # Parameter scheduler for updating optimizer parameters; supports dict or list
dict(type='MultiStepLR', # Decays the learning rate once the number of epoch reaches one of the milestones
begin=0, # Step at which to start updating the learning rate
end=max_epochs, # Step at which to stop updating the learning rate
by_epoch=True, # Whether the scheduled learning rate is updated by epochs
milestones=[7, ], # Steps to decay the learning rate
gamma=0.1)] # Multiplicative factor of parameter value decay
# optimizer
optim_wrapper = dict( # Config of optimizer wrapper
type='OptimWrapper', # Name of optimizer wrapper, switch to AmpOptimWrapper to enable mixed precision training
optimizer=dict( # Config of optimizer. Support all kinds of optimizers in PyTorch. Refer to https://pytorch.org/docs/stable/optim.html#algorithms
type='Adam', # Name of optimizer
lr=0.001, # Learning rate
weight_decay=0.0001), # Weight decay
clip_grad=dict(max_norm=40, norm_type=2)) # Config of gradient clip
# runtime settings
default_scope = 'mmaction' # The default registry scope to find modules. Refer to https://mmengine.readthedocs.io/en/latest/tutorials/registry.html
default_hooks = dict( # Hooks to execute default actions like updating model parameters and saving checkpoints.
runtime_info=dict(type='RuntimeInfoHook'), # The hook to update runtime information in the message hub
timer=dict(type='IterTimerHook'), # The hook to record the time spent during each iteration
logger=dict(
type='LoggerHook', # The logger used to record logs during the training/validation/testing phase
interval=20, # Interval to print the log
ignore_last=False), # Ignore the log of the last iterations in each epoch
param_scheduler=dict(type='ParamSchedulerHook'), # The hook to update some hyper-parameters in the optimizer
checkpoint=dict(
type='CheckpointHook', # The hook to save checkpoints periodically
interval=3, # The saving period
save_best='auto', # Specified metric to measure the best checkpoint during evaluation
max_keep_ckpts=3), # The maximum number of checkpoints to keep
sampler_seed=dict(type='DistSamplerSeedHook'), # Set the seed for the data-loading sampler in distributed training
sync_buffers=dict(type='SyncBuffersHook')) # Synchronize model buffers at the end of each epoch
env_cfg = dict( # Dict for setting environment
cudnn_benchmark=False, # Whether to enable cudnn benchmark
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0), # Parameters to set up multiprocessing
dist_cfg=dict(backend='nccl')) # Parameters to set up the distributed environment; the port can also be set
log_processor = dict(
type='LogProcessor', # Log processor used to format log information
window_size=20, # Default smooth interval
by_epoch=True) # Whether to format logs with epoch type
vis_backends = [ # List of visualization backends
dict(type='LocalVisBackend')] # Local visualization backend
visualizer = dict( # Config of visualizer
type='ActionVisualizer', # Name of visualizer
vis_backends=vis_backends)
log_level = 'INFO' # The level of logging
load_from = None # Load model checkpoint as a pre-trained model from a given path. This will not resume training.
resume = False # Whether to resume from the checkpoint defined in `load_from`. If `load_from` is None, it will resume the latest checkpoint in the `work_dir`.
```
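All three config systems above are consumed in the same way: the training entry point parses a config and hands it to MMEngine's `Runner`. A minimal sketch of that flow (assuming `mmengine` is installed; the paths are hypothetical):

```python
from mmengine.config import Config
from mmengine.runner import Runner

# Any of the configs on this page works here; the path is illustrative.
cfg = Config.fromfile('/PATH/TO/CONFIG')
cfg.work_dir = './work_dirs/my_experiment'  # hypothetical output directory

# Runner builds the model, dataloaders, loops, and hooks from the config;
# this is essentially what tools/train.py does after parsing arguments.
runner = Runner.from_cfg(cfg)
runner.train()
```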