	# Tutorial 1: Learn about Configs

	Our config system combines a modular design with inheritance, which makes it convenient to run a variety of experiments.
	To inspect a config file, run `python tools/print_config.py /PATH/TO/CONFIG` to print the complete config.
	You may also pass `--options xxx.yyy=zzz` to see the config with that override applied.
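
To make the `xxx.yyy=zzz` notation concrete, the sketch below applies a dotted override to a nested dictionary. It is plain Python that only imitates the effect described above; `apply_override` is a hypothetical helper, not the function that `tools/print_config.py` or mmcv actually uses.

```python
def apply_override(cfg, dotted_key, value):
    """Set cfg['a']['b'] = value for dotted_key 'a.b' (hypothetical helper)."""
    keys = dotted_key.split('.')
    node = cfg
    for key in keys[:-1]:
        # Create intermediate dicts on demand so that new fields can be added.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return cfg


cfg = {'optimizer': {'type': 'SGD', 'lr': 0.01}}
apply_override(cfg, 'optimizer.lr', 0.02)
print(cfg['optimizer'])
```

Only the addressed leaf changes; sibling fields such as `optimizer.type` keep their values.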

	## Config File Structure

	There are 4 basic component types under `configs/_base_`: dataset, model, schedule, and default_runtime.
	Many methods, such as DeepLabV3 and PSPNet, can be constructed easily with one of each.
	Configs that are composed of components from `_base_` are called _primitive_.

	For all configs under the same folder, it is recommended to have only one _primitive_ config. All other configs should inherit from the _primitive_ config. In this way, the maximum inheritance level is 3.

	For ease of understanding, we recommend that contributors inherit from existing methods.
	For example, if a modification is based on DeepLabV3, the user may first inherit the basic DeepLabV3 structure by specifying `_base_ = '../deeplabv3/deeplabv3_r50_512x1024_40ki_cityscapes.py'`, then modify the necessary fields in the config file.

	If you are building an entirely new method that does not share its structure with any existing method, you may create a folder `xxxnet` under `configs`.

	Please refer to [mmcv](https://mmcv.readthedocs.io/en/latest/utils.html#config) for detailed documentation.

	## Config Name Style

	We follow the style below to name config files. Contributors are advised to follow the same style.

	```
	{model}_{backbone}_[misc]_[gpu x batch_per_gpu]_{resolution}_{schedule}_{dataset}
	```

	`{xxx}` is a required field and `[yyy]` is optional.

	- `{model}`: model type like `psp`, `deeplabv3`, etc.
	- `{backbone}`: backbone type like `r50` (ResNet-50), `x101` (ResNeXt-101).
	- `[misc]`: miscellaneous setting/plugins of model, e.g. `dconv`, `gcb`, `attention`, `mstrain`.
	- `[gpu x batch_per_gpu]`: number of GPUs and samples per GPU; `8x2` is used by default.
	- `{resolution}`: training crop size, e.g. `512x1024`.
	- `{schedule}`: training schedule, `20ki` means 20k iterations.
	- `{dataset}`: dataset like `cityscapes`, `voc12aug`, `ade`.
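
The naming template can be checked mechanically. The sketch below splits a config name into its fields; `parse_config_name` is a hypothetical helper written for illustration, and it assumes that no field contains an underscore and that any `NxM` token between the backbone and the resolution is the `[gpu x batch_per_gpu]` field.

```python
import re

SIZE = re.compile(r'^\d+x\d+$')  # matches tokens such as 8x2 or 512x1024


def parse_config_name(name):
    """Split a config name into the fields of the naming template."""
    tokens = name.split('_')
    if len(tokens) < 5:
        raise ValueError(f'expected at least 5 fields, got {len(tokens)}: {name!r}')
    fields = {
        'model': tokens[0],
        'backbone': tokens[1],
        'misc': [],
        'gpu_x_batch': '8x2',  # default stated in the naming rules
        'resolution': tokens[-3],
        'schedule': tokens[-2],
        'dataset': tokens[-1],
    }
    # Whatever sits between the backbone and the resolution is optional.
    for token in tokens[2:-3]:
        if SIZE.match(token):
            fields['gpu_x_batch'] = token
        else:
            fields['misc'].append(token)
    return fields


print(parse_config_name('deeplabv3_r50_512x1024_40ki_cityscapes'))
```

The required fields are read from fixed positions at both ends of the name, so the optional fields in the middle can be present or absent without ambiguity.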

	## An Example of PSPNet

	To give users a basic idea of a complete config and of the modules in a modern semantic segmentation system,
	we briefly comment on the config of PSPNet using ResNet50V1c below.
	For more detailed usage and the alternatives for each module, please refer to the API documentation.

	```python
	norm_cfg = dict(type='SyncBN', requires_grad=True)  # Segmentation usually uses SyncBN
	model = dict(
	    type='EncoderDecoder',  # Name of segmentor
	    pretrained='open-mmlab://resnet50_v1c',  # The ImageNet pretrained backbone to be loaded
	    backbone=dict(
	        type='ResNetV1c',  # The type of backbone. Please refer to mmseg/models/backbones/resnet.py for details.
	        depth=50,  # Depth of backbone. Normally 50 or 101 is used.
	        num_stages=4,  # Number of stages of backbone.
	        out_indices=(0, 1, 2, 3),  # The indices of the output feature maps produced by each stage.
	        dilations=(1, 1, 2, 4),  # The dilation rate of each layer.
	        strides=(1, 2, 1, 1),  # The stride of each layer.
	        norm_cfg=dict(  # The configuration of norm layer.
	            type='SyncBN',  # Type of norm layer. Usually it is SyncBN.
	            requires_grad=True),  # Whether to train the gamma and beta in norm
	        norm_eval=False,  # Whether to freeze the statistics in BN
	        style='pytorch',  # The style of backbone: 'pytorch' means that stride-2 layers are in 3x3 convs, 'caffe' means that stride-2 layers are in 1x1 convs.
	        contract_dilation=True),  # When dilation > 1, whether to contract the dilation of the first layer.
	    decode_head=dict(
	        type='PSPHead',  # Type of decode head. Please refer to mmseg/models/decode_heads for available options.
	        in_channels=2048,  # Input channel of decode head.
	        in_index=3,  # The index of feature map to select.
	        channels=512,  # The intermediate channels of decode head.
	        pool_scales=(1, 2, 3, 6),  # The avg pooling scales of PSPHead. Please refer to paper for details.
	        dropout_ratio=0.1,  # The dropout ratio before final classification layer.
	        num_classes=19,  # Number of segmentation classes. Usually 19 for cityscapes, 21 for VOC, 150 for ADE20k.
	        norm_cfg=dict(type='SyncBN', requires_grad=True),  # The configuration of norm layer.
	        align_corners=False,  # The align_corners argument for resize in decoding.
	        loss_decode=dict(  # Config of loss function for the decode_head.
	            type='CrossEntropyLoss',  # Type of loss used for segmentation.
	            use_sigmoid=False,  # Whether to use sigmoid activation for segmentation.
	            loss_weight=1.0)),  # Loss weight of decode head.
	    auxiliary_head=dict(
	        type='FCNHead',  # Type of auxiliary head. Please refer to mmseg/models/decode_heads for available options.
	        in_channels=1024,  # Input channel of auxiliary head.
	        in_index=2,  # The index of feature map to select.
	        channels=256,  # The intermediate channels of auxiliary head.
	        num_convs=1,  # Number of convs in FCNHead. It is usually 1 in auxiliary head.
	        concat_input=False,  # Whether to concat output of convs with input before classification layer.
	        dropout_ratio=0.1,  # The dropout ratio before final classification layer.
	        num_classes=19,  # Number of segmentation classes. Usually 19 for cityscapes, 21 for VOC, 150 for ADE20k.
	        norm_cfg=dict(type='SyncBN', requires_grad=True),  # The configuration of norm layer.
	        align_corners=False,  # The align_corners argument for resize in decoding.
	        loss_decode=dict(  # Config of loss function for the auxiliary_head.
	            type='CrossEntropyLoss',  # Type of loss used for segmentation.
	            use_sigmoid=False,  # Whether to use sigmoid activation for segmentation.
	            loss_weight=0.4)))  # Loss weight of auxiliary head, which is usually 0.4 of decode head.
	train_cfg = dict()  # train_cfg is just a placeholder for now.
	test_cfg = dict(mode='whole')  # The test mode, options are 'whole' and 'sliding'. 'whole': whole image fully-convolutional test. 'sliding': sliding crop window on the image.
	dataset_type = 'CityscapesDataset'  # Dataset type, this will be used to define the dataset.
	data_root = 'data/cityscapes/'  # Root path of data.
	img_norm_cfg = dict(  # Image normalization config to normalize the input images.
	    mean=[123.675, 116.28, 103.53],  # Mean values used to pre-train the pre-trained backbone models.
	    std=[58.395, 57.12, 57.375],  # Standard deviation used to pre-train the pre-trained backbone models.
	    to_rgb=True)  # The channel order of the images used to pre-train the pre-trained backbone models.
	crop_size = (512, 1024)  # The crop size during training.
	train_pipeline = [  # Training pipeline.
	    dict(type='LoadImageFromFile'),  # First pipeline to load images from file path.
	    dict(type='LoadAnnotations'),  # Second pipeline to load annotations for current image.
	    dict(type='Resize',  # Augmentation pipeline that resizes the images and their annotations.
	         img_scale=(2048, 1024),  # The largest scale of image.
	         ratio_range=(0.5, 2.0)),  # The augmented scale range as ratio.
	    dict(type='RandomCrop',  # Augmentation pipeline that randomly crops a patch from the current image.
	         crop_size=(512, 1024),  # The crop size of patch.
	         cat_max_ratio=0.75),  # The max area ratio that could be occupied by a single category.
	    dict(
	        type='RandomFlip',  # Augmentation pipeline that flips the images and their annotations
	        flip_ratio=0.5),  # The ratio or probability to flip
	    dict(type='PhotoMetricDistortion'),  # Augmentation pipeline that distorts the current image with several photometric methods.
	    dict(
	        type='Normalize',  # Augmentation pipeline that normalizes the input images
	        mean=[123.675, 116.28, 103.53],  # These keys are the same as those in img_norm_cfg,
	        std=[58.395, 57.12, 57.375],  # whose values are passed here as arguments
	        to_rgb=True),
	    dict(type='Pad',  # Augmentation pipeline that pads the image to the specified size.
	         size=(512, 1024),  # The output size of padding.
	         pad_val=0,  # The padding value for image.
	         seg_pad_val=255),  # The padding value of 'gt_semantic_seg'.
	    dict(type='DefaultFormatBundle'),  # Default format bundle to gather data in the pipeline
	    dict(type='Collect',  # Pipeline that decides which keys in the data should be passed to the segmentor
	         keys=['img', 'gt_semantic_seg'])
	]
	test_pipeline = [
	    dict(type='LoadImageFromFile'),  # First pipeline to load images from file path
	    dict(
	        type='MultiScaleFlipAug',  # A wrapper that encapsulates the test time augmentations
	        img_scale=(2048, 1024),  # Decides the largest scale for testing, used for the Resize pipeline
	        flip=False,  # Whether to flip images during testing
	        transforms=[
	            dict(type='Resize',  # Use resize augmentation
	                 keep_ratio=True),  # Whether to keep the ratio between height and width; any img_scale set here is overridden by the img_scale set above.
	            dict(type='RandomFlip'),  # Though RandomFlip is added to the pipeline, it is not used when flip=False
	            dict(
	                type='Normalize',  # Normalization config, the values are from img_norm_cfg
	                mean=[123.675, 116.28, 103.53],
	                std=[58.395, 57.12, 57.375],
	                to_rgb=True),
	            dict(type='ImageToTensor',  # Convert image to tensor
	                 keys=['img']),
	            dict(type='Collect',  # Collect pipeline that collects the keys necessary for testing.
	                 keys=['img'])
	        ])
	]
	data = dict(
	    samples_per_gpu=2,  # Batch size of a single GPU
	    workers_per_gpu=2,  # Workers to pre-fetch data for each single GPU
	    train=dict(  # Train dataset config
	        type='CityscapesDataset',  # Type of dataset, refer to mmseg/datasets/ for details.
	        data_root='data/cityscapes/',  # The root of dataset.
	        img_dir='leftImg8bit/train',  # The image directory of dataset.
	        ann_dir='gtFine/train',  # The annotation directory of dataset.
	        pipeline=[  # pipeline, this is passed by the train_pipeline created before.
	            dict(type='LoadImageFromFile'),
	            dict(type='LoadAnnotations'),
	            dict(
	                type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)),
	            dict(type='RandomCrop', crop_size=(512, 1024), cat_max_ratio=0.75),
	            dict(type='RandomFlip', flip_ratio=0.5),
	            dict(type='PhotoMetricDistortion'),
	            dict(
	                type='Normalize',
	                mean=[123.675, 116.28, 103.53],
	                std=[58.395, 57.12, 57.375],
	                to_rgb=True),
	            dict(type='Pad', size=(512, 1024), pad_val=0, seg_pad_val=255),
	            dict(type='DefaultFormatBundle'),
	            dict(type='Collect', keys=['img', 'gt_semantic_seg'])
	        ]),
	    val=dict(  # Validation dataset config
	        type='CityscapesDataset',
	        data_root='data/cityscapes/',
	        img_dir='leftImg8bit/val',
	        ann_dir='gtFine/val',
	        pipeline=[  # Pipeline is passed by test_pipeline created before
	            dict(type='LoadImageFromFile'),
	            dict(
	                type='MultiScaleFlipAug',
	                img_scale=(2048, 1024),
	                flip=False,
	                transforms=[
	                    dict(type='Resize', keep_ratio=True),
	                    dict(type='RandomFlip'),
	                    dict(
	                        type='Normalize',
	                        mean=[123.675, 116.28, 103.53],
	                        std=[58.395, 57.12, 57.375],
	                        to_rgb=True),
	                    dict(type='ImageToTensor', keys=['img']),
	                    dict(type='Collect', keys=['img'])
	                ])
	        ]),
	    test=dict(
	        type='CityscapesDataset',
	        data_root='data/cityscapes/',
	        img_dir='leftImg8bit/val',
	        ann_dir='gtFine/val',
	        pipeline=[
	            dict(type='LoadImageFromFile'),
	            dict(
	                type='MultiScaleFlipAug',
	                img_scale=(2048, 1024),
	                flip=False,
	                transforms=[
	                    dict(type='Resize', keep_ratio=True),
	                    dict(type='RandomFlip'),
	                    dict(
	                        type='Normalize',
	                        mean=[123.675, 116.28, 103.53],
	                        std=[58.395, 57.12, 57.375],
	                        to_rgb=True),
	                    dict(type='ImageToTensor', keys=['img']),
	                    dict(type='Collect', keys=['img'])
	                ])
	        ]))
	log_config = dict(  # config to register logger hook
	    interval=50,  # Interval to print the log
	    hooks=[
	        # dict(type='TensorboardLoggerHook')  # The Tensorboard logger is also supported
	        dict(type='TextLoggerHook', by_epoch=False)
	    ])
	dist_params = dict(backend='nccl')  # Parameters to set up distributed training; the port can also be set.
	log_level = 'INFO'  # The level of logging.
	load_from = None  # Load a model as a pre-trained model from a given path. This will not resume training.
	resume_from = None  # Resume from a checkpoint at a given path; training resumes from the iteration at which the checkpoint was saved.
	workflow = [('train', 1)]  # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once. The workflow trains the model for 40000 iterations according to `runner.max_iters`.
	cudnn_benchmark = True  # Whether to use cudnn_benchmark to speed up training, which is fast for fixed input sizes.
	optimizer = dict(  # Config used to build the optimizer; supports all the optimizers in PyTorch, with the same arguments as in PyTorch
	    type='SGD',  # Type of optimizer, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details
	    lr=0.01,  # Learning rate of the optimizer, see the PyTorch documentation for detailed usage of the parameters
	    momentum=0.9,  # Momentum
	    weight_decay=0.0005)  # Weight decay of SGD
	optimizer_config = dict()  # Config used to build the optimizer hook, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/optimizer.py#L8 for implementation details.
	lr_config = dict(
	    policy='poly',  # The policy of scheduler, also support Step, CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9.
	    power=0.9,  # The power of polynomial decay.
	    min_lr=0.0001,  # The minimum learning rate, used to stabilize training.
	    by_epoch=False)  # Whether to count by epoch or not.
	runner = dict(
	    type='IterBasedRunner',  # Type of runner to use (i.e. IterBasedRunner or EpochBasedRunner)
	    max_iters=40000)  # Total number of iterations. For EpochBasedRunner use `max_epochs`
	checkpoint_config = dict(  # Config to set the checkpoint hook, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation.
	    by_epoch=False,  # Whether to count by epoch or not.
	    interval=4000)  # The save interval.
	evaluation = dict(  # The config to build the evaluation hook. Please refer to mmseg/core/evaluation/eval_hook.py for details.
	    interval=4000,  # The interval of evaluation.
	    metric='mIoU')  # The evaluation metric.
	```
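
To see what the `lr_config` above implies for the learning rate over time, here is a minimal sketch of polynomial decay. It assumes the common form `lr = (base_lr - min_lr) * (1 - iter / max_iters) ** power + min_lr`; treat that formula as an assumption and check the linked `lr_updater.py` for the exact behaviour. `poly_lr` is a hypothetical helper, with defaults taken from the config values shown above.

```python
def poly_lr(cur_iter, base_lr=0.01, min_lr=0.0001, power=0.9, max_iters=40000):
    """Polynomial decay from base_lr down to min_lr (assumed formula)."""
    coeff = (1 - cur_iter / max_iters) ** power
    return (base_lr - min_lr) * coeff + min_lr


# Learning rate at the start, the midpoint, and the end of the schedule.
for it in (0, 20000, 40000):
    print(it, round(poly_lr(it), 6))
```

Under this assumption the rate starts at `lr`, decreases monotonically, and reaches `min_lr` exactly at `max_iters`.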

	## FAQ

	### Ignore some fields in the base configs

	Sometimes, you may set `_delete_=True` to ignore some of the fields in base configs.
	You may refer to [mmcv](https://mmcv.readthedocs.io/en/latest/utils.html#inherit-from-base-config-with-ignored-fields) for a simple illustration.

	In MMSegmentation, for example, suppose you want to change the backbone of PSPNet, which has the following config.

	```python
	norm_cfg = dict(type='SyncBN', requires_grad=True)
	model = dict(
	    type='EncoderDecoder',
	    pretrained='open-mmlab://resnet50_v1c',
	    backbone=dict(
	        type='ResNetV1c',
	        depth=50,
	        num_stages=4,
	        out_indices=(0, 1, 2, 3),
	        dilations=(1, 1, 2, 4),
	        strides=(1, 2, 1, 1),
	        norm_cfg=norm_cfg,
	        norm_eval=False,
	        style='pytorch',
	        contract_dilation=True),
	    decode_head=dict(...),
	    auxiliary_head=dict(...))
	```

	`ResNet` and `HRNet` are constructed with different keyword arguments, so the inherited `backbone` fields have to be discarded rather than merged.

	```python
	_base_ = '../pspnet/psp_r50_512x1024_40ki_cityscapes.py'
	norm_cfg = dict(type='SyncBN', requires_grad=True)
	model = dict(
	    pretrained='open-mmlab://msra/hrnetv2_w32',
	    backbone=dict(
	        _delete_=True,
	        type='HRNet',
	        norm_cfg=norm_cfg,
	        extra=dict(
	            stage1=dict(
	                num_modules=1,
	                num_branches=1,
	                block='BOTTLENECK',
	                num_blocks=(4, ),
	                num_channels=(64, )),
	            stage2=dict(
	                num_modules=1,
	                num_branches=2,
	                block='BASIC',
	                num_blocks=(4, 4),
	                num_channels=(32, 64)),
	            stage3=dict(
	                num_modules=4,
	                num_branches=3,
	                block='BASIC',
	                num_blocks=(4, 4, 4),
	                num_channels=(32, 64, 128)),
	            stage4=dict(
	                num_modules=3,
	                num_branches=4,
	                block='BASIC',
	                num_blocks=(4, 4, 4, 4),
	                num_channels=(32, 64, 128, 256)))),
	    decode_head=dict(...),
	    auxiliary_head=dict(...))
	```

	Setting `_delete_=True` replaces all old keys in the `backbone` field with the new keys.
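
The difference between the default merge and `_delete_=True` can be sketched with plain dictionaries. `merge_config` below is a hypothetical helper that imitates the behaviour described in this section; it is not mmcv's implementation, and the three-key `backbone` is a shortened stand-in for the real one.

```python
def merge_config(base, child):
    """Recursively merge child into base (illustrative, not mmcv's code)."""
    merged = dict(base)
    for key, value in child.items():
        if isinstance(value, dict):
            value = dict(value)  # copy so the caller's dict is not modified
            if value.pop('_delete_', False) or not isinstance(merged.get(key), dict):
                # Replace the base field wholesale, dropping its old keys.
                merged[key] = merge_config({}, value)
            else:
                # Default behaviour: keep old keys and override matching ones.
                merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


base = dict(backbone=dict(type='ResNetV1c', depth=50, style='pytorch'))

kept = merge_config(base, dict(backbone=dict(type='HRNet')))
replaced = merge_config(base, dict(backbone=dict(_delete_=True, type='HRNet')))

print(kept['backbone'])      # still carries the ResNet-only keys
print(replaced['backbone'])  # contains only the keys given by the child
```

Without `_delete_`, stale keys such as `depth` and `style` survive the merge and would be handed to a backbone that does not expect them.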

	### Use intermediate variables in configs

	Some intermediate variables are used in the config files, like `train_pipeline`/`test_pipeline` in datasets.
	Note that when modifying intermediate variables in child configs, users need to pass the intermediate variables into the corresponding fields again.
	For example, suppose we would like to change the multi-scale strategy used to train/test a PSPNet. `train_pipeline`/`test_pipeline` are the intermediate variables we would like to modify.

	```python
	_base_ = '../pspnet/psp_r50_512x1024_40ki_cityscapes.py'
	crop_size = (512, 1024)
	img_norm_cfg = dict(
	    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
	train_pipeline = [
	    dict(type='LoadImageFromFile'),
	    dict(type='LoadAnnotations'),
	    dict(type='Resize', img_scale=(2048, 1024), ratio_range=(1.0, 2.0)),  # change to [1., 2.]
	    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
	    dict(type='RandomFlip', flip_ratio=0.5),
	    dict(type='PhotoMetricDistortion'),
	    dict(type='Normalize', **img_norm_cfg),
	    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
	    dict(type='DefaultFormatBundle'),
	    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
	]
	test_pipeline = [
	    dict(type='LoadImageFromFile'),
	    dict(
	        type='MultiScaleFlipAug',
	        img_scale=(2048, 1024),
	        img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],  # change to multi scale testing
	        flip=False,
	        transforms=[
	            dict(type='Resize', keep_ratio=True),
	            dict(type='RandomFlip'),
	            dict(type='Normalize', **img_norm_cfg),
	            dict(type='ImageToTensor', keys=['img']),
	            dict(type='Collect', keys=['img']),
	        ])
	]
	data = dict(
	    train=dict(pipeline=train_pipeline),
	    val=dict(pipeline=test_pipeline),
	    test=dict(pipeline=test_pipeline))
	```

	We first define the new `train_pipeline`/`test_pipeline` and pass them into `data`.
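
Why the variables have to be passed again can be seen in a few lines of plain Python. This is an analogy only, under the assumption that the base config has already stored its pipeline inside `data` by the time the child redefines the variable; the one-step pipelines are shortened stand-ins for the real ones.

```python
# Base config: `data` stores the pipeline list that exists at this point.
train_pipeline = [dict(type='Resize', ratio_range=(0.5, 2.0))]
data = dict(train=dict(pipeline=train_pipeline))

# Child config: rebinding the name creates a new list; `data` is untouched.
train_pipeline = [dict(type='Resize', ratio_range=(1.0, 2.0))]
stale = data['train']['pipeline'][0]['ratio_range']

# Passing the new list into `data` again makes the change take effect.
data['train']['pipeline'] = train_pipeline
fresh = data['train']['pipeline'][0]['ratio_range']

print(stale, fresh)
```

Redefining the variable alone leaves the dataset with the old pipeline, which is why the example above ends by rebuilding the `data` fields.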

	Similarly, if we would like to switch from `SyncBN` to `BN` or `MMSyncBN`, we need to substitute every `norm_cfg` in the config.

	```python
	_base_ = '../pspnet/psp_r50_512x1024_40ki_cityscapes.py'
	norm_cfg = dict(type='BN', requires_grad=True)
	model = dict(
	    backbone=dict(norm_cfg=norm_cfg),
	    decode_head=dict(norm_cfg=norm_cfg),
	    auxiliary_head=dict(norm_cfg=norm_cfg))
	```
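
Since every `norm_cfg` has to be substituted, it can help to list where the key occurs before writing the child config. The sketch below walks a nested config and reports each path; `find_key_paths` is a hypothetical helper, not part of MMSegmentation or mmcv, and the `model` dict is a shortened stand-in for the PSPNet config.

```python
def find_key_paths(cfg, target, prefix=''):
    """Yield dotted paths of every occurrence of `target` in a nested config."""
    if isinstance(cfg, dict):
        for key, value in cfg.items():
            path = f'{prefix}.{key}' if prefix else key
            if key == target:
                yield path
            yield from find_key_paths(value, target, path)
    elif isinstance(cfg, (list, tuple)):
        # Pipelines are lists of dicts, so descend into sequences as well.
        for index, item in enumerate(cfg):
            yield from find_key_paths(item, target, f'{prefix}[{index}]')


norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
    backbone=dict(type='ResNetV1c', norm_cfg=norm_cfg),
    decode_head=dict(type='PSPHead', norm_cfg=norm_cfg),
    auxiliary_head=dict(type='FCNHead', norm_cfg=norm_cfg))

paths = list(find_key_paths(model, 'norm_cfg'))
print(paths)
```

Each reported path corresponds to one field that the child config needs to override.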