Model design instructions
YOLO series model basic class
The structural figure is provided by RangeKing@GitHub. Thank you RangeKing!
Most YOLO series algorithms adopt a unified algorithm-building structure, typically as Darknet + PAFPN. In order to let users quickly understand the YOLO series algorithm architecture, we deliberately designed the BaseBackbone
+ BaseYOLONeck
structure, as shown in the above figure.
The benefits of the abstract BaseBackbone
include:
- Subclasses do not need to be concerned about the forward process. Just build the model as a builder pattern.
- It can be configured to achieve custom plug-in functions. Users can easily insert some similar attention modules.
- All subclasses automatically support freezing certain stages and bn functions.
BaseYOLONeck
has the same benefits as BaseBackbone
.
BaseBackbone
- As shown in Figure 1, for P5,
BaseBackbone
includes 1 stem layer and 4 stage layers which are similar to the basic structure of ResNet. - As shown in Figure 2, for P6,
BaseBackbone
includes 1 stem layer and 5 stage layers. Different backbone network algorithms inherit theBaseBackbone
. Users can build each layer of the whole network by implementing customized basic modules through the internalbuild_xx
method.
BaseYOLONeck
We reproduce the YOLO series Neck components in the similar way as the BaseBackbone
, and we can mainly divide them into Reduce layer
, UpSample layer
, TopDown layer
, DownSample layer
, BottomUP layer
and output convolution layer
. Each layer can be customized its internal construction by the inheritance and rewrite from the build_xx
method.
BaseDenseHead
MMYOLO uses the BaseDenseHead
designed in MMDetection as the base class of the Head structure. Take YOLOv5 as an example, the forward function of its HeadModule replaces the original forward method.
HeadModule
As shown in the above graph, the solid line is the implementation in MMYOLO, whereas the original implementation in MMDetection is shown in the dotted line. MMYOLO has the following advantages over the original implementation:
- In MMDetection,
bbox_head
is split into three large components:assigner
+box coder
+sampler
. But because the transfer between these three components is universal, it is necessary to encapsulate additional objects. With the unification in MMYOLO, users do not need to separate them. The advantages of not deliberately forcing the division of the three components are: data encapsulation of internal data is no longer required, code logic is simplified, and the difficulty of community use and algorithm reproduction is reduced. - MMYOLO is Faster. When users customize the implementation algorithm, they can deeply optimize part of the code without relying on the original framework.
In general, with the partly decoupled model + loss_by_feat
part in MMYOLO, users can construct any model with any loss_by_feat
by modifying the configuration. For example, applying the loss_by_feat
of YOLOX to the YOLOv5 model, etc.
Take the YOLOX configuration in MMDetection as an example, the Head module configuration is written as follows:
bbox_head=dict(
type='YOLOXHead',
num_classes=80,
in_channels=128,
feat_channels=128,
stacked_convs=2,
strides=(8, 16, 32),
use_depthwise=False,
norm_cfg=dict(type='BN', momentum=0.03, eps=0.001),
act_cfg=dict(type='Swish'),
...
loss_obj=dict(
type='CrossEntropyLoss',
use_sigmoid=True,
reduction='sum',
loss_weight=1.0),
loss_l1=dict(type='L1Loss', reduction='sum', loss_weight=1.0)),
train_cfg=dict(assigner=dict(type='SimOTAAssigner', center_radius=2.5)),
For the head_module in MMYOLO, the new configuration is written as follows:
bbox_head=dict(
type='YOLOXHead',
head_module=dict(
type='YOLOXHeadModule',
num_classes=80,
in_channels=256,
feat_channels=256,
widen_factor=widen_factor,
stacked_convs=2,
featmap_strides=(8, 16, 32),
use_depthwise=False,
norm_cfg=dict(type='BN', momentum=0.03, eps=0.001),
act_cfg=dict(type='SiLU', inplace=True),
),
...
loss_obj=dict(
type='mmdet.CrossEntropyLoss',
use_sigmoid=True,
reduction='sum',
loss_weight=1.0),
loss_bbox_aux=dict(type='mmdet.L1Loss', reduction='sum', loss_weight=1.0)),
train_cfg=dict(
assigner=dict(
type='mmdet.SimOTAAssigner',
center_radius=2.5,
iou_calculator=dict(type='mmdet.BboxOverlaps2D'))),