glenn-jocher
committed on
v2.0 Release (#491)
Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
- README.md +9 -7
- models/yolo.py +12 -8
- models/yolov5l.yaml +11 -15
- models/yolov5m.yaml +11 -15
- models/yolov5s.yaml +11 -15
- models/yolov5x.yaml +11 -15
- train.py +6 -6
- utils/torch_utils.py +1 -1
- utils/utils.py +5 -4
README.md
CHANGED

@@ -8,26 +8,28 @@ This repository represents Ultralytics open-source research into future object detection
 
 <img src="https://user-images.githubusercontent.com/26833433/85340570-30360a80-b49b-11ea-87cf-bdf33d53ae15.png" width="1000">** GPU Speed measures end-to-end time per image averaged over 5000 COCO val2017 images using a V100 GPU with batch size 8, and includes image preprocessing, PyTorch FP16 inference, postprocessing and NMS.
 
+- **July 23, 2020**: v2.0 release: improved model definition, training and mAP.
 - **June 22, 2020**: [PANet](https://arxiv.org/abs/1803.01534) updates: new heads, reduced parameters, faster inference and improved mAP [364fcfd](https://github.com/ultralytics/yolov5/commit/364fcfd7dba53f46edd4f04c037a039c0a287972).
 - **June 19, 2020**: [FP16](https://pytorch.org/docs/stable/nn.html#torch.nn.Module.half) as new default for smaller checkpoints and faster inference [d4c6674](https://github.com/ultralytics/yolov5/commit/d4c6674c98e19df4c40e33a777610a18d1961145).
 - **June 9, 2020**: [CSP](https://github.com/WongKinYiu/CrossStagePartialNetworks) updates: improved speed, size, and accuracy (credit to @WongKinYiu for CSP).
-- **May 27, 2020**: Public release
-- **April 1, 2020**: Start development of future [YOLOv3](https://github.com/ultralytics/yolov3)/[YOLOv4](https://github.com/AlexeyAB/darknet)-based PyTorch models
+- **May 27, 2020**: Public release. YOLOv5 models are SOTA among all known YOLO implementations.
+- **April 1, 2020**: Start development of future compound-scaled [YOLOv3](https://github.com/ultralytics/yolov3)/[YOLOv4](https://github.com/AlexeyAB/darknet)-based PyTorch models.
 
 
 ## Pretrained Checkpoints
 
 | Model | AP<sup>val</sup> | AP<sup>test</sup> | AP<sub>50</sub> | Speed<sub>GPU</sub> | FPS<sub>GPU</sub> || params | FLOPS |
 |---------- |------ |------ |------ | -------- | ------| ------ |------ | :------: |
-| […
-| […
-| […
-| […
+| [YOLOv5.1s](https://drive.google.com/open?id=1Drs_Aiu7xx6S-ix95f9kNsA6ueKRpN2J) | 36.1 | 36.1 | 55.3 | **2.1ms** | **476** || 7.5M | 13.2B
+| [YOLOv5.1m](https://drive.google.com/open?id=1Drs_Aiu7xx6S-ix95f9kNsA6ueKRpN2J) | 43.5 | 43.5 | 62.5 | 3.0ms | 333 || 21.8M | 39.4B
+| [YOLOv5.1l](https://drive.google.com/open?id=1Drs_Aiu7xx6S-ix95f9kNsA6ueKRpN2J) | 47.0 | 47.1 | 65.6 | 3.9ms | 256 || 47.8M | 88.1B
+| [YOLOv5.1x](https://drive.google.com/open?id=1Drs_Aiu7xx6S-ix95f9kNsA6ueKRpN2J) | **49.0** | **49.0** | **67.4** | 6.1ms | 164 || 89.0M | 166.4B
+| | | | | | || |
 | [YOLOv3-SPP](https://drive.google.com/open?id=1Drs_Aiu7xx6S-ix95f9kNsA6ueKRpN2J) | 45.6 | 45.5 | 65.2 | 4.5ms | 222 || 63.0M | 118.0B
 
 
 ** AP<sup>test</sup> denotes COCO [test-dev2017](http://cocodataset.org/#upload) server results, all other AP results in the table denote val2017 accuracy.
-** All AP numbers are for single-model single-scale without ensemble or test-time augmentation. Reproduce by `python test.py --data coco.yaml --img …`
+** All AP numbers are for single-model single-scale without ensemble or test-time augmentation. Reproduce by `python test.py --data coco.yaml --img 672 --conf 0.001`
 ** Speed<sub>GPU</sub> measures end-to-end time per image averaged over 5000 COCO val2017 images using a GCP [n1-standard-16](https://cloud.google.com/compute/docs/machine-types#n1_standard_machine_types) instance with one V100 GPU, and includes image preprocessing, PyTorch FP16 image inference at --batch-size 32 --img-size 640, postprocessing and NMS. Average NMS time included in this chart is 1-2ms/img. Reproduce by `python test.py --data coco.yaml --img 640 --conf 0.1`
 ** All checkpoints are trained to 300 epochs with default settings and hyperparameters (no autoaugmentation).
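As a sanity check on the table above, the FPS<sub>GPU</sub> column is simply 1000 divided by Speed<sub>GPU</sub> in milliseconds; a two-line sketch:

```python
# FPS = 1000 / per-image latency in ms (latencies taken from the table above)
for ms in (2.1, 3.0, 3.9, 6.1, 4.5):
    print(f'{ms} ms -> {1000 / ms:.0f} FPS')  # 476, 333, 256, 164, 222
```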
models/yolo.py
CHANGED

@@ -5,7 +5,7 @@ from models.experimental import *
 
 
 class Detect(nn.Module):
-    def __init__(self, nc=80, anchors=()):  # detection layer
+    def __init__(self, nc=80, anchors=(), ch=()):  # detection layer
         super(Detect, self).__init__()
         self.stride = None  # strides computed during build
         self.nc = nc  # number of classes

@@ -16,6 +16,7 @@ class Detect(nn.Module):
         a = torch.tensor(anchors).float().view(self.nl, -1, 2)
         self.register_buffer('anchors', a)  # shape(nl,na,2)
         self.register_buffer('anchor_grid', a.clone().view(self.nl, 1, -1, 1, 1, 2))  # shape(nl,1,na,1,1,2)
+        self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch)  # output conv
         self.export = False  # onnx export
 
     def forward(self, x):

@@ -23,6 +24,7 @@ class Detect(nn.Module):
         z = []  # inference output
         self.training |= self.export
         for i in range(self.nl):
+            x[i] = self.m[i](x[i])  # conv
             bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
             x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

@@ -124,8 +126,7 @@ class Model(nn.Module):
     def _initialize_biases(self, cf=None):  # initialize biases into Detect(), cf is class frequency
         # cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.
         m = self.model[-1]  # Detect() module
-        for f, s in zip(m.f, m.stride):  # from
-            mi = self.model[f % m.i]
+        for mi, s in zip(m.m, m.stride):  # from
             b = mi.bias.view(m.na, -1)  # conv.bias(255) to (3,85)
             b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
             b[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum())  # cls

@@ -133,9 +134,9 @@ class Model(nn.Module):
 
     def _print_biases(self):
         m = self.model[-1]  # Detect() module
-        for …
-        b = …
-        print(('%…
+        for mi in m.m:  # from
+            b = mi.bias.detach().view(m.na, -1).T  # conv.bias(255) to (3,85)
+            print(('%6g Conv2d.bias:' + '%10.3g' * 6) % (mi.weight.shape[1], *b[:5].mean(1).tolist(), b[5:].mean()))
 
     # def _print_weights(self):
     #     for m in self.model.modules():

@@ -159,7 +160,7 @@ class Model(nn.Module):
 def parse_model(d, ch):  # model_dict, input_channels(3)
     print('\n%3s%18s%3s%10s %-40s%-30s' % ('', 'from', 'n', 'params', 'module', 'arguments'))
     anchors, nc, gd, gw = d['anchors'], d['nc'], d['depth_multiple'], d['width_multiple']
-    na = (len(anchors[0]) // 2)  # number of anchors
+    na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors  # number of anchors
     no = na * (nc + 5)  # number of outputs = anchors * (classes + 5)
 
     layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out

@@ -181,6 +182,7 @@ def parse_model(d, ch):  # model_dict, input_channels(3)
             # e = math.log(c2 / ch[1]) / math.log(2)
             # c2 = int(ch[1] * ex ** e)
             # if m != Focus:
+
             c2 = make_divisible(c2 * gw, 8) if c2 != no else c2
 
         # Experimental

@@ -201,7 +203,9 @@ def parse_model(d, ch):  # model_dict, input_channels(3)
         elif m is Concat:
             c2 = sum([ch[-1 if x == -1 else x + 1] for x in f])
         elif m is Detect:
-            …
+            args.append([ch[x + 1] for x in f])
+            if isinstance(args[1], int):  # number of anchors
+                args[1] = [list(range(args[1] * 2))] * len(f)
         else:
             c2 = ch[f]
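The net effect of the `Detect` changes above: each scale's 1×1 output convolution moves out of the yaml head and into `Detect` itself, which now receives its input channel counts via the new `ch` argument that `parse_model` assembles in the `elif m is Detect:` branch. A standalone, shape-only sketch (channel counts are illustrative; the real class also builds grids and decodes boxes for inference):

```python
import torch
import torch.nn as nn

nc, na = 80, 3                 # classes, anchors per layer
no = nc + 5                    # outputs per anchor (box, objectness, classes)
ch = (128, 256, 512)           # assumed P3/P4/P5 input channels, as passed by parse_model
m = nn.ModuleList(nn.Conv2d(x, no * na, 1) for x in ch)  # output convs now live inside Detect

p3 = torch.zeros(1, ch[0], 80, 80)  # dummy P3 feature map
y = m[0](p3)                        # conv, as in the new forward()
bs, _, ny, nx = y.shape             # (1, 255, 80, 80)
y = y.view(bs, na, no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
print(y.shape)                      # torch.Size([1, 3, 80, 80, 85])
```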
models/yolov5l.yaml
CHANGED

@@ -5,9 +5,9 @@ width_multiple: 1.0  # layer channel multiple
 
 # anchors
 anchors:
-  - [116,90, 156,198, 373,326]  # P5/32
-  - [30,61, 62,45, 59,119]  # P4/16
   - [10,13, 16,30, 33,23]  # P3/8
+  - [30,61, 62,45, 59,119]  # P4/16
+  - [116,90, 156,198, 373,326]  # P5/32
 
 # YOLOv5 backbone
 backbone:

@@ -19,15 +19,14 @@ backbone:
    [-1, 9, BottleneckCSP, [256]],
    [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
    [-1, 9, BottleneckCSP, [512]],
-   [-1, 1, Conv, [1024, 3, 2]],
+   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
    [-1, 1, SPP, [1024, [5, 9, 13]]],
+   [-1, 3, BottleneckCSP, [1024, False]],  # 9
   ]
 
 # YOLOv5 head
 head:
-  [[-1, …
-   …
-   [-1, 1, Conv, [512, 1, 1]],
+  [[-1, 1, Conv, [512, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 6], 1, Concat, [1]],  # cat backbone P4
    [-1, 3, BottleneckCSP, [512, False]],  # 13

@@ -35,18 +34,15 @@ head:
    [-1, 1, Conv, [256, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 4], 1, Concat, [1]],  # cat backbone P3
-   [-1, 3, BottleneckCSP, [256, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 18 (P3/8-small)
+   [-1, 3, BottleneckCSP, [256, False]],  # 17
 
-   [-…
+   [-1, 1, Conv, [256, 3, 2]],
    [[-1, 14], 1, Concat, [1]],  # cat head P4
-   [-1, 3, BottleneckCSP, [512, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 22 (P4/16-medium)
+   [-1, 3, BottleneckCSP, [512, False]],  # 20
 
-   [-…
+   [-1, 1, Conv, [512, 3, 2]],
    [[-1, 10], 1, Concat, [1]],  # cat head P5
-   [-1, 3, BottleneckCSP, [1024, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 26 (P5/32-large)
+   [-1, 3, BottleneckCSP, [1024, False]],  # 23
 
-   [[], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
+   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
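With the anchors now listed in ascending stride order (P3 → P5) and the head ending in a single `Detect` over layers 17, 20 and 23, a built model should report strides 8, 16, 32 in that same order. A minimal sketch, assuming a yolov5 checkout at this commit is on the Python path:

```python
from models.yolo import Model

model = Model('models/yolov5l.yaml')  # parses the backbone/head above; Detect gets ch from parse_model
print(model.model[-1].stride)         # expected: tensor([ 8., 16., 32.]) -> P3, P4, P5
```

The same diff is applied to the m, s and x yamls below; only `depth_multiple`/`width_multiple` differ per model.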
models/yolov5m.yaml
CHANGED

@@ -5,9 +5,9 @@ width_multiple: 0.75  # layer channel multiple
 
 # anchors
 anchors:
-  - [116,90, 156,198, 373,326]  # P5/32
-  - [30,61, 62,45, 59,119]  # P4/16
   - [10,13, 16,30, 33,23]  # P3/8
+  - [30,61, 62,45, 59,119]  # P4/16
+  - [116,90, 156,198, 373,326]  # P5/32
 
 # YOLOv5 backbone
 backbone:

@@ -19,15 +19,14 @@ backbone:
    [-1, 9, BottleneckCSP, [256]],
    [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
    [-1, 9, BottleneckCSP, [512]],
-   [-1, 1, Conv, [1024, 3, 2]],
+   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
    [-1, 1, SPP, [1024, [5, 9, 13]]],
+   [-1, 3, BottleneckCSP, [1024, False]],  # 9
   ]
 
 # YOLOv5 head
 head:
-  [[-1, …
-   …
-   [-1, 1, Conv, [512, 1, 1]],
+  [[-1, 1, Conv, [512, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 6], 1, Concat, [1]],  # cat backbone P4
    [-1, 3, BottleneckCSP, [512, False]],  # 13

@@ -35,18 +34,15 @@ head:
    [-1, 1, Conv, [256, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 4], 1, Concat, [1]],  # cat backbone P3
-   [-1, 3, BottleneckCSP, [256, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 18 (P3/8-small)
+   [-1, 3, BottleneckCSP, [256, False]],  # 17
 
-   [-…
+   [-1, 1, Conv, [256, 3, 2]],
    [[-1, 14], 1, Concat, [1]],  # cat head P4
-   [-1, 3, BottleneckCSP, [512, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 22 (P4/16-medium)
+   [-1, 3, BottleneckCSP, [512, False]],  # 20
 
-   [-…
+   [-1, 1, Conv, [512, 3, 2]],
    [[-1, 10], 1, Concat, [1]],  # cat head P5
-   [-1, 3, BottleneckCSP, [1024, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 26 (P5/32-large)
+   [-1, 3, BottleneckCSP, [1024, False]],  # 23
 
-   [[], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
+   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
models/yolov5s.yaml
CHANGED

@@ -5,9 +5,9 @@ width_multiple: 0.50  # layer channel multiple
 
 # anchors
 anchors:
-  - [116,90, 156,198, 373,326]  # P5/32
-  - [30,61, 62,45, 59,119]  # P4/16
   - [10,13, 16,30, 33,23]  # P3/8
+  - [30,61, 62,45, 59,119]  # P4/16
+  - [116,90, 156,198, 373,326]  # P5/32
 
 # YOLOv5 backbone
 backbone:

@@ -19,15 +19,14 @@ backbone:
    [-1, 9, BottleneckCSP, [256]],
    [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
    [-1, 9, BottleneckCSP, [512]],
-   [-1, 1, Conv, [1024, 3, 2]],
+   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
    [-1, 1, SPP, [1024, [5, 9, 13]]],
+   [-1, 3, BottleneckCSP, [1024, False]],  # 9
   ]
 
 # YOLOv5 head
 head:
-  [[-1, …
-   …
-   [-1, 1, Conv, [512, 1, 1]],
+  [[-1, 1, Conv, [512, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 6], 1, Concat, [1]],  # cat backbone P4
    [-1, 3, BottleneckCSP, [512, False]],  # 13

@@ -35,18 +34,15 @@ head:
    [-1, 1, Conv, [256, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 4], 1, Concat, [1]],  # cat backbone P3
-   [-1, 3, BottleneckCSP, [256, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 18 (P3/8-small)
+   [-1, 3, BottleneckCSP, [256, False]],  # 17
 
-   [-…
+   [-1, 1, Conv, [256, 3, 2]],
    [[-1, 14], 1, Concat, [1]],  # cat head P4
-   [-1, 3, BottleneckCSP, [512, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 22 (P4/16-medium)
+   [-1, 3, BottleneckCSP, [512, False]],  # 20
 
-   [-…
+   [-1, 1, Conv, [512, 3, 2]],
    [[-1, 10], 1, Concat, [1]],  # cat head P5
-   [-1, 3, BottleneckCSP, [1024, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 26 (P5/32-large)
+   [-1, 3, BottleneckCSP, [1024, False]],  # 23
 
-   [[], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
+   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
models/yolov5x.yaml
CHANGED

@@ -5,9 +5,9 @@ width_multiple: 1.25  # layer channel multiple
 
 # anchors
 anchors:
-  - [116,90, 156,198, 373,326]  # P5/32
-  - [30,61, 62,45, 59,119]  # P4/16
   - [10,13, 16,30, 33,23]  # P3/8
+  - [30,61, 62,45, 59,119]  # P4/16
+  - [116,90, 156,198, 373,326]  # P5/32
 
 # YOLOv5 backbone
 backbone:

@@ -19,15 +19,14 @@ backbone:
    [-1, 9, BottleneckCSP, [256]],
    [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
    [-1, 9, BottleneckCSP, [512]],
-   [-1, 1, Conv, [1024, 3, 2]],
+   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
    [-1, 1, SPP, [1024, [5, 9, 13]]],
+   [-1, 3, BottleneckCSP, [1024, False]],  # 9
   ]
 
 # YOLOv5 head
 head:
-  [[-1, …
-   …
-   [-1, 1, Conv, [512, 1, 1]],
+  [[-1, 1, Conv, [512, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 6], 1, Concat, [1]],  # cat backbone P4
    [-1, 3, BottleneckCSP, [512, False]],  # 13

@@ -35,18 +34,15 @@ head:
    [-1, 1, Conv, [256, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 4], 1, Concat, [1]],  # cat backbone P3
-   [-1, 3, BottleneckCSP, [256, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 18 (P3/8-small)
+   [-1, 3, BottleneckCSP, [256, False]],  # 17
 
-   [-…
+   [-1, 1, Conv, [256, 3, 2]],
    [[-1, 14], 1, Concat, [1]],  # cat head P4
-   [-1, 3, BottleneckCSP, [512, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 22 (P4/16-medium)
+   [-1, 3, BottleneckCSP, [512, False]],  # 20
 
-   [-…
+   [-1, 1, Conv, [512, 3, 2]],
    [[-1, 10], 1, Concat, [1]],  # cat head P5
-   [-1, 3, BottleneckCSP, [1024, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 26 (P5/32-large)
+   [-1, 3, BottleneckCSP, [1024, False]],  # 23
 
-   [[], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
+   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
train.py
CHANGED

@@ -27,16 +27,16 @@ hyp = {'optimizer': 'SGD',  # ['adam', 'SGD', None] if none, default is SGD
        'momentum': 0.937,  # SGD momentum/Adam beta1
        'weight_decay': 5e-4,  # optimizer weight decay
        'giou': 0.05,  # giou loss gain
-       'cls': 0.…
+       'cls': 0.5,  # cls loss gain
        'cls_pw': 1.0,  # cls BCELoss positive_weight
        'obj': 1.0,  # obj loss gain (*=img_size/320 if img_size != 320)
        'obj_pw': 1.0,  # obj BCELoss positive_weight
        'iou_t': 0.20,  # iou training threshold
        'anchor_t': 4.0,  # anchor-multiple threshold
        'fl_gamma': 0.0,  # focal loss gamma (efficientDet default is gamma=1.5)
-       'hsv_h': 0.…
-       'hsv_s': 0.…
-       'hsv_v': 0.…
+       'hsv_h': 0.015,  # image HSV-Hue augmentation (fraction)
+       'hsv_s': 0.7,  # image HSV-Saturation augmentation (fraction)
+       'hsv_v': 0.4,  # image HSV-Value augmentation (fraction)
        'degrees': 0.0,  # image rotation (+/- deg)
        'translate': 0.0,  # image translation (+/- fraction)
        'scale': 0.5,  # image scale (+/- gain)

@@ -159,7 +159,7 @@ def train(hyp, tb_writer, opt, device):
     model, optimizer = amp.initialize(model, optimizer, opt_level='O1', verbosity=0)
 
     # Scheduler https://arxiv.org/pdf/1812.01187.pdf
-    lf = lambda x: (((1 + math.cos(x * math.pi / epochs)) / 2) ** 1.0) * 0.…
+    lf = lambda x: (((1 + math.cos(x * math.pi / epochs)) / 2) ** 1.0) * 0.8 + 0.2  # cosine
     scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)
     # https://discuss.pytorch.org/t/a-problem-occured-when-resuming-an-optimizer/28822
     # plot_lr_scheduler(optimizer, scheduler, epochs)

@@ -334,7 +334,7 @@ def train(hyp, tb_writer, opt, device):
         if rank in [-1, 0]:
             # mAP
             if ema is not None:
-                ema.update_attr(model, include=['…
+                ema.update_attr(model, include=['yaml', 'nc', 'hyp', 'gr', 'names', 'stride'])
             final_epoch = epoch + 1 == epochs
             if not opt.notest or final_epoch:  # Calculate mAP
                 results, maps, times = test.test(opt.data,
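The scheduler lambda decays the learning rate along a cosine curve from `lr0` at epoch 0 down to `0.2 * lr0` at the final epoch. A standalone check of its endpoints:

```python
import math

epochs = 300
lf = lambda x: (((1 + math.cos(x * math.pi / epochs)) / 2) ** 1.0) * 0.8 + 0.2  # cosine

print(lf(0), lf(epochs // 2), lf(epochs))  # 1.0 0.6 0.2 -> final lr is 0.2 * lr0
```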
utils/torch_utils.py
CHANGED

@@ -65,7 +65,7 @@ def initialize_weights(model):
         if t is nn.Conv2d:
             pass  # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
         elif t is nn.BatchNorm2d:
-            m.eps = 1e-…
+            m.eps = 1e-3
             m.momentum = 0.03
         elif t in [nn.LeakyReLU, nn.ReLU, nn.ReLU6]:
             m.inplace = True
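For reference, these values diverge from PyTorch's stock `BatchNorm2d` defaults (`eps=1e-5`, `momentum=0.1`); a minimal illustration of what `initialize_weights` now applies:

```python
import torch.nn as nn

bn = nn.BatchNorm2d(32)           # constructed with eps=1e-5, momentum=0.1
bn.eps, bn.momentum = 1e-3, 0.03  # values set by initialize_weights() above
print(bn)                         # BatchNorm2d(32, eps=0.001, momentum=0.03, ...)
```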
utils/utils.py
CHANGED

@@ -5,10 +5,10 @@ import random
 import shutil
 import subprocess
 import time
+from contextlib import contextmanager
 from copy import copy
 from pathlib import Path
 from sys import platform
-from contextlib import contextmanager
 
 import cv2
 import matplotlib

@@ -110,6 +110,7 @@ def check_anchor_order(m):
     da = a[-1] - a[0]  # delta a
     ds = m.stride[-1] - m.stride[0]  # delta s
     if da.sign() != ds.sign():  # same order
+        print('Reversing anchor order')
         m.anchors[:] = m.anchors.flip(0)
         m.anchor_grid[:] = m.anchor_grid.flip(0)

@@ -459,7 +460,7 @@ def compute_loss(p, targets, model):  # predictions, targets, model
     # per output
     nt = 0  # number of targets
     np = len(p)  # number of outputs
-    balance = […
+    balance = [4.0, 1.0, 0.4] if np == 3 else [4.0, 1.0, 0.4, 0.1]  # P3-5 or P3-6
     for i, pi in enumerate(p):  # layer index, layer predictions
         b, a, gj, gi = indices[i]  # image, anchor, gridy, gridx
         tobj = torch.zeros_like(pi[..., 0]).to(device)  # target obj

@@ -493,7 +494,7 @@ def compute_loss(p, targets, model):  # predictions, targets, model
 
     s = 3 / np  # output count scaling
     lbox *= h['giou'] * s
-    lobj *= h['obj'] * s
+    lobj *= h['obj'] * s * (1.4 if np == 4 else 1.)
     lcls *= h['cls'] * s
     bs = tobj.shape[0]  # batch size
     if red == 'sum':

@@ -1119,7 +1120,7 @@ def plot_study_txt(f='study.txt', x=None):  # from utils.utils import *; plot_study_txt()
     ax2.plot(y[6, :j], y[3, :j] * 1E2, '.-', linewidth=2, markersize=8,
              label=Path(f).stem.replace('study_coco_', '').replace('yolo', 'YOLO'))
 
-    ax2.plot(1E3 / np.array([209, 140, 97, 58, 35, 18]), [33.…
+    ax2.plot(1E3 / np.array([209, 140, 97, 58, 35, 18]), [33.8, 39.6, 43.0, 47.5, 49.4, 50.7],
              'k.-', linewidth=2, markersize=8, alpha=.25, label='EfficientDet')
 
     ax2.grid()
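The new `balance` list weights each output layer's objectness loss, up-weighting the high-resolution P3 output and down-weighting P5 (and P6 on 4-output models, which also receive the extra 1.4 `lobj` gain). A self-contained sketch with made-up per-layer losses:

```python
import torch

obj_losses = [torch.tensor(0.9), torch.tensor(0.5), torch.tensor(0.3)]  # dummy P3, P4, P5 values
balance = [4.0, 1.0, 0.4] if len(obj_losses) == 3 else [4.0, 1.0, 0.4, 0.1]  # P3-5 or P3-6
lobj = sum(b * l for b, l in zip(balance, obj_losses))
print(lobj)  # tensor(4.2200) = 4.0*0.9 + 1.0*0.5 + 0.4*0.3
```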