glenn-jocher committed on
Commit 9da56b6 · unverified · 1 Parent(s): 5e970d4

v2.0 Release (#491)

Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>

README.md CHANGED
@@ -8,26 +8,28 @@ This repository represents Ultralytics open-source research into future object d
 <img src="https://user-images.githubusercontent.com/26833433/85340570-30360a80-b49b-11ea-87cf-bdf33d53ae15.png" width="1000">** GPU Speed measures end-to-end time per image averaged over 5000 COCO val2017 images using a V100 GPU with batch size 8, and includes image preprocessing, PyTorch FP16 inference, postprocessing and NMS.

+- **July 23, 2020**: [v2.0 release](https://arxiv.org/abs/1803.01534): improved model definition, training and mAP []().
 - **June 22, 2020**: [PANet](https://arxiv.org/abs/1803.01534) updates: new heads, reduced parameters, faster inference and improved mAP [364fcfd](https://github.com/ultralytics/yolov5/commit/364fcfd7dba53f46edd4f04c037a039c0a287972).
 - **June 19, 2020**: [FP16](https://pytorch.org/docs/stable/nn.html#torch.nn.Module.half) as new default for smaller checkpoints and faster inference [d4c6674](https://github.com/ultralytics/yolov5/commit/d4c6674c98e19df4c40e33a777610a18d1961145).
 - **June 9, 2020**: [CSP](https://github.com/WongKinYiu/CrossStagePartialNetworks) updates: improved speed, size, and accuracy (credit to @WongKinYiu for CSP).
-- **May 27, 2020**: Public release of repo. YOLOv5 models are SOTA among all known YOLO implementations.
-- **April 1, 2020**: Start development of future [YOLOv3](https://github.com/ultralytics/yolov3)/[YOLOv4](https://github.com/AlexeyAB/darknet)-based PyTorch models in a range of compound-scaled sizes.
+- **May 27, 2020**: Public release. YOLOv5 models are SOTA among all known YOLO implementations.
+- **April 1, 2020**: Start development of future compound-scaled [YOLOv3](https://github.com/ultralytics/yolov3)/[YOLOv4](https://github.com/AlexeyAB/darknet)-based PyTorch models.


 ## Pretrained Checkpoints

 | Model | AP<sup>val</sup> | AP<sup>test</sup> | AP<sub>50</sub> | Speed<sub>GPU</sub> | FPS<sub>GPU</sub> || params | FLOPS |
 |---------- |------ |------ |------ | -------- | ------| ------ |------ | :------: |
-| [YOLOv5s](https://drive.google.com/open?id=1Drs_Aiu7xx6S-ix95f9kNsA6ueKRpN2J) | 36.6 | 36.6 | 55.8 | **2.1ms** | **476** || 7.5M | 13.2B
-| [YOLOv5m](https://drive.google.com/open?id=1Drs_Aiu7xx6S-ix95f9kNsA6ueKRpN2J) | 43.4 | 43.4 | 62.4 | 3.0ms | 333 || 21.8M | 39.4B
-| [YOLOv5l](https://drive.google.com/open?id=1Drs_Aiu7xx6S-ix95f9kNsA6ueKRpN2J) | 46.6 | 46.7 | 65.4 | 3.9ms | 256 || 47.8M | 88.1B
-| [YOLOv5x](https://drive.google.com/open?id=1Drs_Aiu7xx6S-ix95f9kNsA6ueKRpN2J) | **48.4** | **48.4** | **66.9** | 6.1ms | 164 || 89.0M | 166.4B
+| [YOLOv5.1s](https://drive.google.com/open?id=1Drs_Aiu7xx6S-ix95f9kNsA6ueKRpN2J) | 36.1 | 36.1 | 55.3 | **2.1ms** | **476** || 7.5M | 13.2B
+| [YOLOv5.1m](https://drive.google.com/open?id=1Drs_Aiu7xx6S-ix95f9kNsA6ueKRpN2J) | 43.5 | 43.5 | 62.5 | 3.0ms | 333 || 21.8M | 39.4B
+| [YOLOv5.1l](https://drive.google.com/open?id=1Drs_Aiu7xx6S-ix95f9kNsA6ueKRpN2J) | 47.0 | 47.1 | 65.6 | 3.9ms | 256 || 47.8M | 88.1B
+| [YOLOv5.1x](https://drive.google.com/open?id=1Drs_Aiu7xx6S-ix95f9kNsA6ueKRpN2J) | **49.0** | **49.0** | **67.4** | 6.1ms | 164 || 89.0M | 166.4B
+| | | | | | || |
 | [YOLOv3-SPP](https://drive.google.com/open?id=1Drs_Aiu7xx6S-ix95f9kNsA6ueKRpN2J) | 45.6 | 45.5 | 65.2 | 4.5ms | 222 || 63.0M | 118.0B


 ** AP<sup>test</sup> denotes COCO [test-dev2017](http://cocodataset.org/#upload) server results, all other AP results in the table denote val2017 accuracy.
-** All AP numbers are for single-model single-scale without ensemble or test-time augmentation. Reproduce by `python test.py --data coco.yaml --img 736 --conf 0.001`
+** All AP numbers are for single-model single-scale without ensemble or test-time augmentation. Reproduce by `python test.py --data coco.yaml --img 672 --conf 0.001`
 ** Speed<sub>GPU</sub> measures end-to-end time per image averaged over 5000 COCO val2017 images using a GCP [n1-standard-16](https://cloud.google.com/compute/docs/machine-types#n1_standard_machine_types) instance with one V100 GPU, and includes image preprocessing, PyTorch FP16 image inference at --batch-size 32 --img-size 640, postprocessing and NMS. Average NMS time included in this chart is 1-2ms/img. Reproduce by `python test.py --data coco.yaml --img 640 --conf 0.1`
 ** All checkpoints are trained to 300 epochs with default settings and hyperparameters (no autoaugmentation).
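
The Speed<sub>GPU</sub> column above is defined as end-to-end time per image: preprocessing, PyTorch FP16 inference at `--batch-size 32 --img-size 640`, postprocessing and NMS. A minimal sketch of how such a per-image latency figure can be measured in PyTorch (illustrative only, it omits NMS and is not the repo's `test.py`; `benchmark_latency` and its arguments are hypothetical names):

```python
import time
import torch

def benchmark_latency(model, imgs, batch_size=32, device='cuda'):
    """Average end-to-end FP16 GPU time per image, in ms (illustrative sketch only)."""
    model = model.to(device).half().eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for i in range(0, len(imgs), batch_size):
            batch = imgs[i:i + batch_size].to(device).half()
            torch.cuda.synchronize()
            t0 = time.time()
            _ = model(batch)  # forward pass only; NMS/postprocessing omitted here
            torch.cuda.synchronize()
            total += time.time() - t0
            n += batch.shape[0]
    return 1000 * total / n  # ms per image

# Example (hypothetical): ms = benchmark_latency(model, torch.zeros(64, 3, 640, 640))
```

FPS<sub>GPU</sub> in the table is simply 1000 divided by this per-image time (2.1 ms/img ≈ 476 FPS).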
 
models/yolo.py CHANGED
@@ -5,7 +5,7 @@ from models.experimental import *
 
 
 class Detect(nn.Module):
-    def __init__(self, nc=80, anchors=()):  # detection layer
+    def __init__(self, nc=80, anchors=(), ch=()):  # detection layer
         super(Detect, self).__init__()
         self.stride = None  # strides computed during build
         self.nc = nc  # number of classes
@@ -16,6 +16,7 @@ class Detect(nn.Module):
         a = torch.tensor(anchors).float().view(self.nl, -1, 2)
         self.register_buffer('anchors', a)  # shape(nl,na,2)
         self.register_buffer('anchor_grid', a.clone().view(self.nl, 1, -1, 1, 1, 2))  # shape(nl,1,na,1,1,2)
+        self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch)  # output conv
         self.export = False  # onnx export
 
     def forward(self, x):
@@ -23,6 +24,7 @@
         z = []  # inference output
         self.training |= self.export
         for i in range(self.nl):
+            x[i] = self.m[i](x[i])  # conv
            bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
            x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
 
@@ -124,8 +126,7 @@ class Model(nn.Module):
     def _initialize_biases(self, cf=None):  # initialize biases into Detect(), cf is class frequency
         # cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.
         m = self.model[-1]  # Detect() module
-        for f, s in zip(m.f, m.stride):  #  from
-            mi = self.model[f % m.i]
+        for mi, s in zip(m.m, m.stride):  #  from
            b = mi.bias.view(m.na, -1)  # conv.bias(255) to (3,85)
            b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
            b[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum())  # cls
@@ -133,9 +134,9 @@
 
     def _print_biases(self):
         m = self.model[-1]  # Detect() module
-        for f in sorted([x % m.i for x in m.f]):  #  from
-            b = self.model[f].bias.detach().view(m.na, -1).T  # conv.bias(255) to (3,85)
-            print(('%g Conv2d.bias:' + '%10.3g' * 6) % (f, *b[:5].mean(1).tolist(), b[5:].mean()))
+        for mi in m.m:  #  from
+            b = mi.bias.detach().view(m.na, -1).T  # conv.bias(255) to (3,85)
+            print(('%6g Conv2d.bias:' + '%10.3g' * 6) % (mi.weight.shape[1], *b[:5].mean(1).tolist(), b[5:].mean()))
 
     # def _print_weights(self):
     #     for m in self.model.modules():
@@ -159,7 +160,7 @@
 def parse_model(d, ch):  # model_dict, input_channels(3)
     print('\n%3s%18s%3s%10s %-40s%-30s' % ('', 'from', 'n', 'params', 'module', 'arguments'))
     anchors, nc, gd, gw = d['anchors'], d['nc'], d['depth_multiple'], d['width_multiple']
-    na = (len(anchors[0]) // 2)  # number of anchors
+    na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors  # number of anchors
     no = na * (nc + 5)  # number of outputs = anchors * (classes + 5)
 
     layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out
@@ -181,6 +182,7 @@ def parse_model(d, ch):  # model_dict, input_channels(3)
             # e = math.log(c2 / ch[1]) / math.log(2)
             # c2 = int(ch[1] * ex ** e)
             # if m != Focus:
+
            c2 = make_divisible(c2 * gw, 8) if c2 != no else c2
 
            # Experimental
@@ -201,7 +203,9 @@ def parse_model(d, ch):  # model_dict, input_channels(3)
         elif m is Concat:
             c2 = sum([ch[-1 if x == -1 else x + 1] for x in f])
         elif m is Detect:
-            f = f or list(reversed([(-1 if j == i else j - 1) for j, x in enumerate(ch) if x == no]))
+            args.append([ch[x + 1] for x in f])
+            if isinstance(args[1], int):  # number of anchors
+                args[1] = [list(range(args[1] * 2))] * len(f)
         else:
             c2 = ch[f]
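
The main architectural change above: `Detect` now builds its own 1×1 output convolutions (`self.m`) from the per-layer input channel list `ch` that `parse_model` appends to its args, replacing the separate `nn.Conv2d` output layers previously declared in the YAML. A standalone sketch of that pattern, simplified and not the full module (`TinyDetect` is a hypothetical name):

```python
import torch
import torch.nn as nn

class TinyDetect(nn.Module):
    # Minimal sketch of the v2.0 Detect pattern: the head builds one 1x1 output
    # conv per feature map from the channel list `ch`, as in Detect(nc, anchors, ch).
    def __init__(self, nc=80, anchors=((10, 13, 16, 30, 33, 23),), ch=(128,)):
        super().__init__()
        self.nc = nc
        self.nl = len(anchors)          # number of detection layers
        self.na = len(anchors[0]) // 2  # number of anchors per layer
        self.no = nc + 5                # outputs per anchor (x, y, w, h, obj, classes)
        self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch)  # output convs

    def forward(self, x):  # x: list of feature maps, one per detection layer
        for i in range(self.nl):
            bs, _, ny, nx = x[i].shape
            x[i] = self.m[i](x[i]).view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
        return x

# Example: one P3-like feature map of shape (1, 128, 80, 80) -> (1, 3, 80, 80, 85)
out = TinyDetect()([torch.zeros(1, 128, 80, 80)])
print(out[0].shape)
```

For COCO this gives `na * (nc + 5) = 255` output channels per location, matching the `conv.bias(255) to (3,85)` comments in `_initialize_biases` and `_print_biases`.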
 
models/yolov5l.yaml CHANGED
@@ -5,9 +5,9 @@ width_multiple: 1.0  # layer channel multiple
 
 # anchors
 anchors:
-  - [116,90, 156,198, 373,326]  # P5/32
-  - [30,61, 62,45, 59,119]  # P4/16
   - [10,13, 16,30, 33,23]  # P3/8
+  - [30,61, 62,45, 59,119]  # P4/16
+  - [116,90, 156,198, 373,326]  # P5/32
 
 # YOLOv5 backbone
 backbone:
@@ -19,15 +19,14 @@ backbone:
    [-1, 9, BottleneckCSP, [256]],
    [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
    [-1, 9, BottleneckCSP, [512]],
-   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
+   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
    [-1, 1, SPP, [1024, [5, 9, 13]]],
+   [-1, 3, BottleneckCSP, [1024, False]],  # 9
   ]
 
 # YOLOv5 head
 head:
-  [[-1, 3, BottleneckCSP, [1024, False]],  # 9
-
-   [-1, 1, Conv, [512, 1, 1]],
+  [[-1, 1, Conv, [512, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 6], 1, Concat, [1]],  # cat backbone P4
    [-1, 3, BottleneckCSP, [512, False]],  # 13
@@ -35,18 +34,15 @@ head:
    [-1, 1, Conv, [256, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 4], 1, Concat, [1]],  # cat backbone P3
-   [-1, 3, BottleneckCSP, [256, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 18 (P3/8-small)
+   [-1, 3, BottleneckCSP, [256, False]],  # 17
 
-   [-2, 1, Conv, [256, 3, 2]],
+   [-1, 1, Conv, [256, 3, 2]],
    [[-1, 14], 1, Concat, [1]],  # cat head P4
-   [-1, 3, BottleneckCSP, [512, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 22 (P4/16-medium)
+   [-1, 3, BottleneckCSP, [512, False]],  # 20
 
-   [-2, 1, Conv, [512, 3, 2]],
+   [-1, 1, Conv, [512, 3, 2]],
    [[-1, 10], 1, Concat, [1]],  # cat head P5
-   [-1, 3, BottleneckCSP, [1024, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 26 (P5/32-large)
+   [-1, 3, BottleneckCSP, [1024, False]],  # 23
 
-  [[], 1, Detect, [nc, anchors]],  # Detect(P5, P4, P3)
+  [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
   ]
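
The YAML changes are identical across the four model configs: anchors are now listed smallest-first (P3/8 → P5/32), the trailing `BottleneckCSP` moves into the backbone, the per-scale `nn.Conv2d` output layers disappear (they now live inside `Detect`), and `Detect` reads from layers 17, 20 and 23. A small sketch of how the derived quantities fall out of such a config, assuming a local copy of the file and PyYAML installed:

```python
import yaml  # PyYAML

# Assumes a local copy of the model config, e.g. models/yolov5l.yaml,
# with its usual top-level nc / anchors / head entries.
with open('models/yolov5l.yaml') as f:
    d = yaml.safe_load(f)

nc = d['nc']                    # number of classes (80 for COCO)
anchors = d['anchors']          # [[P3 anchors], [P4 anchors], [P5 anchors]], smallest first
na = len(anchors[0]) // 2       # anchors per detection layer (3)
no = na * (nc + 5)              # output channels per detection layer (255)
detect_from = d['head'][-1][0]  # layers feeding Detect: [17, 20, 23]
print(na, no, detect_from)
```

If a checkpoint or config still carries the old largest-first order, the `Reversing anchor order` path added to `check_anchor_order` in utils/utils.py (further below) flips the anchors to match the stride order at build time.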
models/yolov5m.yaml CHANGED
@@ -5,9 +5,9 @@ width_multiple: 0.75  # layer channel multiple
 
 # anchors
 anchors:
-  - [116,90, 156,198, 373,326]  # P5/32
-  - [30,61, 62,45, 59,119]  # P4/16
   - [10,13, 16,30, 33,23]  # P3/8
+  - [30,61, 62,45, 59,119]  # P4/16
+  - [116,90, 156,198, 373,326]  # P5/32
 
 # YOLOv5 backbone
 backbone:
@@ -19,15 +19,14 @@ backbone:
    [-1, 9, BottleneckCSP, [256]],
    [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
    [-1, 9, BottleneckCSP, [512]],
-   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
+   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
    [-1, 1, SPP, [1024, [5, 9, 13]]],
+   [-1, 3, BottleneckCSP, [1024, False]],  # 9
   ]
 
 # YOLOv5 head
 head:
-  [[-1, 3, BottleneckCSP, [1024, False]],  # 9
-
-   [-1, 1, Conv, [512, 1, 1]],
+  [[-1, 1, Conv, [512, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 6], 1, Concat, [1]],  # cat backbone P4
    [-1, 3, BottleneckCSP, [512, False]],  # 13
@@ -35,18 +34,15 @@ head:
    [-1, 1, Conv, [256, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 4], 1, Concat, [1]],  # cat backbone P3
-   [-1, 3, BottleneckCSP, [256, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 18 (P3/8-small)
+   [-1, 3, BottleneckCSP, [256, False]],  # 17
 
-   [-2, 1, Conv, [256, 3, 2]],
+   [-1, 1, Conv, [256, 3, 2]],
    [[-1, 14], 1, Concat, [1]],  # cat head P4
-   [-1, 3, BottleneckCSP, [512, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 22 (P4/16-medium)
+   [-1, 3, BottleneckCSP, [512, False]],  # 20
 
-   [-2, 1, Conv, [512, 3, 2]],
+   [-1, 1, Conv, [512, 3, 2]],
    [[-1, 10], 1, Concat, [1]],  # cat head P5
-   [-1, 3, BottleneckCSP, [1024, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 26 (P5/32-large)
+   [-1, 3, BottleneckCSP, [1024, False]],  # 23
 
-  [[], 1, Detect, [nc, anchors]],  # Detect(P5, P4, P3)
+  [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
   ]
models/yolov5s.yaml CHANGED
@@ -5,9 +5,9 @@ width_multiple: 0.50  # layer channel multiple
 
 # anchors
 anchors:
-  - [116,90, 156,198, 373,326]  # P5/32
-  - [30,61, 62,45, 59,119]  # P4/16
   - [10,13, 16,30, 33,23]  # P3/8
+  - [30,61, 62,45, 59,119]  # P4/16
+  - [116,90, 156,198, 373,326]  # P5/32
 
 # YOLOv5 backbone
 backbone:
@@ -19,15 +19,14 @@ backbone:
    [-1, 9, BottleneckCSP, [256]],
    [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
    [-1, 9, BottleneckCSP, [512]],
-   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
+   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
    [-1, 1, SPP, [1024, [5, 9, 13]]],
+   [-1, 3, BottleneckCSP, [1024, False]],  # 9
   ]
 
 # YOLOv5 head
 head:
-  [[-1, 3, BottleneckCSP, [1024, False]],  # 9
-
-   [-1, 1, Conv, [512, 1, 1]],
+  [[-1, 1, Conv, [512, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 6], 1, Concat, [1]],  # cat backbone P4
    [-1, 3, BottleneckCSP, [512, False]],  # 13
@@ -35,18 +34,15 @@ head:
    [-1, 1, Conv, [256, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 4], 1, Concat, [1]],  # cat backbone P3
-   [-1, 3, BottleneckCSP, [256, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 18 (P3/8-small)
+   [-1, 3, BottleneckCSP, [256, False]],  # 17
 
-   [-2, 1, Conv, [256, 3, 2]],
+   [-1, 1, Conv, [256, 3, 2]],
    [[-1, 14], 1, Concat, [1]],  # cat head P4
-   [-1, 3, BottleneckCSP, [512, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 22 (P4/16-medium)
+   [-1, 3, BottleneckCSP, [512, False]],  # 20
 
-   [-2, 1, Conv, [512, 3, 2]],
+   [-1, 1, Conv, [512, 3, 2]],
    [[-1, 10], 1, Concat, [1]],  # cat head P5
-   [-1, 3, BottleneckCSP, [1024, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 26 (P5/32-large)
+   [-1, 3, BottleneckCSP, [1024, False]],  # 23
 
-  [[], 1, Detect, [nc, anchors]],  # Detect(P5, P4, P3)
+  [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
   ]
models/yolov5x.yaml CHANGED
@@ -5,9 +5,9 @@ width_multiple: 1.25  # layer channel multiple
 
 # anchors
 anchors:
-  - [116,90, 156,198, 373,326]  # P5/32
-  - [30,61, 62,45, 59,119]  # P4/16
   - [10,13, 16,30, 33,23]  # P3/8
+  - [30,61, 62,45, 59,119]  # P4/16
+  - [116,90, 156,198, 373,326]  # P5/32
 
 # YOLOv5 backbone
 backbone:
@@ -19,15 +19,14 @@ backbone:
    [-1, 9, BottleneckCSP, [256]],
    [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
    [-1, 9, BottleneckCSP, [512]],
-   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
+   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
    [-1, 1, SPP, [1024, [5, 9, 13]]],
+   [-1, 3, BottleneckCSP, [1024, False]],  # 9
   ]
 
 # YOLOv5 head
 head:
-  [[-1, 3, BottleneckCSP, [1024, False]],  # 9
-
-   [-1, 1, Conv, [512, 1, 1]],
+  [[-1, 1, Conv, [512, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 6], 1, Concat, [1]],  # cat backbone P4
    [-1, 3, BottleneckCSP, [512, False]],  # 13
@@ -35,18 +34,15 @@ head:
    [-1, 1, Conv, [256, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 4], 1, Concat, [1]],  # cat backbone P3
-   [-1, 3, BottleneckCSP, [256, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 18 (P3/8-small)
+   [-1, 3, BottleneckCSP, [256, False]],  # 17
 
-   [-2, 1, Conv, [256, 3, 2]],
+   [-1, 1, Conv, [256, 3, 2]],
    [[-1, 14], 1, Concat, [1]],  # cat head P4
-   [-1, 3, BottleneckCSP, [512, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 22 (P4/16-medium)
+   [-1, 3, BottleneckCSP, [512, False]],  # 20
 
-   [-2, 1, Conv, [512, 3, 2]],
+   [-1, 1, Conv, [512, 3, 2]],
    [[-1, 10], 1, Concat, [1]],  # cat head P5
-   [-1, 3, BottleneckCSP, [1024, False]],
-   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]],  # 26 (P5/32-large)
+   [-1, 3, BottleneckCSP, [1024, False]],  # 23
 
-  [[], 1, Detect, [nc, anchors]],  # Detect(P5, P4, P3)
+  [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
   ]
train.py CHANGED
@@ -27,16 +27,16 @@ hyp = {'optimizer': 'SGD',  # ['adam', 'SGD', None] if none, default is SGD
        'momentum': 0.937,  # SGD momentum/Adam beta1
        'weight_decay': 5e-4,  # optimizer weight decay
        'giou': 0.05,  # giou loss gain
-       'cls': 0.58,  # cls loss gain
+       'cls': 0.5,  # cls loss gain
        'cls_pw': 1.0,  # cls BCELoss positive_weight
        'obj': 1.0,  # obj loss gain (*=img_size/320 if img_size != 320)
        'obj_pw': 1.0,  # obj BCELoss positive_weight
        'iou_t': 0.20,  # iou training threshold
        'anchor_t': 4.0,  # anchor-multiple threshold
        'fl_gamma': 0.0,  # focal loss gamma (efficientDet default is gamma=1.5)
-       'hsv_h': 0.014,  # image HSV-Hue augmentation (fraction)
-       'hsv_s': 0.68,  # image HSV-Saturation augmentation (fraction)
-       'hsv_v': 0.36,  # image HSV-Value augmentation (fraction)
+       'hsv_h': 0.015,  # image HSV-Hue augmentation (fraction)
+       'hsv_s': 0.7,  # image HSV-Saturation augmentation (fraction)
+       'hsv_v': 0.4,  # image HSV-Value augmentation (fraction)
        'degrees': 0.0,  # image rotation (+/- deg)
        'translate': 0.0,  # image translation (+/- fraction)
        'scale': 0.5,  # image scale (+/- gain)
@@ -159,7 +159,7 @@ def train(hyp, tb_writer, opt, device):
         model, optimizer = amp.initialize(model, optimizer, opt_level='O1', verbosity=0)
 
     # Scheduler https://arxiv.org/pdf/1812.01187.pdf
-    lf = lambda x: (((1 + math.cos(x * math.pi / epochs)) / 2) ** 1.0) * 0.9 + 0.1  # cosine
+    lf = lambda x: (((1 + math.cos(x * math.pi / epochs)) / 2) ** 1.0) * 0.8 + 0.2  # cosine
     scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)
     # https://discuss.pytorch.org/t/a-problem-occured-when-resuming-an-optimizer/28822
     # plot_lr_scheduler(optimizer, scheduler, epochs)
@@ -334,7 +334,7 @@ def train(hyp, tb_writer, opt, device):
         if rank in [-1, 0]:
             # mAP
             if ema is not None:
-                ema.update_attr(model, include=['md', 'nc', 'hyp', 'gr', 'names', 'stride'])
+                ema.update_attr(model, include=['yaml', 'nc', 'hyp', 'gr', 'names', 'stride'])
             final_epoch = epoch + 1 == epochs
             if not opt.notest or final_epoch:  # Calculate mAP
                 results, maps, times = test.test(opt.data,
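
Two small training-schedule changes above: the `cls` and `hsv_*` hyperparameter defaults are nudged, and the cosine LR lambda now floors at 0.2 of the initial learning rate instead of 0.1. A quick, self-contained check of the schedule endpoints (a minimal sketch, not the training loop):

```python
import math
import torch
from torch import nn
from torch.optim import SGD, lr_scheduler

epochs = 300
optimizer = SGD([nn.Parameter(torch.zeros(1))], lr=0.01)  # dummy parameter, lr0 = 0.01

# v2.0 cosine schedule: LR factor goes from 1.0 at epoch 0 down to 0.2 at the final epoch
lf = lambda x: (((1 + math.cos(x * math.pi / epochs)) / 2) ** 1.0) * 0.8 + 0.2
scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)

print(lf(0), lf(epochs))  # 1.0 0.2 -> lr decays from 0.01 to 0.002 over training
```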
utils/torch_utils.py CHANGED
@@ -65,7 +65,7 @@ def initialize_weights(model):
         if t is nn.Conv2d:
             pass  # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
         elif t is nn.BatchNorm2d:
-            m.eps = 1e-4
+            m.eps = 1e-3
             m.momentum = 0.03
         elif t in [nn.LeakyReLU, nn.ReLU, nn.ReLU6]:
             m.inplace = True
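
BatchNorm eps is relaxed from 1e-4 to 1e-3 (PyTorch's own default is 1e-5) while momentum stays at 0.03. The same pass, made self-contained so the effect is easy to verify on a toy model:

```python
import torch.nn as nn

def initialize_weights(model):
    # Same pass as in utils/torch_utils.py: leave conv weights at their defaults,
    # set BatchNorm eps/momentum, and make activations in-place.
    for m in model.modules():
        t = type(m)
        if t is nn.Conv2d:
            pass  # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        elif t is nn.BatchNorm2d:
            m.eps = 1e-3      # v2.0 value (was 1e-4)
            m.momentum = 0.03
        elif t in [nn.LeakyReLU, nn.ReLU, nn.ReLU6]:
            m.inplace = True

# Example on a tiny model
net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
initialize_weights(net)
print(net[1].eps, net[1].momentum)  # 0.001 0.03
```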
utils/utils.py CHANGED
@@ -5,10 +5,10 @@ import random
 import shutil
 import subprocess
 import time
+from contextlib import contextmanager
 from copy import copy
 from pathlib import Path
 from sys import platform
-from contextlib import contextmanager
 
 import cv2
 import matplotlib
@@ -110,6 +110,7 @@ def check_anchor_order(m):
     da = a[-1] - a[0]  # delta a
     ds = m.stride[-1] - m.stride[0]  # delta s
     if da.sign() != ds.sign():  # same order
+        print('Reversing anchor order')
         m.anchors[:] = m.anchors.flip(0)
         m.anchor_grid[:] = m.anchor_grid.flip(0)
 
@@ -459,7 +460,7 @@ def compute_loss(p, targets, model):  # predictions, targets, model
     # per output
     nt = 0  # number of targets
     np = len(p)  # number of outputs
-    balance = [1.0, 1.0, 1.0]
+    balance = [4.0, 1.0, 0.4] if np == 3 else [4.0, 1.0, 0.4, 0.1]  # P3-5 or P3-6
     for i, pi in enumerate(p):  # layer index, layer predictions
         b, a, gj, gi = indices[i]  # image, anchor, gridy, gridx
         tobj = torch.zeros_like(pi[..., 0]).to(device)  # target obj
@@ -493,7 +494,7 @@ def compute_loss(p, targets, model):  # predictions, targets, model
 
     s = 3 / np  # output count scaling
     lbox *= h['giou'] * s
-    lobj *= h['obj'] * s
+    lobj *= h['obj'] * s * (1.4 if np == 4 else 1.)
     lcls *= h['cls'] * s
     bs = tobj.shape[0]  # batch size
     if red == 'sum':
@@ -1119,7 +1120,7 @@ def plot_study_txt(f='study.txt', x=None):  # from utils.utils import *; plot_st
     ax2.plot(y[6, :j], y[3, :j] * 1E2, '.-', linewidth=2, markersize=8,
              label=Path(f).stem.replace('study_coco_', '').replace('yolo', 'YOLO'))
 
-    ax2.plot(1E3 / np.array([209, 140, 97, 58, 35, 18]), [33.5, 39.1, 42.5, 45.9, 49., 50.5],
+    ax2.plot(1E3 / np.array([209, 140, 97, 58, 35, 18]), [33.8, 39.6, 43.0, 47.5, 49.4, 50.7],
             'k.-', linewidth=2, markersize=8, alpha=.25, label='EfficientDet')
 
     ax2.grid()
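
`compute_loss` now weights the objectness term per output layer: `balance = [4.0, 1.0, 0.4]` for a P3–P5 head, so the stride-8 layer counts 4× and the stride-32 layer 0.4×. A simplified sketch of how such per-layer balancing combines into `lobj` (hypothetical helper, not the repo's full loss):

```python
import torch
import torch.nn as nn

def weighted_obj_loss(pred_obj, target_obj):
    # Hypothetical sketch of the per-layer objectness balancing added in this commit:
    # pred_obj / target_obj are lists of objectness logits/targets, one per output layer.
    np_ = len(pred_obj)  # number of output layers
    balance = [4.0, 1.0, 0.4] if np_ == 3 else [4.0, 1.0, 0.4, 0.1]  # P3-P5 or P3-P6
    bce = nn.BCEWithLogitsLoss()
    lobj = torch.zeros(1)
    for i in range(np_):
        lobj += bce(pred_obj[i], target_obj[i]) * balance[i]  # small strides weighted highest
    return lobj

# Example with dummy P3/P4/P5 objectness maps
preds = [torch.randn(1, 3, s, s) for s in (80, 40, 20)]
targets = [torch.zeros_like(p) for p in preds]
print(weighted_obj_loss(preds, targets))
```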