glenn-jocher committed
Commit 69be8e7 · unverified · 1 Parent(s): 0e341c5

YOLOv5 v4.0 Release (#1837)


* Update C3 module

* Update C3 module

* Update C3 module

* Update C3 module

* update

* update

* update

* update

* update

* update

* update

* update

* update

* updates

* updates

* updates

* updates

* updates

* updates

* updates

* updates

* updates

* updates

* update

* update

* update

* update

* updates

* updates

* updates

* updates

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update datasets

* update

* update

* update

* update attempt_download()

* merge

* merge

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* parameterize eps

* comments

* gs-multiple

* update

* max_nms implemented

* Create one_cycle() function

* update

* update

* update

* update

* update

* update

* update

* update study.png

* update study.png

* Update datasets.py

README.md CHANGED
@@ -4,28 +4,32 @@
 
 ![CI CPU testing](https://github.com/ultralytics/yolov5/workflows/CI%20CPU%20testing/badge.svg)
 
-This repository represents Ultralytics open-source research into future object detection methods, and incorporates our lessons learned and best practices evolved over training thousands of models on custom client datasets with our previous YOLO repository https://github.com/ultralytics/yolov3. **All code and models are under active development, and are subject to modification or deletion without notice.** Use at your own risk.
+This repository represents Ultralytics open-source research into future object detection methods, and incorporates lessons learned and best practices evolved over thousands of hours of training and evolution on anonymized client datasets. **All code and models are under active development, and are subject to modification or deletion without notice.** Use at your own risk.
 
-<img src="https://user-images.githubusercontent.com/26833433/90187293-6773ba00-dd6e-11ea-8f90-cd94afc0427f.png" width="1000">** GPU Speed measures end-to-end time per image averaged over 5000 COCO val2017 images using a V100 GPU with batch size 32, and includes image preprocessing, PyTorch FP16 inference, postprocessing and NMS. EfficientDet data from [google/automl](https://github.com/google/automl) at batch size 8.
+<img src="https://user-images.githubusercontent.com/26833433/103594689-455e0e00-4eae-11eb-9cdf-7d753e2ceeeb.png" width="1000">** GPU Speed measures end-to-end time per image averaged over 5000 COCO val2017 images using a V100 GPU with batch size 32, and includes image preprocessing, PyTorch FP16 inference, postprocessing and NMS. EfficientDet data from [google/automl](https://github.com/google/automl) at batch size 8.
 
+- **January 5, 2021**: [v4.0 release](https://github.com/ultralytics/yolov5/releases/tag/v4.0): nn.SiLU() activations, [Weights & Biases](https://wandb.ai/) logging, [PyTorch Hub](https://pytorch.org/hub/ultralytics_yolov5/) integration.
 - **August 13, 2020**: [v3.0 release](https://github.com/ultralytics/yolov5/releases/tag/v3.0): nn.Hardswish() activations, data autodownload, native AMP.
 - **July 23, 2020**: [v2.0 release](https://github.com/ultralytics/yolov5/releases/tag/v2.0): improved model definition, training and mAP.
 - **June 22, 2020**: [PANet](https://arxiv.org/abs/1803.01534) updates: new heads, reduced parameters, improved speed and mAP [364fcfd](https://github.com/ultralytics/yolov5/commit/364fcfd7dba53f46edd4f04c037a039c0a287972).
 - **June 19, 2020**: [FP16](https://pytorch.org/docs/stable/nn.html#torch.nn.Module.half) as new default for smaller checkpoints and faster inference [d4c6674](https://github.com/ultralytics/yolov5/commit/d4c6674c98e19df4c40e33a777610a18d1961145).
-- **June 9, 2020**: [CSP](https://github.com/WongKinYiu/CrossStagePartialNetworks) updates: improved speed, size, and accuracy (credit to @WongKinYiu for CSP).
-- **May 27, 2020**: Public release. YOLOv5 models are SOTA among all known YOLO implementations.
 
 
 ## Pretrained Checkpoints
 
-| Model | AP<sup>val</sup> | AP<sup>test</sup> | AP<sub>50</sub> | Speed<sub>GPU</sub> | FPS<sub>GPU</sub> || params | GFLOPS |
-|---------- |------ |------ |------ | -------- | ------| ------ |------ | :------: |
-| [YOLOv5s](https://github.com/ultralytics/yolov5/releases) | 37.0 | 37.0 | 56.2 | **2.4ms** | **416** || 7.5M | 17.5
-| [YOLOv5m](https://github.com/ultralytics/yolov5/releases) | 44.3 | 44.3 | 63.2 | 3.4ms | 294 || 21.8M | 52.3
-| [YOLOv5l](https://github.com/ultralytics/yolov5/releases) | 47.7 | 47.7 | 66.5 | 4.4ms | 227 || 47.8M | 117.2
-| [YOLOv5x](https://github.com/ultralytics/yolov5/releases) | **49.2** | **49.2** | **67.7** | 6.9ms | 145 || 89.0M | 221.5
-| | | | | | || |
-| [YOLOv5x](https://github.com/ultralytics/yolov5/releases) + TTA|**50.8**| **50.8** | **68.9** | 25.5ms | 39 || 89.0M | 801.0
+| Model | size | AP<sup>val</sup> | AP<sup>test</sup> | AP<sub>50</sub> | Speed<sub>V100</sub> | FPS<sub>V100</sub> || params | GFLOPS |
+|---------- |------ |------ |------ |------ | -------- | ------| ------ |------ | :------: |
+| [YOLOv5s](https://github.com/ultralytics/yolov5/releases) |640 |36.8 |36.8 |55.6 |**2.2ms** |**455** ||7.3M |17.0
+| [YOLOv5m](https://github.com/ultralytics/yolov5/releases) |640 |44.5 |44.5 |63.1 |2.9ms |345 ||21.4M |51.3
+| [YOLOv5l](https://github.com/ultralytics/yolov5/releases) |640 |48.1 |48.1 |66.4 |3.8ms |264 ||47.0M |115.4
+| [YOLOv5x](https://github.com/ultralytics/yolov5/releases) |640 |**50.1** |**50.1** |**68.7** |6.0ms |167 ||87.7M |218.8
+| | | | | | | || |
+| [YOLOv5x](https://github.com/ultralytics/yolov5/releases) + TTA |832 |**51.9** |**51.9** |**69.6** |24.9ms |40 ||87.7M |1005.3
+
+<!---
+| [YOLOv5l6](https://github.com/ultralytics/yolov5/releases) |640 |49.0 |49.0 |67.4 |4.1ms |244 ||77.2M |117.7
+| [YOLOv5l6](https://github.com/ultralytics/yolov5/releases) |1280 |53.0 |53.0 |70.8 |12.3ms |81 ||77.2M |117.7
+--->
 
 ** AP<sup>test</sup> denotes COCO [test-dev2017](http://cocodataset.org/#upload) server results, all other AP results denote val2017 accuracy.
 ** All AP numbers are for single-model single-scale without ensemble or TTA. **Reproduce mAP** by `python test.py --data coco.yaml --img 640 --conf 0.001 --iou 0.65`
@@ -33,6 +37,7 @@ This repository represents Ultralytics open-source research into future object d
 ** All checkpoints are trained to 300 epochs with default settings and hyperparameters (no autoaugmentation).
 ** Test Time Augmentation ([TTA](https://github.com/ultralytics/yolov5/issues/303)) runs at 3 image sizes. **Reproduce TTA** by `python test.py --data coco.yaml --img 832 --iou 0.65 --augment`
 
+
 ## Requirements
 
 Python 3.8 or later with all [requirements.txt](https://github.com/ultralytics/yolov5/blob/master/requirements.txt) dependencies installed, including `torch>=1.7`. To install run:
@@ -106,7 +111,7 @@ import torch
 from PIL import Image
 
 # Model
-model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)  # for PIL/cv2/np inputs and NMS
+model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
 
 # Images
 img1 = Image.open('zidane.jpg')
@@ -114,13 +119,13 @@ img2 = Image.open('bus.jpg')
 imgs = [img1, img2]  # batched list of images
 
 # Inference
-prediction = model(imgs, size=640)  # includes NMS
+result = model(imgs)
 ```
 
 
 ## Training
 
-Download [COCO](https://github.com/ultralytics/yolov5/blob/master/data/scripts/get_coco.sh) and run command below. Training times for YOLOv5s/m/l/x are 2/4/6/8 days on a single V100 (multi-GPU times faster). Use the largest `--batch-size` your GPU allows (batch sizes shown for 16 GB devices).
+Run commands below to reproduce results on [COCO](https://github.com/ultralytics/yolov5/blob/master/data/scripts/get_coco.sh) dataset (dataset auto-downloads on first use). Training times for YOLOv5s/m/l/x are 2/4/6/8 days on a single V100 (multi-GPU times faster). Use the largest `--batch-size` your GPU allows (batch sizes shown for 16 GB devices).
 ```bash
 $ python train.py --data coco.yaml --cfg yolov5s.yaml --weights '' --batch-size 64
                                          yolov5m                                40
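The simplified Hub snippet in the hunk above returns the library's Detections wrapper rather than a raw tensor, with preprocessing and NMS handled inside the model call. A minimal end-to-end sketch of the new usage — assuming `torch>=1.7`, internet access for the first weight download, and that the accessor names used below (`print()`, `save()`, `xyxy`) match the Detections class in models/common.py of this release:

```python
import torch
from PIL import Image

# Load pretrained YOLOv5s from PyTorch Hub (weights auto-download on first use)
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Batched inference on PIL images; letterboxing, forward pass and NMS all run inside
imgs = [Image.open('zidane.jpg'), Image.open('bus.jpg')]
result = model(imgs, size=640)  # size is optional

result.print()          # per-image detection summary
result.save()           # write annotated copies of the images to disk
boxes = result.xyxy[0]  # tensor of [xmin, ymin, xmax, ymax, confidence, class] for image 0
print(boxes.shape)
```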
data/coco.yaml CHANGED
@@ -18,15 +18,15 @@ test: ../coco/test-dev2017.txt # 20288 of 40670 images, submit to https://compe
 nc: 80
 
 # class names
-names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
-        'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
-        'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
-        'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
-        'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
-        'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
-        'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
-        'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
-        'hair drier', 'toothbrush']
+names: [ 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
+         'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
+         'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
+         'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
+         'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
+         'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
+         'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
+         'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
+         'hair drier', 'toothbrush' ]
 
 # Print classes
 # with open('data/coco.yaml') as f:
data/coco128.yaml CHANGED
@@ -17,12 +17,12 @@ val: ../coco128/images/train2017/ # 128 images
 nc: 80
 
 # class names
-names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
-        'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
-        'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
-        'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
-        'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
-        'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
-        'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
-        'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
-        'hair drier', 'toothbrush']
+names: [ 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
+         'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
+         'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
+         'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
+         'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
+         'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
+         'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
+         'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
+         'hair drier', 'toothbrush' ]
data/voc.yaml CHANGED
@@ -17,5 +17,5 @@ val: ../VOC/images/val/ # 4952 images
 nc: 20
 
 # class names
-names: ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog',
-        'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']
+names: [ 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog',
+         'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor' ]
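The bracket-spacing change in these dataset files is purely cosmetic: the YAML parses to the same class lists either way. A quick sanity check along the lines of the commented-out snippet at the bottom of data/coco.yaml (requires PyYAML and a checkout of the repo):

```python
import yaml

with open('data/coco.yaml') as f:
    d = yaml.safe_load(f)

assert d['nc'] == len(d['names']) == 80  # class count matches the names list
print(d['names'][:3])  # ['person', 'bicycle', 'car']
```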
models/common.py CHANGED
@@ -30,7 +30,7 @@ class Conv(nn.Module):
         super(Conv, self).__init__()
         self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
         self.bn = nn.BatchNorm2d(c2)
-        self.act = nn.Hardswish() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
+        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
 
     def forward(self, x):
         return self.act(self.bn(self.conv(x)))
@@ -105,9 +105,39 @@ class Focus(nn.Module):
     def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
         super(Focus, self).__init__()
         self.conv = Conv(c1 * 4, c2, k, s, p, g, act)
+        # self.contract = Contract(gain=2)
 
     def forward(self, x):  # x(b,c,w,h) -> y(b,4c,w/2,h/2)
         return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))
+        # return self.conv(self.contract(x))
+
+
+class Contract(nn.Module):
+    # Contract width-height into channels, i.e. x(1,64,80,80) to x(1,256,40,40)
+    def __init__(self, gain=2):
+        super().__init__()
+        self.gain = gain
+
+    def forward(self, x):
+        N, C, H, W = x.size()  # assert (H / s == 0) and (W / s == 0), 'Indivisible gain'
+        s = self.gain
+        x = x.view(N, C, H // s, s, W // s, s)  # x(1,64,40,2,40,2)
+        x = x.permute(0, 3, 5, 1, 2, 4).contiguous()  # x(1,2,2,64,40,40)
+        return x.view(N, C * s * s, H // s, W // s)  # x(1,256,40,40)
+
+
+class Expand(nn.Module):
+    # Expand channels into width-height, i.e. x(1,64,80,80) to x(1,16,160,160)
+    def __init__(self, gain=2):
+        super().__init__()
+        self.gain = gain
+
+    def forward(self, x):
+        N, C, H, W = x.size()  # assert C / s ** 2 == 0, 'Indivisible gain'
+        s = self.gain
+        x = x.view(N, s, s, C // s ** 2, H, W)  # x(1,2,2,16,80,80)
+        x = x.permute(0, 3, 4, 1, 5, 2).contiguous()  # x(1,16,80,2,80,2)
+        return x.view(N, C // s ** 2, H * s, W * s)  # x(1,16,160,160)
 
 
 class Concat(nn.Module):
@@ -253,20 +283,13 @@ class Detections:
         return x
 
 
-class Flatten(nn.Module):
-    # Use after nn.AdaptiveAvgPool2d(1) to remove last 2 dimensions
-    @staticmethod
-    def forward(x):
-        return x.view(x.size(0), -1)
-
-
 class Classify(nn.Module):
     # Classification head, i.e. x(b,c1,20,20) to x(b,c2)
     def __init__(self, c1, c2, k=1, s=1, p=None, g=1):  # ch_in, ch_out, kernel, stride, padding, groups
         super(Classify, self).__init__()
         self.aap = nn.AdaptiveAvgPool2d(1)  # to x(b,c1,1,1)
         self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g)  # to x(b,c2,1,1)
-        self.flat = Flatten()
+        self.flat = nn.Flatten()
 
     def forward(self, x):
         z = torch.cat([self.aap(y) for y in (x if isinstance(x, list) else [x])], 1)  # cat if list
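Contract and Expand above are parameter-free reshapes, so their only observable effect is on tensor shape; the commented-out lines in Focus hint that Focus is equivalent to Contract(gain=2) followed by a Conv. A small shape check, assuming the repository root is on PYTHONPATH:

```python
import torch

from models.common import Contract, Expand

x = torch.randn(1, 64, 80, 80)
c = Contract(gain=2)(x)  # fold each 2x2 spatial block into channels
print(c.shape)           # torch.Size([1, 256, 40, 40])
e = Expand(gain=2)(c)    # unfold channels back into width-height
print(e.shape)           # torch.Size([1, 64, 80, 80])
```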
models/experimental.py CHANGED
@@ -105,8 +105,8 @@ class Ensemble(nn.ModuleList):
         for module in self:
             y.append(module(x, augment)[0])
         # y = torch.stack(y).max(0)[0]  # max ensemble
-        # y = torch.cat(y, 1)  # nms ensemble
-        y = torch.stack(y).mean(0)  # mean ensemble
+        # y = torch.stack(y).mean(0)  # mean ensemble
+        y = torch.cat(y, 1)  # nms ensemble
         return y, None  # inference, train output
 
 
models/hub/anchors.yaml ADDED
@@ -0,0 +1,58 @@
+# Default YOLOv5 anchors for COCO data
+
+
+# P5 -------------------------------------------------------------------------------------------------------------------
+# P5-640:
+anchors_p5_640:
+  - [ 10,13, 16,30, 33,23 ]  # P3/8
+  - [ 30,61, 62,45, 59,119 ]  # P4/16
+  - [ 116,90, 156,198, 373,326 ]  # P5/32
+
+
+# P6 -------------------------------------------------------------------------------------------------------------------
+# P6-640:  thr=0.25: 0.9964 BPR, 5.54 anchors past thr, n=12, img_size=640, metric_all=0.281/0.716-mean/best, past_thr=0.469-mean: 9,11, 21,19, 17,41, 43,32, 39,70, 86,64, 65,131, 134,130, 120,265, 282,180, 247,354, 512,387
+anchors_p6_640:
+  - [ 9,11, 21,19, 17,41 ]  # P3/8
+  - [ 43,32, 39,70, 86,64 ]  # P4/16
+  - [ 65,131, 134,130, 120,265 ]  # P5/32
+  - [ 282,180, 247,354, 512,387 ]  # P6/64
+
+# P6-1280:  thr=0.25: 0.9950 BPR, 5.55 anchors past thr, n=12, img_size=1280, metric_all=0.281/0.714-mean/best, past_thr=0.468-mean: 19,27, 44,40, 38,94, 96,68, 86,152, 180,137, 140,301, 303,264, 238,542, 436,615, 739,380, 925,792
+anchors_p6_1280:
+  - [ 19,27, 44,40, 38,94 ]  # P3/8
+  - [ 96,68, 86,152, 180,137 ]  # P4/16
+  - [ 140,301, 303,264, 238,542 ]  # P5/32
+  - [ 436,615, 739,380, 925,792 ]  # P6/64
+
+# P6-1920:  thr=0.25: 0.9950 BPR, 5.55 anchors past thr, n=12, img_size=1920, metric_all=0.281/0.714-mean/best, past_thr=0.468-mean: 28,41, 67,59, 57,141, 144,103, 129,227, 270,205, 209,452, 455,396, 358,812, 653,922, 1109,570, 1387,1187
+anchors_p6_1920:
+  - [ 28,41, 67,59, 57,141 ]  # P3/8
+  - [ 144,103, 129,227, 270,205 ]  # P4/16
+  - [ 209,452, 455,396, 358,812 ]  # P5/32
+  - [ 653,922, 1109,570, 1387,1187 ]  # P6/64
+
+
+# P7 -------------------------------------------------------------------------------------------------------------------
+# P7-640:  thr=0.25: 0.9962 BPR, 6.76 anchors past thr, n=15, img_size=640, metric_all=0.275/0.733-mean/best, past_thr=0.466-mean: 11,11, 13,30, 29,20, 30,46, 61,38, 39,92, 78,80, 146,66, 79,163, 149,150, 321,143, 157,303, 257,402, 359,290, 524,372
+anchors_p7_640:
+  - [ 11,11, 13,30, 29,20 ]  # P3/8
+  - [ 30,46, 61,38, 39,92 ]  # P4/16
+  - [ 78,80, 146,66, 79,163 ]  # P5/32
+  - [ 149,150, 321,143, 157,303 ]  # P6/64
+  - [ 257,402, 359,290, 524,372 ]  # P7/128
+
+# P7-1280:  thr=0.25: 0.9968 BPR, 6.71 anchors past thr, n=15, img_size=1280, metric_all=0.273/0.732-mean/best, past_thr=0.463-mean: 19,22, 54,36, 32,77, 70,83, 138,71, 75,173, 165,159, 148,334, 375,151, 334,317, 251,626, 499,474, 750,326, 534,814, 1079,818
+anchors_p7_1280:
+  - [ 19,22, 54,36, 32,77 ]  # P3/8
+  - [ 70,83, 138,71, 75,173 ]  # P4/16
+  - [ 165,159, 148,334, 375,151 ]  # P5/32
+  - [ 334,317, 251,626, 499,474 ]  # P6/64
+  - [ 750,326, 534,814, 1079,818 ]  # P7/128
+
+# P7-1920:  thr=0.25: 0.9968 BPR, 6.71 anchors past thr, n=15, img_size=1920, metric_all=0.273/0.732-mean/best, past_thr=0.463-mean: 29,34, 81,55, 47,115, 105,124, 207,107, 113,259, 247,238, 222,500, 563,227, 501,476, 376,939, 749,711, 1126,489, 801,1222, 1618,1227
+anchors_p7_1920:
+  - [ 29,34, 81,55, 47,115 ]  # P3/8
+  - [ 105,124, 207,107, 113,259 ]  # P4/16
+  - [ 247,238, 222,500, 563,227 ]  # P5/32
+  - [ 501,476, 376,939, 749,711 ]  # P6/64
+  - [ 1126,489, 801,1222, 1618,1227 ]  # P7/128
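Each anchor set above is keyed by pyramid levels and training resolution, and every list entry holds three (width, height) pairs for one detection level; the long comments record the autoanchor metrics (BPR, anchors past threshold) of the run that produced them. One way to inspect a set (file path as added in this commit, PyYAML assumed):

```python
import yaml

with open('models/hub/anchors.yaml') as f:
    anchors = yaml.safe_load(f)

p6 = anchors['anchors_p6_640']
print(len(p6))  # 4 detection levels: P3/8, P4/16, P5/32, P6/64
print(p6[0])    # [9, 11, 21, 19, 17, 41] -> three (w, h) anchors for P3/8
```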
models/hub/yolov5-p2.yaml ADDED
@@ -0,0 +1,54 @@
+# parameters
+nc: 80  # number of classes
+depth_multiple: 1.0  # model depth multiple
+width_multiple: 1.0  # layer channel multiple
+
+# anchors
+anchors: 3
+
+# YOLOv5 backbone
+backbone:
+  # [from, number, module, args]
+  [ [ -1, 1, Focus, [ 64, 3 ] ],  # 0-P1/2
+    [ -1, 1, Conv, [ 128, 3, 2 ] ],  # 1-P2/4
+    [ -1, 3, C3, [ 128 ] ],
+    [ -1, 1, Conv, [ 256, 3, 2 ] ],  # 3-P3/8
+    [ -1, 9, C3, [ 256 ] ],
+    [ -1, 1, Conv, [ 512, 3, 2 ] ],  # 5-P4/16
+    [ -1, 9, C3, [ 512 ] ],
+    [ -1, 1, Conv, [ 1024, 3, 2 ] ],  # 7-P5/32
+    [ -1, 1, SPP, [ 1024, [ 5, 9, 13 ] ] ],
+    [ -1, 3, C3, [ 1024, False ] ],  # 9
+  ]
+
+# YOLOv5 head
+head:
+  [ [ -1, 1, Conv, [ 512, 1, 1 ] ],
+    [ -1, 1, nn.Upsample, [ None, 2, 'nearest' ] ],
+    [ [ -1, 6 ], 1, Concat, [ 1 ] ],  # cat backbone P4
+    [ -1, 3, C3, [ 512, False ] ],  # 13
+
+    [ -1, 1, Conv, [ 256, 1, 1 ] ],
+    [ -1, 1, nn.Upsample, [ None, 2, 'nearest' ] ],
+    [ [ -1, 4 ], 1, Concat, [ 1 ] ],  # cat backbone P3
+    [ -1, 3, C3, [ 256, False ] ],  # 17 (P3/8-small)
+
+    [ -1, 1, Conv, [ 128, 1, 1 ] ],
+    [ -1, 1, nn.Upsample, [ None, 2, 'nearest' ] ],
+    [ [ -1, 2 ], 1, Concat, [ 1 ] ],  # cat backbone P2
+    [ -1, 1, C3, [ 128, False ] ],  # 21 (P2/4-xsmall)
+
+    [ -1, 1, Conv, [ 128, 3, 2 ] ],
+    [ [ -1, 18 ], 1, Concat, [ 1 ] ],  # cat head P3
+    [ -1, 3, C3, [ 256, False ] ],  # 24 (P3/8-small)
+
+    [ -1, 1, Conv, [ 256, 3, 2 ] ],
+    [ [ -1, 14 ], 1, Concat, [ 1 ] ],  # cat head P4
+    [ -1, 3, C3, [ 512, False ] ],  # 27 (P4/16-medium)
+
+    [ -1, 1, Conv, [ 512, 3, 2 ] ],
+    [ [ -1, 10 ], 1, Concat, [ 1 ] ],  # cat head P5
+    [ -1, 3, C3, [ 1024, False ] ],  # 30 (P5/32-large)
+
+    [ [ 24, 27, 30 ], 1, Detect, [ nc, anchors ] ],  # Detect(P3, P4, P5)
+  ]
models/hub/yolov5-p6.yaml ADDED
@@ -0,0 +1,56 @@
+# parameters
+nc: 80  # number of classes
+depth_multiple: 1.0  # model depth multiple
+width_multiple: 1.0  # layer channel multiple
+
+# anchors
+anchors: 3
+
+# YOLOv5 backbone
+backbone:
+  # [from, number, module, args]
+  [ [ -1, 1, Focus, [ 64, 3 ] ],  # 0-P1/2
+    [ -1, 1, Conv, [ 128, 3, 2 ] ],  # 1-P2/4
+    [ -1, 3, C3, [ 128 ] ],
+    [ -1, 1, Conv, [ 256, 3, 2 ] ],  # 3-P3/8
+    [ -1, 9, C3, [ 256 ] ],
+    [ -1, 1, Conv, [ 512, 3, 2 ] ],  # 5-P4/16
+    [ -1, 9, C3, [ 512 ] ],
+    [ -1, 1, Conv, [ 768, 3, 2 ] ],  # 7-P5/32
+    [ -1, 3, C3, [ 768 ] ],
+    [ -1, 1, Conv, [ 1024, 3, 2 ] ],  # 9-P6/64
+    [ -1, 1, SPP, [ 1024, [ 3, 5, 7 ] ] ],
+    [ -1, 3, C3, [ 1024, False ] ],  # 11
+  ]
+
+# YOLOv5 head
+head:
+  [ [ -1, 1, Conv, [ 768, 1, 1 ] ],
+    [ -1, 1, nn.Upsample, [ None, 2, 'nearest' ] ],
+    [ [ -1, 8 ], 1, Concat, [ 1 ] ],  # cat backbone P5
+    [ -1, 3, C3, [ 768, False ] ],  # 15
+
+    [ -1, 1, Conv, [ 512, 1, 1 ] ],
+    [ -1, 1, nn.Upsample, [ None, 2, 'nearest' ] ],
+    [ [ -1, 6 ], 1, Concat, [ 1 ] ],  # cat backbone P4
+    [ -1, 3, C3, [ 512, False ] ],  # 19
+
+    [ -1, 1, Conv, [ 256, 1, 1 ] ],
+    [ -1, 1, nn.Upsample, [ None, 2, 'nearest' ] ],
+    [ [ -1, 4 ], 1, Concat, [ 1 ] ],  # cat backbone P3
+    [ -1, 3, C3, [ 256, False ] ],  # 23 (P3/8-small)
+
+    [ -1, 1, Conv, [ 256, 3, 2 ] ],
+    [ [ -1, 20 ], 1, Concat, [ 1 ] ],  # cat head P4
+    [ -1, 3, C3, [ 512, False ] ],  # 26 (P4/16-medium)
+
+    [ -1, 1, Conv, [ 512, 3, 2 ] ],
+    [ [ -1, 16 ], 1, Concat, [ 1 ] ],  # cat head P5
+    [ -1, 3, C3, [ 768, False ] ],  # 29 (P5/32-large)
+
+    [ -1, 1, Conv, [ 768, 3, 2 ] ],
+    [ [ -1, 12 ], 1, Concat, [ 1 ] ],  # cat head P6
+    [ -1, 3, C3, [ 1024, False ] ],  # 32 (P5/64-xlarge)
+
+    [ [ 23, 26, 29, 32 ], 1, Detect, [ nc, anchors ] ],  # Detect(P3, P4, P5, P6)
+  ]
models/hub/yolov5-p7.yaml ADDED
@@ -0,0 +1,67 @@
+# parameters
+nc: 80  # number of classes
+depth_multiple: 1.0  # model depth multiple
+width_multiple: 1.0  # layer channel multiple
+
+# anchors
+anchors: 3
+
+# YOLOv5 backbone
+backbone:
+  # [from, number, module, args]
+  [ [ -1, 1, Focus, [ 64, 3 ] ],  # 0-P1/2
+    [ -1, 1, Conv, [ 128, 3, 2 ] ],  # 1-P2/4
+    [ -1, 3, C3, [ 128 ] ],
+    [ -1, 1, Conv, [ 256, 3, 2 ] ],  # 3-P3/8
+    [ -1, 9, C3, [ 256 ] ],
+    [ -1, 1, Conv, [ 512, 3, 2 ] ],  # 5-P4/16
+    [ -1, 9, C3, [ 512 ] ],
+    [ -1, 1, Conv, [ 768, 3, 2 ] ],  # 7-P5/32
+    [ -1, 3, C3, [ 768 ] ],
+    [ -1, 1, Conv, [ 1024, 3, 2 ] ],  # 9-P6/64
+    [ -1, 3, C3, [ 1024 ] ],
+    [ -1, 1, Conv, [ 1280, 3, 2 ] ],  # 11-P7/128
+    [ -1, 1, SPP, [ 1280, [ 3, 5 ] ] ],
+    [ -1, 3, C3, [ 1280, False ] ],  # 13
+  ]
+
+# YOLOv5 head
+head:
+  [ [ -1, 1, Conv, [ 1024, 1, 1 ] ],
+    [ -1, 1, nn.Upsample, [ None, 2, 'nearest' ] ],
+    [ [ -1, 10 ], 1, Concat, [ 1 ] ],  # cat backbone P6
+    [ -1, 3, C3, [ 1024, False ] ],  # 17
+
+    [ -1, 1, Conv, [ 768, 1, 1 ] ],
+    [ -1, 1, nn.Upsample, [ None, 2, 'nearest' ] ],
+    [ [ -1, 8 ], 1, Concat, [ 1 ] ],  # cat backbone P5
+    [ -1, 3, C3, [ 768, False ] ],  # 21
+
+    [ -1, 1, Conv, [ 512, 1, 1 ] ],
+    [ -1, 1, nn.Upsample, [ None, 2, 'nearest' ] ],
+    [ [ -1, 6 ], 1, Concat, [ 1 ] ],  # cat backbone P4
+    [ -1, 3, C3, [ 512, False ] ],  # 25
+
+    [ -1, 1, Conv, [ 256, 1, 1 ] ],
+    [ -1, 1, nn.Upsample, [ None, 2, 'nearest' ] ],
+    [ [ -1, 4 ], 1, Concat, [ 1 ] ],  # cat backbone P3
+    [ -1, 3, C3, [ 256, False ] ],  # 29 (P3/8-small)
+
+    [ -1, 1, Conv, [ 256, 3, 2 ] ],
+    [ [ -1, 26 ], 1, Concat, [ 1 ] ],  # cat head P4
+    [ -1, 3, C3, [ 512, False ] ],  # 32 (P4/16-medium)
+
+    [ -1, 1, Conv, [ 512, 3, 2 ] ],
+    [ [ -1, 22 ], 1, Concat, [ 1 ] ],  # cat head P5
+    [ -1, 3, C3, [ 768, False ] ],  # 35 (P5/32-large)
+
+    [ -1, 1, Conv, [ 768, 3, 2 ] ],
+    [ [ -1, 18 ], 1, Concat, [ 1 ] ],  # cat head P6
+    [ -1, 3, C3, [ 1024, False ] ],  # 38 (P6/64-xlarge)
+
+    [ -1, 1, Conv, [ 1024, 3, 2 ] ],
+    [ [ -1, 14 ], 1, Concat, [ 1 ] ],  # cat head P7
+    [ -1, 3, C3, [ 1280, False ] ],  # 41 (P7/128-xxlarge)
+
+    [ [ 29, 32, 35, 38, 41 ], 1, Detect, [ nc, anchors ] ],  # Detect(P3, P4, P5, P6, P7)
+  ]
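These P2/P6/P7 configs plug into the existing model builder unchanged; `anchors: 3` means three auto-initialized anchors per detection layer rather than fixed values. A sketch of instantiating one directly (class and argument names as used in models/yolo.py and train.py of this release; the dummy forward pass is optional):

```python
import torch

from models.yolo import Model

model = Model('models/hub/yolov5-p6.yaml', ch=3, nc=80)  # build from the YAML definition
print(model.model[-1].nl)               # 4 detection layers: P3, P4, P5, P6
_ = model(torch.zeros(1, 3, 256, 256))  # 256 px is divisible by the largest stride here (64)
```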
models/yolo.py CHANGED
@@ -1,17 +1,13 @@
 import argparse
 import logging
-import math
 import sys
 from copy import deepcopy
 from pathlib import Path
 
-import torch
-import torch.nn as nn
-
 sys.path.append('./')  # to run '$ python *.py' files in subdirectories
 logger = logging.getLogger(__name__)
 
-from models.common import Conv, Bottleneck, SPP, DWConv, Focus, BottleneckCSP, C3, Concat, NMS, autoShape
+from models.common import *
 from models.experimental import MixConv2d, CrossConv
 from utils.autoanchor import check_anchor_order
 from utils.general import make_divisible, check_file, set_logging
@@ -89,7 +85,7 @@ class Model(nn.Module):
         # Build strides, anchors
         m = self.model[-1]  # Detect()
         if isinstance(m, Detect):
-            s = 128  # 2x min stride
+            s = 256  # 2x min stride
             m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s))])  # forward
             m.anchors /= m.stride.view(-1, 1, 1)
             check_anchor_order(m)
@@ -109,7 +105,7 @@ class Model(nn.Module):
            f = [None, 3, None]  # flips (2-ud, 3-lr)
            y = []  # outputs
            for si, fi in zip(s, f):
-               xi = scale_img(x.flip(fi) if fi else x, si)
+               xi = scale_img(x.flip(fi) if fi else x, si, gs=int(self.stride.max()))
                yi = self.forward_once(xi)[0]  # forward
                # cv2.imwrite('img%g.jpg' % s, 255 * xi[0].numpy().transpose((1, 2, 0))[:, :, ::-1])  # save
                yi[..., :4] /= si  # de-scale
@@ -242,13 +238,17 @@ def parse_model(d, ch):  # model_dict, input_channels(3)
        elif m is nn.BatchNorm2d:
            args = [ch[f]]
        elif m is Concat:
-           c2 = sum([ch[-1 if x == -1 else x + 1] for x in f])
+           c2 = sum([ch[x if x < 0 else x + 1] for x in f])
        elif m is Detect:
            args.append([ch[x + 1] for x in f])
            if isinstance(args[1], int):  # number of anchors
                args[1] = [list(range(args[1] * 2))] * len(f)
+       elif m is Contract:
+           c2 = ch[f if f < 0 else f + 1] * args[0] ** 2
+       elif m is Expand:
+           c2 = ch[f if f < 0 else f + 1] // args[0] ** 2
        else:
-           c2 = ch[f]
+           c2 = ch[f if f < 0 else f + 1]
 
        m_ = nn.Sequential(*[m(*args) for _ in range(n)]) if n > 1 else m(*args)  # module
        t = str(m)[8:-2].replace('__main__.', '')  # module type
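Two of the changes above are easy to misread in isolation: the dummy input used to derive strides grows from 128 to 256 px because the new P6/P7 heads reach strides of 64 and 128 and the probe image must stay divisible by the largest stride, and parse_model() now has to track how Contract/Expand change the channel count. Roughly, with illustrative numbers only:

```python
# Stride derivation: stride = probe_size / feature_map_size for each Detect input
s = 256
feature_sizes = [32, 16, 8, 4, 2]      # e.g. P3..P7 feature maps for a 256-px probe
print([s / f for f in feature_sizes])  # [8.0, 16.0, 32.0, 64.0, 128.0]

# Channel bookkeeping for the new reshape modules (gain = 2)
c1, gain = 64, 2
print(c1 * gain ** 2)   # Contract: 64 -> 256 channels (2x2 spatial block folded in)
print(c1 // gain ** 2)  # Expand:   64 -> 16 channels (2x2 spatial block unfolded out)
```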
models/yolov5l.yaml CHANGED
@@ -14,14 +14,14 @@ backbone:
   # [from, number, module, args]
   [[-1, 1, Focus, [64, 3]],  # 0-P1/2
    [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
-   [-1, 3, BottleneckCSP, [128]],
+   [-1, 3, C3, [128]],
    [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
-   [-1, 9, BottleneckCSP, [256]],
+   [-1, 9, C3, [256]],
    [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
-   [-1, 9, BottleneckCSP, [512]],
+   [-1, 9, C3, [512]],
    [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
    [-1, 1, SPP, [1024, [5, 9, 13]]],
-   [-1, 3, BottleneckCSP, [1024, False]],  # 9
+   [-1, 3, C3, [1024, False]],  # 9
   ]
 
 # YOLOv5 head
@@ -29,20 +29,20 @@ head:
   [[-1, 1, Conv, [512, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 6], 1, Concat, [1]],  # cat backbone P4
-   [-1, 3, BottleneckCSP, [512, False]],  # 13
+   [-1, 3, C3, [512, False]],  # 13
 
    [-1, 1, Conv, [256, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 4], 1, Concat, [1]],  # cat backbone P3
-   [-1, 3, BottleneckCSP, [256, False]],  # 17 (P3/8-small)
+   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)
 
    [-1, 1, Conv, [256, 3, 2]],
    [[-1, 14], 1, Concat, [1]],  # cat head P4
-   [-1, 3, BottleneckCSP, [512, False]],  # 20 (P4/16-medium)
+   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)
 
    [-1, 1, Conv, [512, 3, 2]],
    [[-1, 10], 1, Concat, [1]],  # cat head P5
-   [-1, 3, BottleneckCSP, [1024, False]],  # 23 (P5/32-large)
+   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)
 
    [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
   ]
models/yolov5m.yaml CHANGED
@@ -14,14 +14,14 @@ backbone:
   # [from, number, module, args]
   [[-1, 1, Focus, [64, 3]],  # 0-P1/2
    [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
-   [-1, 3, BottleneckCSP, [128]],
+   [-1, 3, C3, [128]],
    [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
-   [-1, 9, BottleneckCSP, [256]],
+   [-1, 9, C3, [256]],
    [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
-   [-1, 9, BottleneckCSP, [512]],
+   [-1, 9, C3, [512]],
    [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
    [-1, 1, SPP, [1024, [5, 9, 13]]],
-   [-1, 3, BottleneckCSP, [1024, False]],  # 9
+   [-1, 3, C3, [1024, False]],  # 9
   ]
 
 # YOLOv5 head
@@ -29,20 +29,20 @@ head:
   [[-1, 1, Conv, [512, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 6], 1, Concat, [1]],  # cat backbone P4
-   [-1, 3, BottleneckCSP, [512, False]],  # 13
+   [-1, 3, C3, [512, False]],  # 13
 
    [-1, 1, Conv, [256, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 4], 1, Concat, [1]],  # cat backbone P3
-   [-1, 3, BottleneckCSP, [256, False]],  # 17 (P3/8-small)
+   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)
 
    [-1, 1, Conv, [256, 3, 2]],
    [[-1, 14], 1, Concat, [1]],  # cat head P4
-   [-1, 3, BottleneckCSP, [512, False]],  # 20 (P4/16-medium)
+   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)
 
    [-1, 1, Conv, [512, 3, 2]],
    [[-1, 10], 1, Concat, [1]],  # cat head P5
-   [-1, 3, BottleneckCSP, [1024, False]],  # 23 (P5/32-large)
+   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)
 
    [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
   ]
models/yolov5s.yaml CHANGED
@@ -14,14 +14,14 @@ backbone:
   # [from, number, module, args]
   [[-1, 1, Focus, [64, 3]],  # 0-P1/2
    [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
-   [-1, 3, BottleneckCSP, [128]],
+   [-1, 3, C3, [128]],
    [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
-   [-1, 9, BottleneckCSP, [256]],
+   [-1, 9, C3, [256]],
    [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
-   [-1, 9, BottleneckCSP, [512]],
+   [-1, 9, C3, [512]],
    [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
    [-1, 1, SPP, [1024, [5, 9, 13]]],
-   [-1, 3, BottleneckCSP, [1024, False]],  # 9
+   [-1, 3, C3, [1024, False]],  # 9
   ]
 
 # YOLOv5 head
@@ -29,20 +29,20 @@ head:
   [[-1, 1, Conv, [512, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 6], 1, Concat, [1]],  # cat backbone P4
-   [-1, 3, BottleneckCSP, [512, False]],  # 13
+   [-1, 3, C3, [512, False]],  # 13
 
    [-1, 1, Conv, [256, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 4], 1, Concat, [1]],  # cat backbone P3
-   [-1, 3, BottleneckCSP, [256, False]],  # 17 (P3/8-small)
+   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)
 
    [-1, 1, Conv, [256, 3, 2]],
    [[-1, 14], 1, Concat, [1]],  # cat head P4
-   [-1, 3, BottleneckCSP, [512, False]],  # 20 (P4/16-medium)
+   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)
 
    [-1, 1, Conv, [512, 3, 2]],
    [[-1, 10], 1, Concat, [1]],  # cat head P5
-   [-1, 3, BottleneckCSP, [1024, False]],  # 23 (P5/32-large)
+   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)
 
    [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
   ]
models/yolov5x.yaml CHANGED
@@ -14,14 +14,14 @@ backbone:
   # [from, number, module, args]
   [[-1, 1, Focus, [64, 3]],  # 0-P1/2
    [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
-   [-1, 3, BottleneckCSP, [128]],
+   [-1, 3, C3, [128]],
    [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
-   [-1, 9, BottleneckCSP, [256]],
+   [-1, 9, C3, [256]],
    [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
-   [-1, 9, BottleneckCSP, [512]],
+   [-1, 9, C3, [512]],
    [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
    [-1, 1, SPP, [1024, [5, 9, 13]]],
-   [-1, 3, BottleneckCSP, [1024, False]],  # 9
+   [-1, 3, C3, [1024, False]],  # 9
   ]
 
 # YOLOv5 head
@@ -29,20 +29,20 @@ head:
   [[-1, 1, Conv, [512, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 6], 1, Concat, [1]],  # cat backbone P4
-   [-1, 3, BottleneckCSP, [512, False]],  # 13
+   [-1, 3, C3, [512, False]],  # 13
 
    [-1, 1, Conv, [256, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 4], 1, Concat, [1]],  # cat backbone P3
-   [-1, 3, BottleneckCSP, [256, False]],  # 17 (P3/8-small)
+   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)
 
    [-1, 1, Conv, [256, 3, 2]],
    [[-1, 14], 1, Concat, [1]],  # cat head P4
-   [-1, 3, BottleneckCSP, [512, False]],  # 20 (P4/16-medium)
+   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)
 
    [-1, 1, Conv, [512, 3, 2]],
    [[-1, 10], 1, Concat, [1]],  # cat head P5
-   [-1, 3, BottleneckCSP, [1024, False]],  # 23 (P5/32-large)
+   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)
 
    [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
   ]
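All four model definitions swap BottleneckCSP for the new C3 block ("CSP Bottleneck with 3 convolutions"), which drops one convolution plus the old block's separate BatchNorm/activation. The C3 body itself is not part of this diff view; roughly it has the structure sketched below, reusing the Conv and Bottleneck helpers that the old models/yolo.py import line names (structure inferred from the module's name and usage, not copied from this commit):

```python
import torch
import torch.nn as nn

from models.common import Bottleneck, Conv  # helper blocks assumed from this repo


class C3(nn.Module):
    # CSP bottleneck with 3 convolutions: two 1x1 branches, a Bottleneck stack, one fusing 1x1 Conv
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
        super(C3, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)
        self.m = nn.Sequential(*[Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)])

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
```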
train.py CHANGED
@@ -104,6 +104,7 @@ def train(hyp, opt, device, tb_writer=None, wandb=None):
     nbs = 64  # nominal batch size
     accumulate = max(round(nbs / total_batch_size), 1)  # accumulate loss before optimizing
     hyp['weight_decay'] *= total_batch_size * accumulate / nbs  # scale weight_decay
+    logger.info(f"Scaled weight_decay = {hyp['weight_decay']}")
 
     pg0, pg1, pg2 = [], [], []  # optimizer parameter groups
     for k, v in model.named_modules():
@@ -164,7 +165,8 @@ def train(hyp, opt, device, tb_writer=None, wandb=None):
     del ckpt, state_dict
 
     # Image sizes
-    gs = int(max(model.stride))  # grid size (max stride)
+    gs = int(model.stride.max())  # grid size (max stride)
+    nl = model.model[-1].nl  # number of detection layers (used for scaling hyp['obj'])
     imgsz, imgsz_test = [check_img_size(x, gs) for x in opt.img_size]  # verify imgsz are gs-multiples
 
     # DP mode
@@ -187,7 +189,7 @@ def train(hyp, opt, device, tb_writer=None, wandb=None):
     dataloader, dataset = create_dataloader(train_path, imgsz, batch_size, gs, opt,
                                             hyp=hyp, augment=True, cache=opt.cache_images, rect=opt.rect, rank=rank,
                                             world_size=opt.world_size, workers=opt.workers,
-                                            image_weights=opt.image_weights)
+                                            image_weights=opt.image_weights, quad=opt.quad)
     mlc = np.concatenate(dataset.labels, 0)[:, 0].max()  # max label class
     nb = len(dataloader)  # number of batches
     assert mlc < nc, 'Label class %g exceeds nc=%g in %s. Possible class labels are 0-%g' % (mlc, nc, opt.data, nc - 1)
@@ -214,7 +216,8 @@ def train(hyp, opt, device, tb_writer=None, wandb=None):
         check_anchors(dataset, model=model, thr=hyp['anchor_t'], imgsz=imgsz)
 
     # Model parameters
-    hyp['cls'] *= nc / 80.  # scale coco-tuned hyp['cls'] to current dataset
+    hyp['cls'] *= nc / 80.  # scale hyp['cls'] to class count
+    hyp['obj'] *= imgsz ** 2 / 640. ** 2 * 3. / nl  # scale hyp['obj'] to image size and output layers
     model.nc = nc  # attach number of classes to model
     model.hyp = hyp  # attach hyperparameters to model
     model.gr = 1.0  # iou loss ratio (obj_loss = 1.0 or iou)
@@ -290,6 +293,8 @@ def train(hyp, opt, device, tb_writer=None, wandb=None):
                 loss, loss_items = compute_loss(pred, targets.to(device), model)  # loss scaled by batch_size
                 if rank != -1:
                     loss *= opt.world_size  # gradient averaged between devices in DDP mode
+                if opt.quad:
+                    loss *= 4.
 
             # Backward
             scaler.scale(loss).backward()
@@ -458,10 +463,10 @@ if __name__ == '__main__':
     parser.add_argument('--project', default='runs/train', help='save to project/name')
     parser.add_argument('--name', default='exp', help='save to project/name')
     parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
+    parser.add_argument('--quad', action='store_true', help='quad dataloader')
     opt = parser.parse_args()
 
     # Set DDP variables
-    opt.total_batch_size = opt.batch_size
     opt.world_size = int(os.environ['WORLD_SIZE']) if 'WORLD_SIZE' in os.environ else 1
     opt.global_rank = int(os.environ['RANK']) if 'RANK' in os.environ else -1
     set_logging(opt.global_rank)
@@ -486,6 +491,7 @@ if __name__ == '__main__':
     opt.save_dir = increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok | opt.evolve)  # increment run
 
     # DDP mode
+    opt.total_batch_size = opt.batch_size
     device = select_device(opt.device, batch_size=opt.batch_size)
     if opt.local_rank != -1:
         assert torch.cuda.device_count() > opt.local_rank
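The new hyp['obj'] scaling keeps the objectness loss roughly balanced when image size or the number of detection layers changes; at the 640-px, 3-layer baseline the multiplier is exactly 1. A quick check of the formula added above:

```python
def obj_gain(imgsz, nl):
    # multiplier applied to hyp['obj']: scale with image area and with 3/nl output layers
    return imgsz ** 2 / 640. ** 2 * 3. / nl

print(obj_gain(640, 3))   # 1.0 -> baseline, no change
print(obj_gain(1280, 4))  # 3.0 -> 4x the area, times 3/4 for the extra P6 layer
```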
utils/autoanchor.py CHANGED
@@ -110,6 +110,7 @@ def kmean_anchors(path='./data/coco128.yaml', n=9, img_size=640, thr=4.0, gen=10
         print('WARNING: Extremely small objects found. '
               '%g of %g labels are < 3 pixels in width or height.' % (i, len(wh0)))
     wh = wh0[(wh0 >= 2.0).any(1)]  # filter > 2 pixels
+    # wh = wh * (np.random.rand(wh.shape[0], 1) * 0.9 + 0.1)  # multiply by random scale 0-1
 
     # Kmeans calculation
     print('Running kmeans for %g anchors on %g points...' % (n, len(wh)))
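The anchor sets added in models/hub/anchors.yaml were evidently produced by runs of this function: the per-set comments there (BPR, anchors past thr, n, img_size) mirror its printed metrics. A usage sketch with parameter names taken from the signature in the hunk header above; the `gen` value shown there is cut off, so it is passed explicitly here, and the exact return type should be checked against utils/autoanchor.py:

```python
from utils.autoanchor import kmean_anchors

# Evolve 12 anchors for 640-px training on the COCO128 sample dataset
k = kmean_anchors(path='./data/coco128.yaml', n=12, img_size=640, thr=4.0, gen=1000)
print(k.shape)  # expected (12, 2): evolved width-height pairs
```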
utils/datasets.py CHANGED
@@ -15,6 +15,7 @@ from threading import Thread
15
  import cv2
16
  import numpy as np
17
  import torch
 
18
  from PIL import Image, ExifTags
19
  from torch.utils.data import Dataset
20
  from tqdm import tqdm
@@ -55,7 +56,7 @@ def exif_size(img):
55
 
56
 
57
  def create_dataloader(path, imgsz, batch_size, stride, opt, hyp=None, augment=False, cache=False, pad=0.0, rect=False,
58
- rank=-1, world_size=1, workers=8, image_weights=False):
59
  # Make sure only the first process in DDP process the dataset first, and the following others can use the cache
60
  with torch_distributed_zero_first(rank):
61
  dataset = LoadImagesAndLabels(path, imgsz, batch_size,
@@ -79,7 +80,7 @@ def create_dataloader(path, imgsz, batch_size, stride, opt, hyp=None, augment=Fa
79
  num_workers=nw,
80
  sampler=sampler,
81
  pin_memory=True,
82
- collate_fn=LoadImagesAndLabels.collate_fn)
83
  return dataloader, dataset
84
 
85
 
@@ -578,6 +579,32 @@ class LoadImagesAndLabels(Dataset): # for training/testing
578
  l[:, 0] = i # add target image index for build_targets()
579
  return torch.stack(img, 0), torch.cat(label, 0), path, shapes
580
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
581
 
582
  # Ancillary functions --------------------------------------------------------------------------------------------------
583
  def load_image(self, index):
@@ -617,7 +644,7 @@ def augment_hsv(img, hgain=0.5, sgain=0.5, vgain=0.5):
617
 
618
 
619
  def load_mosaic(self, index):
620
- # loads images in a mosaic
621
 
622
  labels4 = []
623
  s = self.img_size
@@ -674,6 +701,80 @@ def load_mosaic(self, index):
674
  return img4, labels4
675
 
676
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
677
  def replicate(img, labels):
678
  # Replicate labels
679
  h, w = img.shape[:2]
@@ -811,12 +912,12 @@ def random_perspective(img, targets=(), degrees=10, translate=.1, scale=.1, shea
811
  return img, targets
812
 
813
 
814
- def box_candidates(box1, box2, wh_thr=2, ar_thr=20, area_thr=0.1): # box1(4,n), box2(4,n)
815
  # Compute candidate boxes: box1 before augment, box2 after augment, wh_thr (pixels), aspect_ratio_thr, area_ratio
816
  w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
817
  w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
818
- ar = np.maximum(w2 / (h2 + 1e-16), h2 / (w2 + 1e-16)) # aspect ratio
819
- return (w2 > wh_thr) & (h2 > wh_thr) & (w2 * h2 / (w1 * h1 + 1e-16) > area_thr) & (ar < ar_thr) # candidates
820
 
821
 
822
  def cutout(image, labels):
 
15
  import cv2
16
  import numpy as np
17
  import torch
18
+ import torch.nn.functional as F
19
  from PIL import Image, ExifTags
20
  from torch.utils.data import Dataset
21
  from tqdm import tqdm
 
56
 
57
 
58
  def create_dataloader(path, imgsz, batch_size, stride, opt, hyp=None, augment=False, cache=False, pad=0.0, rect=False,
59
+ rank=-1, world_size=1, workers=8, image_weights=False, quad=False):
60
  # Make sure only the first process in DDP process the dataset first, and the following others can use the cache
61
  with torch_distributed_zero_first(rank):
62
  dataset = LoadImagesAndLabels(path, imgsz, batch_size,
 
80
  num_workers=nw,
81
  sampler=sampler,
82
  pin_memory=True,
83
+ collate_fn=LoadImagesAndLabels.collate_fn4 if quad else LoadImagesAndLabels.collate_fn)
84
  return dataloader, dataset
85
 
86
 
 
579
  l[:, 0] = i # add target image index for build_targets()
580
  return torch.stack(img, 0), torch.cat(label, 0), path, shapes
581
 
582
+ @staticmethod
583
+ def collate_fn4(batch):
584
+ img, label, path, shapes = zip(*batch) # transposed
585
+ n = len(shapes) // 4
586
+ img4, label4, path4, shapes4 = [], [], path[:n], shapes[:n]
587
+
588
+ ho = torch.tensor([[0., 0, 0, 1, 0, 0]])
589
+ wo = torch.tensor([[0., 0, 1, 0, 0, 0]])
590
+ s = torch.tensor([[1, 1, .5, .5, .5, .5]]) # scale
591
+ for i in range(n): # zidane torch.zeros(16,3,720,1280) # BCHW
592
+ i *= 4
593
+ if random.random() < 0.5:
594
+ im = F.interpolate(img[i].unsqueeze(0).float(), scale_factor=2., mode='bilinear', align_corners=False)[
595
+ 0].type(img[i].type())
596
+ l = label[i]
597
+ else:
598
+ im = torch.cat((torch.cat((img[i], img[i + 1]), 1), torch.cat((img[i + 2], img[i + 3]), 1)), 2)
599
+ l = torch.cat((label[i], label[i + 1] + ho, label[i + 2] + wo, label[i + 3] + ho + wo), 0) * s
600
+ img4.append(im)
601
+ label4.append(l)
602
+
603
+ for i, l in enumerate(label4):
604
+ l[:, 0] = i # add target image index for build_targets()
605
+
606
+ return torch.stack(img4, 0), torch.cat(label4, 0), path4, shapes4
607
+
608
 
609
  # Ancillary functions --------------------------------------------------------------------------------------------------
610
  def load_image(self, index):
 
644
 
645
 
646
  def load_mosaic(self, index):
647
+ # loads images in a 4-mosaic
648
 
649
  labels4 = []
650
  s = self.img_size
 
      return img4, labels4


+ def load_mosaic9(self, index):
+     # loads images in a 9-mosaic
+
+     labels9 = []
+     s = self.img_size
+     indices = [index] + [self.indices[random.randint(0, self.n - 1)] for _ in range(8)]  # 8 additional image indices
+     for i, index in enumerate(indices):
+         # Load image
+         img, _, (h, w) = load_image(self, index)
+
+         # place img in img9
+         if i == 0:  # center
+             img9 = np.full((s * 3, s * 3, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles
+             h0, w0 = h, w
+             c = s, s, s + w, s + h  # xmin, ymin, xmax, ymax (base) coordinates
+         elif i == 1:  # top
+             c = s, s - h, s + w, s
+         elif i == 2:  # top right
+             c = s + wp, s - h, s + wp + w, s
+         elif i == 3:  # right
+             c = s + w0, s, s + w0 + w, s + h
+         elif i == 4:  # bottom right
+             c = s + w0, s + hp, s + w0 + w, s + hp + h
+         elif i == 5:  # bottom
+             c = s + w0 - w, s + h0, s + w0, s + h0 + h
+         elif i == 6:  # bottom left
+             c = s + w0 - wp - w, s + h0, s + w0 - wp, s + h0 + h
+         elif i == 7:  # left
+             c = s - w, s + h0 - h, s, s + h0
+         elif i == 8:  # top left
+             c = s - w, s + h0 - hp - h, s, s + h0 - hp
+
+         padx, pady = c[:2]
+         x1, y1, x2, y2 = [max(x, 0) for x in c]  # allocate coords
+
+         # Labels
+         x = self.labels[index]
+         labels = x.copy()
+         if x.size > 0:  # Normalized xywh to pixel xyxy format
+             labels[:, 1] = w * (x[:, 1] - x[:, 3] / 2) + padx
+             labels[:, 2] = h * (x[:, 2] - x[:, 4] / 2) + pady
+             labels[:, 3] = w * (x[:, 1] + x[:, 3] / 2) + padx
+             labels[:, 4] = h * (x[:, 2] + x[:, 4] / 2) + pady
+         labels9.append(labels)
+
+         # Image
+         img9[y1:y2, x1:x2] = img[y1 - pady:, x1 - padx:]  # img9[ymin:ymax, xmin:xmax]
+         hp, wp = h, w  # height, width previous
+
+     # Offset
+     yc, xc = [int(random.uniform(0, s)) for x in self.mosaic_border]  # mosaic center x, y
+     img9 = img9[yc:yc + 2 * s, xc:xc + 2 * s]
+
+     # Concat/clip labels
+     if len(labels9):
+         labels9 = np.concatenate(labels9, 0)
+         labels9[:, [1, 3]] -= xc
+         labels9[:, [2, 4]] -= yc
+
+     np.clip(labels9[:, 1:], 0, 2 * s, out=labels9[:, 1:])  # use with random_perspective
+     # img9, labels9 = replicate(img9, labels9)  # replicate
+
+     # Augment
+     img9, labels9 = random_perspective(img9, labels9,
+                                        degrees=self.hyp['degrees'],
+                                        translate=self.hyp['translate'],
+                                        scale=self.hyp['scale'],
+                                        shear=self.hyp['shear'],
+                                        perspective=self.hyp['perspective'],
+                                        border=self.mosaic_border)  # border to remove
+
+     return img9, labels9
+
+
  def replicate(img, labels):
      # Replicate labels
      h, w = img.shape[:2]
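A worked example of the normalized-xywh to pixel-xyxy conversion used above when a tile is pasted at offset `(padx, pady)` into the larger mosaic canvas. The image size, offsets, and label values are illustrative only.

```python
# Sketch of the label conversion in load_mosaic9(): normalized [cls, x, y, w, h] -> pixel [cls, x1, y1, x2, y2].
import numpy as np

h, w = 480, 640                             # source image size
padx, pady = 640, 0                         # tile pasted one image-width to the right of the base tile
x = np.array([[0, 0.5, 0.5, 0.25, 0.25]])   # [cls, x, y, w, h], normalized to the source image

labels = x.copy()
labels[:, 1] = w * (x[:, 1] - x[:, 3] / 2) + padx  # x1
labels[:, 2] = h * (x[:, 2] - x[:, 4] / 2) + pady  # y1
labels[:, 3] = w * (x[:, 1] + x[:, 3] / 2) + padx  # x2
labels[:, 4] = h * (x[:, 2] + x[:, 4] / 2) + pady  # y2
print(labels)  # [[   0.  880.  180. 1040.  300.]]
```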
 
      return img, targets


+ def box_candidates(box1, box2, wh_thr=2, ar_thr=20, area_thr=0.1, eps=1e-16):  # box1(4,n), box2(4,n)
      # Compute candidate boxes: box1 before augment, box2 after augment, wh_thr (pixels), aspect_ratio_thr, area_ratio
      w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
      w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
+     ar = np.maximum(w2 / (h2 + eps), h2 / (w2 + eps))  # aspect ratio
+     return (w2 > wh_thr) & (h2 > wh_thr) & (w2 * h2 / (w1 * h1 + eps) > area_thr) & (ar < ar_thr)  # candidates


  def cutout(image, labels):
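A standalone sanity check of the candidate filter above, using the function exactly as it appears in the hunk (with the newly parameterized `eps`). The sample boxes are illustrative: one box survives the augmentation, one becomes too narrow, one too flat and small.

```python
# Quick check of box_candidates(): boxes are columns in (4, n) xyxy format, box1 before and box2 after augmentation.
import numpy as np


def box_candidates(box1, box2, wh_thr=2, ar_thr=20, area_thr=0.1, eps=1e-16):  # box1(4,n), box2(4,n)
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    ar = np.maximum(w2 / (h2 + eps), h2 / (w2 + eps))  # aspect ratio
    return (w2 > wh_thr) & (h2 > wh_thr) & (w2 * h2 / (w1 * h1 + eps) > area_thr) & (ar < ar_thr)


box1 = np.array([[0, 0, 0], [0, 0, 0], [100, 100, 100], [100, 100, 100]], dtype=float)  # three 100x100 boxes
box2 = np.array([[0, 0, 0], [0, 0, 0], [90, 1, 100], [90, 100, 2]], dtype=float)        # same boxes after augmentation
print(box_candidates(box1, box2))  # [ True False False]
```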
utils/general.py CHANGED
@@ -281,6 +281,7 @@ def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, classes=Non
      # Settings
      min_wh, max_wh = 2, 4096  # (pixels) minimum and maximum box width and height
      max_det = 300  # maximum number of detections per image
+     max_nms = 30000  # maximum number of boxes into torchvision.ops.nms()
      time_limit = 10.0  # seconds to quit after
      redundant = True  # require redundant detections
      multi_label = nc > 1  # multiple labels per box (adds 0.5ms/img)
@@ -328,13 +329,12 @@ def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, classes=Non
          # if not torch.isfinite(x).all():
          #     x = x[torch.isfinite(x).all(1)]

-         # If none remain process next image
+         # Check shape
          n = x.shape[0]  # number of boxes
-         if not n:
+         if not n:  # no boxes
              continue
-
-         # Sort by confidence
-         # x = x[x[:, 4].argsort(descending=True)]
+         elif n > max_nms:  # excess boxes
+             x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence

          # Batched NMS
          c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
@@ -352,6 +352,7 @@ def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, classes=Non

          output[xi] = x[i]
          if (time.time() - t) > time_limit:
+             print(f'WARNING: NMS time limit {time_limit}s exceeded')
              break  # time limit exceeded

      return output
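A standalone check of the `max_nms` truncation added above: boxes are sorted by confidence and only the top `max_nms` are passed on to NMS. The cap is lowered to 3 here purely to make the effect visible, and the rows are illustrative `[x1, y1, x2, y2, conf, cls]` values.

```python
# Sketch of the confidence-sorted truncation performed before torchvision.ops.nms().
import torch

max_nms = 3
x = torch.tensor([[0., 0, 10, 10, 0.2, 0],
                  [0., 0, 10, 10, 0.9, 0],
                  [0., 0, 10, 10, 0.6, 1],
                  [0., 0, 10, 10, 0.4, 1],
                  [0., 0, 10, 10, 0.8, 2]])
if x.shape[0] > max_nms:  # excess boxes
    x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # keep the highest-confidence boxes
print(x[:, 4])  # tensor([0.9000, 0.8000, 0.6000])
```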
utils/google_utils.py CHANGED
@@ -6,6 +6,7 @@ import subprocess
  import time
  from pathlib import Path

+ import requests
  import torch


@@ -21,21 +22,14 @@ def attempt_download(weights):
      file = Path(weights).name.lower()

      msg = weights + ' missing, try downloading from https://github.com/ultralytics/yolov5/releases/'
-     models = ['yolov5s.pt', 'yolov5m.pt', 'yolov5l.pt', 'yolov5x.pt']  # available models
-     redundant = False  # offer second download option
-
-     if file in models and not os.path.isfile(weights):
-         # Google Drive
-         # d = {'yolov5s.pt': '1R5T6rIyy3lLwgFXNms8whc-387H0tMQO',
-         #      'yolov5m.pt': '1vobuEExpWQVpXExsJ2w-Mbf3HJjWkQJr',
-         #      'yolov5l.pt': '1hrlqD1Wdei7UT4OgT785BEk1JwnSvNEV',
-         #      'yolov5x.pt': '1mM8aZJlWTxOg7BZJvNUMrTnA2AbeCVzS'}
-         # r = gdrive_download(id=d[file], name=weights) if file in d else 1
-         # if r == 0 and os.path.exists(weights) and os.path.getsize(weights) > 1E6:  # check
-         #     return
-
+     response = requests.get('https://api.github.com/repos/ultralytics/yolov5/releases/latest').json()  # github api
+     assets = [x['name'] for x in response['assets']]  # release assets, i.e. ['yolov5s.pt', 'yolov5m.pt', ...]
+     redundant = False  # second download option
+
+     if file in assets and not os.path.isfile(weights):
          try:  # GitHub
-             url = 'https://github.com/ultralytics/yolov5/releases/download/v3.1/' + file
+             tag = response['tag_name']  # i.e. 'v1.0'
+             url = f'https://github.com/ultralytics/yolov5/releases/download/{tag}/{file}'
              print('Downloading %s to %s...' % (url, weights))
              torch.hub.download_url_to_file(url, weights)
              assert os.path.exists(weights) and os.path.getsize(weights) > 1E6  # check
@@ -53,10 +47,9 @@ def attempt_download(weights):
          return


- def gdrive_download(id='1uH2BylpFxHKEGXKL6wJJlsgMU2YEjxuc', name='tmp.zip'):
-     # Downloads a file from Google Drive. from utils.google_utils import *; gdrive_download()
+ def gdrive_download(id='16TiPfZj7htmTyhntwcZyEEAejOUxuT6m', name='tmp.zip'):
+     # Downloads a file from Google Drive. from yolov5.utils.google_utils import *; gdrive_download()
      t = time.time()
-
      print('Downloading https://drive.google.com/uc?export=download&id=%s as %s... ' % (id, name), end='')
      os.remove(name) if os.path.exists(name) else None  # remove existing
      os.remove('cookie') if os.path.exists('cookie') else None
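A minimal sketch of the GitHub releases lookup that `attempt_download()` now relies on, built from the calls in the hunk above. It needs network access and the `requests` package, and the tag and asset names returned depend on whatever release is latest at query time.

```python
# Sketch: querying the latest release tag and its assets from the GitHub API, as attempt_download() does.
import requests

response = requests.get('https://api.github.com/repos/ultralytics/yolov5/releases/latest').json()
tag = response['tag_name']                        # e.g. 'v4.0'
assets = [x['name'] for x in response['assets']]  # e.g. ['yolov5s.pt', 'yolov5m.pt', ...]
print(tag, assets)
# A matching file is then fetched from f'https://github.com/ultralytics/yolov5/releases/download/{tag}/{file}'
```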
utils/loss.py CHANGED
@@ -106,7 +106,7 @@ def compute_loss(p, targets, model):  # predictions, targets, model
      # Losses
      nt = 0  # number of targets
      no = len(p)  # number of outputs
-     balance = [4.0, 1.0, 0.4] if no == 3 else [4.0, 1.0, 0.4, 0.1]  # P3-5 or P3-6
+     balance = [4.0, 1.0, 0.3, 0.1, 0.03]  # P3-P7
      for i, pi in enumerate(p):  # layer index, layer predictions
          b, a, gj, gi = indices[i]  # image, anchor, gridy, gridx
          tobj = torch.zeros_like(pi[..., 0], device=device)  # target obj
@@ -140,7 +140,7 @@ def compute_loss(p, targets, model):  # predictions, targets, model

      s = 3 / no  # output count scaling
      lbox *= h['box'] * s
-     lobj *= h['obj'] * s * (1.4 if no == 4 else 1.)
+     lobj *= h['obj']
      lcls *= h['cls'] * s
      bs = tobj.shape[0]  # batch size
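A toy illustration of how the fixed `balance` list above weights the per-layer objectness loss (P3 heaviest, P7 lightest). The actual per-layer multiplication happens inside the loop and is not part of this hunk; the loss values below are arbitrary, assuming a three-output (P3-P5) model that only uses the first three entries.

```python
# Sketch (not the full compute_loss()): per-layer objectness losses weighted by the balance list.
import torch

balance = [4.0, 1.0, 0.3, 0.1, 0.03]  # P3-P7
per_layer_obj_loss = [torch.tensor(0.5), torch.tensor(0.5), torch.tensor(0.5)]  # e.g. a P3-P5 model
lobj = sum(loss * balance[i] for i, loss in enumerate(per_layer_obj_loss))
print(lobj)  # tensor(2.6500) = 0.5*4.0 + 0.5*1.0 + 0.5*0.3
```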
 
utils/plots.py CHANGED
@@ -223,7 +223,7 @@ def plot_targets_txt():  # from utils.plots import *; plot_targets_txt()
      plt.savefig('targets.jpg', dpi=200)


- def plot_study_txt(path='', x=None):  # from utils.plots import *; plot_study_txt()
+ def plot_study_txt(path='study/', x=None):  # from utils.plots import *; plot_study_txt()
      # Plot study.txt generated by test.py
      fig, ax = plt.subplots(2, 4, figsize=(10, 6), tight_layout=True)
      ax = ax.ravel()
@@ -246,7 +246,7 @@ def plot_study_txt(path='', x=None):  # from utils.plots import *; plot_study_tx

      ax2.grid()
      ax2.set_xlim(0, 30)
-     ax2.set_ylim(28, 50)
+     ax2.set_ylim(29, 51)
      ax2.set_yticks(np.arange(30, 55, 5))
      ax2.set_xlabel('GPU Speed (ms/img)')
      ax2.set_ylabel('COCO AP val')
utils/torch_utils.py CHANGED
@@ -225,8 +225,8 @@ def load_classifier(name='resnet101', n=2):
      return model


- def scale_img(img, ratio=1.0, same_shape=False):  # img(16,3,256,416), r=ratio
-     # scales img(bs,3,y,x) by ratio
+ def scale_img(img, ratio=1.0, same_shape=False, gs=32):  # img(16,3,256,416)
+     # scales img(bs,3,y,x) by ratio constrained to gs-multiple
      if ratio == 1.0:
          return img
      else:
@@ -234,7 +234,6 @@ def scale_img(img, ratio=1.0, same_shape=False):  # img(16,3,256,416), r=ratio
          s = (int(h * ratio), int(w * ratio))  # new size
          img = F.interpolate(img, size=s, mode='bilinear', align_corners=False)  # resize
          if not same_shape:  # pad/crop img
-             gs = 32  # (pixels) grid size
              h, w = [math.ceil(x * ratio / gs) * gs for x in (h, w)]
          return F.pad(img, [0, w - s[1], 0, h - s[0]], value=0.447)  # value = imagenet mean

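A worked example of the gs-multiple constraint that the new `gs` argument controls in `scale_img()` above: after resizing by `ratio`, the output is padded up to the next multiple of the grid size so it stays compatible with the model stride. The shape and ratio below are arbitrary.

```python
# Sketch of the ceil-to-multiple math in scale_img().
import math

h, w, ratio, gs = 256, 416, 0.7, 32
s = (int(h * ratio), int(w * ratio))                       # (179, 291) resized size
h2, w2 = [math.ceil(x * ratio / gs) * gs for x in (h, w)]  # (192, 320) padded target size
print(s, (h2, w2), (h2 - s[0], w2 - s[1]))                 # pad by (13, 29) pixels
```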