# HybridNets: End2End Perception Network

<div align="center">

![logo](images/hybridnets.jpg)
**HybridNets Network Architecture.**

[![Generic badge](https://img.shields.io/badge/License-MIT-<COLOR>.svg?style=for-the-badge)](https://github.com/datvuthanh/HybridNets/blob/main/LICENSE)
[![PyTorch - Version](https://img.shields.io/badge/PYTORCH-1.10+-red?style=for-the-badge&logo=pytorch)](https://pytorch.org/get-started/locally/)
[![Python - Version](https://img.shields.io/badge/PYTHON-3.7+-red?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/downloads/)
<br>
<!-- [![Contributors][contributors-shield]][contributors-url]
[![Forks][forks-shield]][forks-url]
[![Stargazers][stars-shield]][stars-url]
[![Issues][issues-shield]][issues-url] -->

</div>

> [**HybridNets: End-to-End Perception Network**](https://arxiv.org/abs/2203.09035)
>
> by Dat Vu, Bao Ngo, [Hung Phan](https://scholar.google.com/citations?user=V3paQH8AAAAJ&hl=vi&oi=ao)<sup> :email:</sup> [*FPT University*](https://uni.fpt.edu.vn/en-US/Default.aspx)
>
> (<sup>:email:</sup>) corresponding author.
>
> *arXiv technical report ([arXiv 2203.09035](https://arxiv.org/abs/2203.09035))*

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/hybridnets-end-to-end-perception-network-1/traffic-object-detection-on-bdd100k)](https://paperswithcode.com/sota/traffic-object-detection-on-bdd100k?p=hybridnets-end-to-end-perception-network-1)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/hybridnets-end-to-end-perception-network-1/lane-detection-on-bdd100k)](https://paperswithcode.com/sota/lane-detection-on-bdd100k?p=hybridnets-end-to-end-perception-network-1)

<!-- TABLE OF CONTENTS -->
<details>
  <summary>Table of Contents</summary>
  <ol>
    <li>
      <a href="#about-the-project">About The Project</a>
      <ul>
        <li><a href="#project-structure">Project Structure</a></li>
      </ul>
    </li>
    <li>
      <a href="#getting-started">Getting Started</a>
      <ul>
        <li><a href="#installation">Installation</a></li>
        <li><a href="#demo">Demo</a></li>
      </ul>
    </li>
    <li>
      <a href="#usage">Usage</a>
      <ul>
        <li><a href="#data-preparation">Data Preparation</a></li>
        <li><a href="#training">Training</a></li>
      </ul>
    </li>
    <li><a href="#training-tips">Training Tips</a></li>
    <li><a href="#results">Results</a></li>
    <li><a href="#license">License</a></li>
    <li><a href="#acknowledgements">Acknowledgements</a></li>
    <li><a href="#citation">Citation</a></li>
  </ol>
</details>

## About The Project
<!-- #### <div align=center> **HybridNets** = **real-time** :stopwatch: * **state-of-the-art** :1st_place_medal: * (traffic object detection + drivable area segmentation + lane line detection) :motorway: </div> -->
HybridNets is an end-to-end, multi-task perception network. Our work focuses on traffic object detection, drivable area segmentation, and lane line detection. HybridNets runs in real time on embedded systems and achieves state-of-the-art object detection and lane detection results on the BDD100K dataset.
![intro](images/intro.jpg)

### Project Structure
```bash
HybridNets
│   backbone.py                   # Model configuration
│   hubconf.py                    # Pytorch Hub entrypoint
│   hybridnets_test.py            # Image inference
│   hybridnets_test_videos.py     # Video inference
│   train.py                      # Train script
│   val.py                        # Validate script
│
├───encoders                      # https://github.com/qubvel/segmentation_models.pytorch/tree/master/segmentation_models_pytorch/encoders
│       ...
│
├───hybridnets
│       autoanchor.py             # Generate new anchors by k-means
│       dataset.py                # BDD100K dataset
│       loss.py                   # Focal, tversky (dice)
│       model.py                  # Model blocks
│
├───projects
│       bdd100k.yml               # Project configuration
│
└───utils
    │   plot.py                   # Draw bounding box
    │   smp_metrics.py            # https://github.com/qubvel/segmentation_models.pytorch/blob/master/segmentation_models_pytorch/metrics/functional.py
    │   utils.py                  # Various helper functions (preprocess, postprocess, eval...)
    │
    └───sync_batchnorm            # https://github.com/vacancy/Synchronized-BatchNorm-PyTorch/tree/master/sync_batchnorm
            ...
```

## Getting Started [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Uc1ZPoPeh-lAhPQ1CloiVUsOIRAVOGWA?usp=sharing)
### Installation
The project was developed with [**Python>=3.7**](https://www.python.org/downloads/) and [**Pytorch>=1.10**](https://pytorch.org/get-started/locally/).
```bash
git clone https://github.com/datvuthanh/HybridNets
cd HybridNets
pip install -r requirements.txt
```

### Demo
```bash
# Download end-to-end weights
mkdir weights
curl -L -o weights/hybridnets.pth https://github.com/datvuthanh/HybridNets/releases/download/v1.0/hybridnets.pth

# Image inference
python hybridnets_test.py -w weights/hybridnets.pth --source demo/image --output demo_result --imshow False --imwrite True

# Video inference
python hybridnets_test_videos.py -w weights/hybridnets.pth --source demo/video --output demo_result

# Result is saved in a new folder called demo_result
```
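
The repository also ships a `hubconf.py` (see Project Structure above), so the pretrained model can likely be loaded through PyTorch Hub as well. A minimal sketch, assuming the Hub entrypoint is named `hybridnets` and accepts a `pretrained` flag (not confirmed here):

```python
# Sketch only: the entrypoint name and signature are assumptions based on hubconf.py existing.
import torch

model = torch.hub.load('datvuthanh/HybridNets', 'hybridnets', pretrained=True)
model.eval()

img = torch.randn(1, 3, 384, 640)   # dummy NCHW input at the 640x384 size used in bdd100k.yml
with torch.no_grad():
    outputs = model(img)            # detection + segmentation outputs
```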

## Usage
### Data Preparation
Recommended dataset structure:
```bash
HybridNets
└───datasets
    ├───imgs
    │   ├───train
    │   └───val
    ├───det_annot
    │   ├───train
    │   └───val
    ├───da_seg_annot
    │   ├───train
    │   └───val
    └───ll_seg_annot
        ├───train
        └───val
```
Update your dataset paths in `projects/your_project_name.yml`.

For BDD100K: [imgs](https://bdd-data.berkeley.edu/), [det_annot](https://drive.google.com/file/d/19CEnZzgLXNNYh1wCvUlNi8UfiBkxVRH0/view), [da_seg_annot](https://drive.google.com/file/d/1NZM-xqJJYZ3bADgLCdrFOa5Vlen3JlkZ/view), [ll_seg_annot](https://drive.google.com/file/d/1o-XpIvHJq0TVUrwlwiMGzwP1CtFsfQ6t/view)

### Training
#### 1) Edit or create a new project configuration, using `bdd100k.yml` as a template
```yaml
# mean and std of dataset in RGB order
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]

# bdd100k anchors
anchors_scales: '[2**0, 2**0.70, 2**1.32]'
anchors_ratios: '[(0.62, 1.58), (1.0, 1.0), (1.58, 0.62)]'

# must match your dataset's category_id.
# category_id is one-indexed:
# for example, the index of 'car' here is 0, while its category_id is 1
obj_list: ['car']

seg_list: ['road',
           'lane']

dataset:
  color_rgb: false
  dataroot: path/to/imgs
  labelroot: path/to/det_annot
  laneroot: path/to/ll_seg_annot
  maskroot: path/to/da_seg_annot
  ...
```
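
The `anchors_scales` and `anchors_ratios` entries are stored as Python-expression strings. Purely for intuition (this is not necessarily how the training code parses them), here is what they encode numerically:

```python
# Illustrative only: evaluate the anchor expression strings from the config above.
scales = eval('[2**0, 2**0.70, 2**1.32]')
ratios = eval('[(0.62, 1.58), (1.0, 1.0), (1.58, 0.62)]')
print(scales)   # [1.0, ~1.62, ~2.50] -> three anchor sizes per pyramid level
print(ratios)   # (width, height) multipliers: tall, square, and wide boxes
```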

#### 2) Train
```bash
python train.py -p bdd100k        # your_project_name
                -c 3              # coefficient of effnet backbone, result from paper is 3
                -n 4              # num_workers
                -b 8              # batch_size per gpu
                -w path/to/weight # use 'last' to resume training from previous session
                --freeze_det      # freeze detection head, others: --freeze_backbone, --freeze_seg
                --lr 1e-5         # learning rate
                --optim adamw     # adamw | sgd
                --num_epochs 200
```
Please check `python train.py --help` for all available arguments.

#### 3) Evaluate
```bash
python val.py -p bdd100k -c 3 -w checkpoints/weight.pth
```

## Training Tips
### Anchors :anchor:
If your dataset differs substantially from COCO or BDD100K, or if detection metrics after training are lower than expected, try enabling autoanchor in `project.yml`:
```yaml
...
model:
  image_size:
    - 640
    - 384
  need_autoanchor: true  # set to true to run autoanchor
  pin_memory: false
...
```
This automatically finds the best combination of anchor scales and anchor ratios for your dataset. You can then write them into `project.yml` manually and disable autoanchor.

If you're feeling lucky, you can also mess around with the base anchor scale in `backbone.py`:
```python
class HybridNetsBackbone(nn.Module):
    ...
    self.pyramid_levels = [5, 5, 5, 5, 5, 5, 5, 5, 6]
    self.anchor_scale = [1.25, 1.25, 1.25, 1.25, 1.25, 1.25, 1.25, 1.25, 1.25]
    self.aspect_ratios = kwargs.get('ratios', [(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)])
    ...
```
and `model.py`:
```python
class Anchors(nn.Module):
    ...
    for scale, ratio in itertools.product(self.scales, self.ratios):
        base_anchor_size = self.anchor_scale * stride * scale
        anchor_size_x_2 = base_anchor_size * ratio[0] / 2.0
        anchor_size_y_2 = base_anchor_size * ratio[1] / 2.0
    ...
```
to get a grasp on how anchor boxes work.
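
For a concrete feel of what these numbers mean, the short script below (not part of the repository) enumerates the anchor sizes implied by the snippets above; the strides of 8-128 for pyramid levels P3-P7 are an assumption, not taken from the code shown here:

```python
# Rough sketch of the anchor sizes (in pixels) implied by the snippets above.
import itertools

anchor_scale = 1.25                                # anchor_scale in backbone.py
scales = [2**0, 2**0.70, 2**1.32]                  # anchors_scales in bdd100k.yml
ratios = [(0.62, 1.58), (1.0, 1.0), (1.58, 0.62)]  # anchors_ratios in bdd100k.yml
strides = [8, 16, 32, 64, 128]                     # assumed strides for P3-P7

for stride in strides:
    for scale, ratio in itertools.product(scales, ratios):
        base_anchor_size = anchor_scale * stride * scale
        w, h = base_anchor_size * ratio[0], base_anchor_size * ratio[1]
        print(f"stride {stride:3d}  scale {scale:.2f}  ratio {ratio}  ->  {w:6.1f} x {h:6.1f}")
```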

And because a picture is worth a thousand words, you can visualize your anchor boxes with the [Anchor Computation Tool](https://github.com/Cli98/anchor_computation_tool).
### Training stages
We experimented with staged training and found that the following settings achieved the best results:

1. `--freeze_seg True` ~ 100 epochs
2. `--freeze_backbone True --freeze_det True` ~ 50 epochs
3. Train end-to-end ~ 50 epochs

The reason is that the detection head is harder to converge early in training, so we essentially skip the segmentation head at first and focus on detection.
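
One possible way to run these stages with `train.py` (illustrative only; epoch counts are approximate, and `-w last` resumes from the previous session):

```bash
# Stage 1: freeze the segmentation head, train detection (+ backbone), ~100 epochs
python train.py -p bdd100k -c 3 -b 8 --freeze_seg True --num_epochs 100

# Stage 2: freeze backbone and detection head, train segmentation, ~50 epochs
python train.py -p bdd100k -c 3 -b 8 -w last --freeze_backbone True --freeze_det True --num_epochs 50

# Stage 3: fine-tune everything end-to-end, ~50 epochs
python train.py -p bdd100k -c 3 -b 8 -w last --num_epochs 50
```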

## Results
### Traffic Object Detection

<table>
<tr><th>Result </th><th>Visualization</th></tr>
<tr><td>

| Model              | Recall (%) | mAP@0.5 (%) |
|:------------------:|:----------:|:-----------:|
| `MultiNet`         |    81.3    |     60.2    |
| `DLT-Net`          |    89.4    |     68.4    |
| `Faster R-CNN`     |    77.2    |     55.6    |
| `YOLOv5s`          |    86.8    |     77.2    |
| `YOLOP`            |    89.2    |     76.5    |
| **`HybridNets`**   |  **92.8**  |   **77.3**  |

</td><td>

<img src="images/det1.jpg" width="50%" /><img src="images/det2.jpg" width="50%" />

</td></tr> </table>


### Drivable Area Segmentation

<table>
<tr><th>Result </th><th>Visualization</th></tr>
<tr><td>

| Model            | Drivable mIoU (%) |
|:----------------:|:-----------------:|
| `MultiNet`       |        71.6       |
| `DLT-Net`        |        71.3       |
| `PSPNet`         |        89.6       |
| `YOLOP`          |        91.5       |
| **`HybridNets`** |      **90.5**     |

</td><td>

<img src="images/road1.jpg" width="50%" /><img src="images/road2.jpg" width="50%" />

</td></tr> </table>


### Lane Line Detection

<table>
<tr><th>Result </th><th>Visualization</th></tr>
<tr><td>

| Model            | Accuracy (%) | Lane Line IoU (%) |
|:----------------:|:------------:|:-----------------:|
| `Enet`           |     34.12    |       14.64       |
| `SCNN`           |     35.79    |       15.84       |
| `Enet-SAD`       |     36.56    |       16.02       |
| `YOLOP`          |     70.5     |        26.2       |
| **`HybridNets`** |   **85.4**   |      **31.6**     |

</td><td>

<img src="images/lane1.jpg" width="50%" /><img src="images/lane2.jpg" width="50%" />

</td></tr> </table>

<div align="center">

![](images/full_video.gif)

[Original footage](https://www.youtube.com/watch?v=lx4yA1LEi9c) courtesy of [Hanoi Life](https://www.youtube.com/channel/UChT1Cpf_URepCpsdIqjsDHQ)

</div>

## License

Distributed under the MIT License. See `LICENSE` for more information.

## Acknowledgements

Our work would not be complete without the wonderful work of the following authors:

* [EfficientDet](https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch)
* [YOLOv5](https://github.com/ultralytics/yolov5)
* [YOLOP](https://github.com/hustvl/YOLOP)
* [KMeans Anchors Ratios](https://github.com/mnslarcher/kmeans-anchors-ratios)
* [Anchor Computation Tool](https://github.com/Cli98/anchor_computation_tool)

## Citation

If you find our paper and code useful for your research, please consider giving a star :star: and citation :pencil::

```BibTeX
@misc{vu2022hybridnets,
      title={HybridNets: End-to-End Perception Network},
      author={Dat Vu and Bao Ngo and Hung Phan},
      year={2022},
      eprint={2203.09035},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

<!-- MARKDOWN LINKS & IMAGES -->
<!-- https://www.markdownguide.org/basic-syntax/#reference-style-links -->
[contributors-shield]: https://img.shields.io/github/contributors/othneildrew/Best-README-Template.svg?style=for-the-badge
[contributors-url]: https://github.com/datvuthanh/HybridNets/graphs/contributors
[forks-shield]: https://img.shields.io/github/forks/othneildrew/Best-README-Template.svg?style=for-the-badge
[forks-url]: https://github.com/datvuthanh/HybridNets/network/members
[stars-shield]: https://img.shields.io/github/stars/othneildrew/Best-README-Template.svg?style=for-the-badge
[stars-url]: https://github.com/datvuthanh/HybridNets/stargazers
[issues-shield]: https://img.shields.io/github/issues/othneildrew/Best-README-Template.svg?style=for-the-badge
[issues-url]: https://github.com/datvuthanh/HybridNets/issues