karolmajek committed on
Commit 1a1ee1f (1 parent: 799a750)
This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. LICENSE +674 -0
  2. README.md +224 -1
  3. app.py +104 -0
  4. cfg/yolor_csp.cfg +1376 -0
  5. cfg/yolor_csp_x.cfg +1576 -0
  6. cfg/yolor_p6.cfg +1760 -0
  7. cfg/yolor_w6.cfg +1760 -0
  8. cfg/yolov4_csp.cfg +1334 -0
  9. cfg/yolov4_csp_x.cfg +1534 -0
  10. cfg/yolov4_p6.cfg +2260 -0
  11. cfg/yolov4_p7.cfg +2714 -0
  12. darknet/README.md +63 -0
  13. darknet/cfg/yolov4-csp-x.cfg +1555 -0
  14. darknet/cfg/yolov4-csp.cfg +1354 -0
  15. darknet/new_layers.md +329 -0
  16. data/coco.names +80 -0
  17. data/coco.yaml +18 -0
  18. data/hyp.finetune.1280.yaml +28 -0
  19. data/hyp.scratch.1280.yaml +28 -0
  20. data/hyp.scratch.640.yaml +28 -0
  21. figure/implicit_modeling.png +0 -0
  22. figure/performance.png +0 -0
  23. figure/schedule.png +0 -0
  24. figure/unifued_network.png +0 -0
  25. inference/images/horses.jpg +0 -0
  26. inference/output/horses.jpg +0 -0
  27. models/__init__.py +1 -0
  28. models/__pycache__/__init__.cpython-37.pyc +0 -0
  29. models/__pycache__/models.cpython-37.pyc +0 -0
  30. models/export.py +68 -0
  31. models/models.py +761 -0
  32. requirements.txt +33 -0
  33. scripts/get_coco.sh +27 -0
  34. scripts/get_pretrain.sh +7 -0
  35. test.py +344 -0
  36. train.py +619 -0
  37. tune.py +619 -0
  38. utils/__init__.py +1 -0
  39. utils/__pycache__/__init__.cpython-37.pyc +0 -0
  40. utils/__pycache__/__init__.cpython-38.pyc +0 -0
  41. utils/__pycache__/datasets.cpython-37.pyc +0 -0
  42. utils/__pycache__/datasets.cpython-38.pyc +0 -0
  43. utils/__pycache__/general.cpython-37.pyc +0 -0
  44. utils/__pycache__/google_utils.cpython-37.pyc +0 -0
  45. utils/__pycache__/google_utils.cpython-38.pyc +0 -0
  46. utils/__pycache__/layers.cpython-37.pyc +0 -0
  47. utils/__pycache__/metrics.cpython-37.pyc +0 -0
  48. utils/__pycache__/parse_config.cpython-37.pyc +0 -0
  49. utils/__pycache__/plots.cpython-37.pyc +0 -0
  50. utils/__pycache__/torch_utils.cpython-37.pyc +0 -0
LICENSE ADDED
@@ -0,0 +1,674 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The GNU General Public License is a free, copyleft license for
+ software and other kinds of works.
+
+ The licenses for most software and other practical works are designed
+ to take away your freedom to share and change the works. By contrast,
+ the GNU General Public License is intended to guarantee your freedom to
+ share and change all versions of a program--to make sure it remains free
+ software for all its users. We, the Free Software Foundation, use the
+ GNU General Public License for most of our software; it applies also to
+ any other work released this way by its authors. You can apply it to
+ your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+ price. Our General Public Licenses are designed to make sure that you
+ have the freedom to distribute copies of free software (and charge for
+ them if you wish), that you receive source code or can get it if you
+ want it, that you can change the software or use pieces of it in new
+ free programs, and that you know you can do these things.
+
+ To protect your rights, we need to prevent others from denying you
+ these rights or asking you to surrender the rights. Therefore, you have
+ certain responsibilities if you distribute copies of the software, or if
+ you modify it: responsibilities to respect the freedom of others.
+
+ For example, if you distribute copies of such a program, whether
+ gratis or for a fee, you must pass on to the recipients the same
+ freedoms that you received. You must make sure that they, too, receive
+ or can get the source code. And you must show them these terms so they
+ know their rights.
+
+ Developers that use the GNU GPL protect your rights with two steps:
+ (1) assert copyright on the software, and (2) offer you this License
+ giving you legal permission to copy, distribute and/or modify it.
+
+ For the developers' and authors' protection, the GPL clearly explains
+ that there is no warranty for this free software. For both users' and
+ authors' sake, the GPL requires that modified versions be marked as
+ changed, so that their problems will not be attributed erroneously to
+ authors of previous versions.
+
+ Some devices are designed to deny users access to install or run
+ modified versions of the software inside them, although the manufacturer
+ can do so. This is fundamentally incompatible with the aim of
+ protecting users' freedom to change the software. The systematic
+ pattern of such abuse occurs in the area of products for individuals to
+ use, which is precisely where it is most unacceptable. Therefore, we
+ have designed this version of the GPL to prohibit the practice for those
+ products. If such problems arise substantially in other domains, we
+ stand ready to extend this provision to those domains in future versions
+ of the GPL, as needed to protect the freedom of users.
+
+ Finally, every program is threatened constantly by software patents.
+ States should not allow patents to restrict development and use of
+ software on general-purpose computers, but in those that do, we wish to
+ avoid the special danger that patents applied to a free program could
+ make it effectively proprietary. To prevent this, the GPL assures that
+ patents cannot be used to render the program non-free.
+
+ The precise terms and conditions for copying, distribution and
+ modification follow.
+
+ TERMS AND CONDITIONS
+
+ 0. Definitions.
+
+ "This License" refers to version 3 of the GNU General Public License.
+
+ "Copyright" also means copyright-like laws that apply to other kinds of
+ works, such as semiconductor masks.
+
+ "The Program" refers to any copyrightable work licensed under this
+ License. Each licensee is addressed as "you". "Licensees" and
+ "recipients" may be individuals or organizations.
+
+ To "modify" a work means to copy from or adapt all or part of the work
+ in a fashion requiring copyright permission, other than the making of an
+ exact copy. The resulting work is called a "modified version" of the
+ earlier work or a work "based on" the earlier work.
+
+ A "covered work" means either the unmodified Program or a work based
+ on the Program.
+
+ To "propagate" a work means to do anything with it that, without
+ permission, would make you directly or secondarily liable for
+ infringement under applicable copyright law, except executing it on a
+ computer or modifying a private copy. Propagation includes copying,
+ distribution (with or without modification), making available to the
+ public, and in some countries other activities as well.
+
+ To "convey" a work means any kind of propagation that enables other
+ parties to make or receive copies. Mere interaction with a user through
+ a computer network, with no transfer of a copy, is not conveying.
+
+ An interactive user interface displays "Appropriate Legal Notices"
+ to the extent that it includes a convenient and prominently visible
+ feature that (1) displays an appropriate copyright notice, and (2)
+ tells the user that there is no warranty for the work (except to the
+ extent that warranties are provided), that licensees may convey the
+ work under this License, and how to view a copy of this License. If
+ the interface presents a list of user commands or options, such as a
+ menu, a prominent item in the list meets this criterion.
+
+ 1. Source Code.
+
+ The "source code" for a work means the preferred form of the work
+ for making modifications to it. "Object code" means any non-source
+ form of a work.
+
+ A "Standard Interface" means an interface that either is an official
+ standard defined by a recognized standards body, or, in the case of
+ interfaces specified for a particular programming language, one that
+ is widely used among developers working in that language.
+
+ The "System Libraries" of an executable work include anything, other
+ than the work as a whole, that (a) is included in the normal form of
+ packaging a Major Component, but which is not part of that Major
+ Component, and (b) serves only to enable use of the work with that
+ Major Component, or to implement a Standard Interface for which an
+ implementation is available to the public in source code form. A
+ "Major Component", in this context, means a major essential component
+ (kernel, window system, and so on) of the specific operating system
+ (if any) on which the executable work runs, or a compiler used to
+ produce the work, or an object code interpreter used to run it.
+
+ The "Corresponding Source" for a work in object code form means all
+ the source code needed to generate, install, and (for an executable
+ work) run the object code and to modify the work, including scripts to
+ control those activities. However, it does not include the work's
+ System Libraries, or general-purpose tools or generally available free
+ programs which are used unmodified in performing those activities but
+ which are not part of the work. For example, Corresponding Source
+ includes interface definition files associated with source files for
+ the work, and the source code for shared libraries and dynamically
+ linked subprograms that the work is specifically designed to require,
+ such as by intimate data communication or control flow between those
+ subprograms and other parts of the work.
+
+ The Corresponding Source need not include anything that users
+ can regenerate automatically from other parts of the Corresponding
+ Source.
+
+ The Corresponding Source for a work in source code form is that
+ same work.
+
+ 2. Basic Permissions.
+
+ All rights granted under this License are granted for the term of
+ copyright on the Program, and are irrevocable provided the stated
+ conditions are met. This License explicitly affirms your unlimited
+ permission to run the unmodified Program. The output from running a
+ covered work is covered by this License only if the output, given its
+ content, constitutes a covered work. This License acknowledges your
+ rights of fair use or other equivalent, as provided by copyright law.
+
+ You may make, run and propagate covered works that you do not
+ convey, without conditions so long as your license otherwise remains
+ in force. You may convey covered works to others for the sole purpose
+ of having them make modifications exclusively for you, or provide you
+ with facilities for running those works, provided that you comply with
+ the terms of this License in conveying all material for which you do
+ not control copyright. Those thus making or running the covered works
+ for you must do so exclusively on your behalf, under your direction
+ and control, on terms that prohibit them from making any copies of
+ your copyrighted material outside their relationship with you.
+
+ Conveying under any other circumstances is permitted solely under
+ the conditions stated below. Sublicensing is not allowed; section 10
+ makes it unnecessary.
+
+ 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+ No covered work shall be deemed part of an effective technological
+ measure under any applicable law fulfilling obligations under article
+ 11 of the WIPO copyright treaty adopted on 20 December 1996, or
+ similar laws prohibiting or restricting circumvention of such
+ measures.
+
+ When you convey a covered work, you waive any legal power to forbid
+ circumvention of technological measures to the extent such circumvention
+ is effected by exercising rights under this License with respect to
+ the covered work, and you disclaim any intention to limit operation or
+ modification of the work as a means of enforcing, against the work's
+ users, your or third parties' legal rights to forbid circumvention of
+ technological measures.
+
+ 4. Conveying Verbatim Copies.
+
+ You may convey verbatim copies of the Program's source code as you
+ receive it, in any medium, provided that you conspicuously and
+ appropriately publish on each copy an appropriate copyright notice;
+ keep intact all notices stating that this License and any
+ non-permissive terms added in accord with section 7 apply to the code;
+ keep intact all notices of the absence of any warranty; and give all
+ recipients a copy of this License along with the Program.
+
+ You may charge any price or no price for each copy that you convey,
+ and you may offer support or warranty protection for a fee.
+
+ 5. Conveying Modified Source Versions.
+
+ You may convey a work based on the Program, or the modifications to
+ produce it from the Program, in the form of source code under the
+ terms of section 4, provided that you also meet all of these conditions:
+
+ a) The work must carry prominent notices stating that you modified
+ it, and giving a relevant date.
+
+ b) The work must carry prominent notices stating that it is
+ released under this License and any conditions added under section
+ 7. This requirement modifies the requirement in section 4 to
+ "keep intact all notices".
+
+ c) You must license the entire work, as a whole, under this
+ License to anyone who comes into possession of a copy. This
+ License will therefore apply, along with any applicable section 7
+ additional terms, to the whole of the work, and all its parts,
+ regardless of how they are packaged. This License gives no
+ permission to license the work in any other way, but it does not
+ invalidate such permission if you have separately received it.
+
+ d) If the work has interactive user interfaces, each must display
+ Appropriate Legal Notices; however, if the Program has interactive
+ interfaces that do not display Appropriate Legal Notices, your
+ work need not make them do so.
+
+ A compilation of a covered work with other separate and independent
+ works, which are not by their nature extensions of the covered work,
+ and which are not combined with it such as to form a larger program,
+ in or on a volume of a storage or distribution medium, is called an
+ "aggregate" if the compilation and its resulting copyright are not
+ used to limit the access or legal rights of the compilation's users
+ beyond what the individual works permit. Inclusion of a covered work
+ in an aggregate does not cause this License to apply to the other
+ parts of the aggregate.
+
+ 6. Conveying Non-Source Forms.
+
+ You may convey a covered work in object code form under the terms
+ of sections 4 and 5, provided that you also convey the
+ machine-readable Corresponding Source under the terms of this License,
+ in one of these ways:
+
+ a) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by the
+ Corresponding Source fixed on a durable physical medium
+ customarily used for software interchange.
+
+ b) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by a
+ written offer, valid for at least three years and valid for as
+ long as you offer spare parts or customer support for that product
+ model, to give anyone who possesses the object code either (1) a
+ copy of the Corresponding Source for all the software in the
+ product that is covered by this License, on a durable physical
+ medium customarily used for software interchange, for a price no
+ more than your reasonable cost of physically performing this
+ conveying of source, or (2) access to copy the
+ Corresponding Source from a network server at no charge.
+
+ c) Convey individual copies of the object code with a copy of the
+ written offer to provide the Corresponding Source. This
+ alternative is allowed only occasionally and noncommercially, and
+ only if you received the object code with such an offer, in accord
+ with subsection 6b.
+
+ d) Convey the object code by offering access from a designated
+ place (gratis or for a charge), and offer equivalent access to the
+ Corresponding Source in the same way through the same place at no
+ further charge. You need not require recipients to copy the
+ Corresponding Source along with the object code. If the place to
+ copy the object code is a network server, the Corresponding Source
+ may be on a different server (operated by you or a third party)
+ that supports equivalent copying facilities, provided you maintain
+ clear directions next to the object code saying where to find the
+ Corresponding Source. Regardless of what server hosts the
+ Corresponding Source, you remain obligated to ensure that it is
+ available for as long as needed to satisfy these requirements.
+
+ e) Convey the object code using peer-to-peer transmission, provided
+ you inform other peers where the object code and Corresponding
+ Source of the work are being offered to the general public at no
+ charge under subsection 6d.
+
+ A separable portion of the object code, whose source code is excluded
+ from the Corresponding Source as a System Library, need not be
+ included in conveying the object code work.
+
+ A "User Product" is either (1) a "consumer product", which means any
+ tangible personal property which is normally used for personal, family,
+ or household purposes, or (2) anything designed or sold for incorporation
+ into a dwelling. In determining whether a product is a consumer product,
+ doubtful cases shall be resolved in favor of coverage. For a particular
+ product received by a particular user, "normally used" refers to a
+ typical or common use of that class of product, regardless of the status
+ of the particular user or of the way in which the particular user
+ actually uses, or expects or is expected to use, the product. A product
+ is a consumer product regardless of whether the product has substantial
+ commercial, industrial or non-consumer uses, unless such uses represent
+ the only significant mode of use of the product.
+
+ "Installation Information" for a User Product means any methods,
+ procedures, authorization keys, or other information required to install
+ and execute modified versions of a covered work in that User Product from
+ a modified version of its Corresponding Source. The information must
+ suffice to ensure that the continued functioning of the modified object
+ code is in no case prevented or interfered with solely because
+ modification has been made.
+
+ If you convey an object code work under this section in, or with, or
+ specifically for use in, a User Product, and the conveying occurs as
+ part of a transaction in which the right of possession and use of the
+ User Product is transferred to the recipient in perpetuity or for a
+ fixed term (regardless of how the transaction is characterized), the
+ Corresponding Source conveyed under this section must be accompanied
+ by the Installation Information. But this requirement does not apply
+ if neither you nor any third party retains the ability to install
+ modified object code on the User Product (for example, the work has
+ been installed in ROM).
+
+ The requirement to provide Installation Information does not include a
+ requirement to continue to provide support service, warranty, or updates
+ for a work that has been modified or installed by the recipient, or for
+ the User Product in which it has been modified or installed. Access to a
+ network may be denied when the modification itself materially and
+ adversely affects the operation of the network or violates the rules and
+ protocols for communication across the network.
+
+ Corresponding Source conveyed, and Installation Information provided,
+ in accord with this section must be in a format that is publicly
+ documented (and with an implementation available to the public in
+ source code form), and must require no special password or key for
+ unpacking, reading or copying.
+
+ 7. Additional Terms.
+
+ "Additional permissions" are terms that supplement the terms of this
+ License by making exceptions from one or more of its conditions.
+ Additional permissions that are applicable to the entire Program shall
+ be treated as though they were included in this License, to the extent
+ that they are valid under applicable law. If additional permissions
+ apply only to part of the Program, that part may be used separately
+ under those permissions, but the entire Program remains governed by
+ this License without regard to the additional permissions.
+
+ When you convey a copy of a covered work, you may at your option
+ remove any additional permissions from that copy, or from any part of
+ it. (Additional permissions may be written to require their own
+ removal in certain cases when you modify the work.) You may place
+ additional permissions on material, added by you to a covered work,
+ for which you have or can give appropriate copyright permission.
+
+ Notwithstanding any other provision of this License, for material you
+ add to a covered work, you may (if authorized by the copyright holders of
+ that material) supplement the terms of this License with terms:
+
+ a) Disclaiming warranty or limiting liability differently from the
+ terms of sections 15 and 16 of this License; or
+
+ b) Requiring preservation of specified reasonable legal notices or
+ author attributions in that material or in the Appropriate Legal
+ Notices displayed by works containing it; or
+
+ c) Prohibiting misrepresentation of the origin of that material, or
+ requiring that modified versions of such material be marked in
+ reasonable ways as different from the original version; or
+
+ d) Limiting the use for publicity purposes of names of licensors or
+ authors of the material; or
+
+ e) Declining to grant rights under trademark law for use of some
+ trade names, trademarks, or service marks; or
+
+ f) Requiring indemnification of licensors and authors of that
+ material by anyone who conveys the material (or modified versions of
+ it) with contractual assumptions of liability to the recipient, for
+ any liability that these contractual assumptions directly impose on
+ those licensors and authors.
+
+ All other non-permissive additional terms are considered "further
+ restrictions" within the meaning of section 10. If the Program as you
+ received it, or any part of it, contains a notice stating that it is
+ governed by this License along with a term that is a further
+ restriction, you may remove that term. If a license document contains
+ a further restriction but permits relicensing or conveying under this
+ License, you may add to a covered work material governed by the terms
+ of that license document, provided that the further restriction does
+ not survive such relicensing or conveying.
+
+ If you add terms to a covered work in accord with this section, you
+ must place, in the relevant source files, a statement of the
+ additional terms that apply to those files, or a notice indicating
+ where to find the applicable terms.
+
+ Additional terms, permissive or non-permissive, may be stated in the
+ form of a separately written license, or stated as exceptions;
+ the above requirements apply either way.
+
+ 8. Termination.
+
+ You may not propagate or modify a covered work except as expressly
+ provided under this License. Any attempt otherwise to propagate or
+ modify it is void, and will automatically terminate your rights under
+ this License (including any patent licenses granted under the third
+ paragraph of section 11).
+
+ However, if you cease all violation of this License, then your
+ license from a particular copyright holder is reinstated (a)
+ provisionally, unless and until the copyright holder explicitly and
+ finally terminates your license, and (b) permanently, if the copyright
+ holder fails to notify you of the violation by some reasonable means
+ prior to 60 days after the cessation.
+
+ Moreover, your license from a particular copyright holder is
+ reinstated permanently if the copyright holder notifies you of the
+ violation by some reasonable means, this is the first time you have
+ received notice of violation of this License (for any work) from that
+ copyright holder, and you cure the violation prior to 30 days after
+ your receipt of the notice.
+
+ Termination of your rights under this section does not terminate the
+ licenses of parties who have received copies or rights from you under
+ this License. If your rights have been terminated and not permanently
+ reinstated, you do not qualify to receive new licenses for the same
+ material under section 10.
+
+ 9. Acceptance Not Required for Having Copies.
+
+ You are not required to accept this License in order to receive or
+ run a copy of the Program. Ancillary propagation of a covered work
+ occurring solely as a consequence of using peer-to-peer transmission
+ to receive a copy likewise does not require acceptance. However,
+ nothing other than this License grants you permission to propagate or
+ modify any covered work. These actions infringe copyright if you do
+ not accept this License. Therefore, by modifying or propagating a
+ covered work, you indicate your acceptance of this License to do so.
+
+ 10. Automatic Licensing of Downstream Recipients.
+
+ Each time you convey a covered work, the recipient automatically
+ receives a license from the original licensors, to run, modify and
+ propagate that work, subject to this License. You are not responsible
+ for enforcing compliance by third parties with this License.
+
+ An "entity transaction" is a transaction transferring control of an
+ organization, or substantially all assets of one, or subdividing an
+ organization, or merging organizations. If propagation of a covered
+ work results from an entity transaction, each party to that
+ transaction who receives a copy of the work also receives whatever
+ licenses to the work the party's predecessor in interest had or could
+ give under the previous paragraph, plus a right to possession of the
+ Corresponding Source of the work from the predecessor in interest, if
+ the predecessor has it or can get it with reasonable efforts.
+
+ You may not impose any further restrictions on the exercise of the
+ rights granted or affirmed under this License. For example, you may
+ not impose a license fee, royalty, or other charge for exercise of
+ rights granted under this License, and you may not initiate litigation
+ (including a cross-claim or counterclaim in a lawsuit) alleging that
+ any patent claim is infringed by making, using, selling, offering for
+ sale, or importing the Program or any portion of it.
+
+ 11. Patents.
+
+ A "contributor" is a copyright holder who authorizes use under this
+ License of the Program or a work on which the Program is based. The
+ work thus licensed is called the contributor's "contributor version".
+
+ A contributor's "essential patent claims" are all patent claims
+ owned or controlled by the contributor, whether already acquired or
+ hereafter acquired, that would be infringed by some manner, permitted
+ by this License, of making, using, or selling its contributor version,
+ but do not include claims that would be infringed only as a
+ consequence of further modification of the contributor version. For
+ purposes of this definition, "control" includes the right to grant
+ patent sublicenses in a manner consistent with the requirements of
+ this License.
+
+ Each contributor grants you a non-exclusive, worldwide, royalty-free
+ patent license under the contributor's essential patent claims, to
+ make, use, sell, offer for sale, import and otherwise run, modify and
+ propagate the contents of its contributor version.
+
+ In the following three paragraphs, a "patent license" is any express
+ agreement or commitment, however denominated, not to enforce a patent
+ (such as an express permission to practice a patent or covenant not to
+ sue for patent infringement). To "grant" such a patent license to a
+ party means to make such an agreement or commitment not to enforce a
+ patent against the party.
+
+ If you convey a covered work, knowingly relying on a patent license,
+ and the Corresponding Source of the work is not available for anyone
+ to copy, free of charge and under the terms of this License, through a
+ publicly available network server or other readily accessible means,
+ then you must either (1) cause the Corresponding Source to be so
+ available, or (2) arrange to deprive yourself of the benefit of the
+ patent license for this particular work, or (3) arrange, in a manner
+ consistent with the requirements of this License, to extend the patent
+ license to downstream recipients. "Knowingly relying" means you have
+ actual knowledge that, but for the patent license, your conveying the
+ covered work in a country, or your recipient's use of the covered work
+ in a country, would infringe one or more identifiable patents in that
+ country that you have reason to believe are valid.
+
+ If, pursuant to or in connection with a single transaction or
+ arrangement, you convey, or propagate by procuring conveyance of, a
+ covered work, and grant a patent license to some of the parties
+ receiving the covered work authorizing them to use, propagate, modify
+ or convey a specific copy of the covered work, then the patent license
+ you grant is automatically extended to all recipients of the covered
+ work and works based on it.
+
+ A patent license is "discriminatory" if it does not include within
+ the scope of its coverage, prohibits the exercise of, or is
+ conditioned on the non-exercise of one or more of the rights that are
+ specifically granted under this License. You may not convey a covered
+ work if you are a party to an arrangement with a third party that is
+ in the business of distributing software, under which you make payment
+ to the third party based on the extent of your activity of conveying
+ the work, and under which the third party grants, to any of the
+ parties who would receive the covered work from you, a discriminatory
+ patent license (a) in connection with copies of the covered work
+ conveyed by you (or copies made from those copies), or (b) primarily
+ for and in connection with specific products or compilations that
+ contain the covered work, unless you entered into that arrangement,
+ or that patent license was granted, prior to 28 March 2007.
+
+ Nothing in this License shall be construed as excluding or limiting
+ any implied license or other defenses to infringement that may
+ otherwise be available to you under applicable patent law.
+
+ 12. No Surrender of Others' Freedom.
+
+ If conditions are imposed on you (whether by court order, agreement or
+ otherwise) that contradict the conditions of this License, they do not
+ excuse you from the conditions of this License. If you cannot convey a
+ covered work so as to satisfy simultaneously your obligations under this
+ License and any other pertinent obligations, then as a consequence you may
+ not convey it at all. For example, if you agree to terms that obligate you
+ to collect a royalty for further conveying from those to whom you convey
+ the Program, the only way you could satisfy both those terms and this
+ License would be to refrain entirely from conveying the Program.
+
+ 13. Use with the GNU Affero General Public License.
+
+ Notwithstanding any other provision of this License, you have
+ permission to link or combine any covered work with a work licensed
+ under version 3 of the GNU Affero General Public License into a single
+ combined work, and to convey the resulting work. The terms of this
+ License will continue to apply to the part which is the covered work,
+ but the special requirements of the GNU Affero General Public License,
+ section 13, concerning interaction through a network will apply to the
+ combination as such.
+
+ 14. Revised Versions of this License.
+
+ The Free Software Foundation may publish revised and/or new versions of
+ the GNU General Public License from time to time. Such new versions will
+ be similar in spirit to the present version, but may differ in detail to
+ address new problems or concerns.
+
+ Each version is given a distinguishing version number. If the
+ Program specifies that a certain numbered version of the GNU General
+ Public License "or any later version" applies to it, you have the
+ option of following the terms and conditions either of that numbered
+ version or of any later version published by the Free Software
+ Foundation. If the Program does not specify a version number of the
+ GNU General Public License, you may choose any version ever published
+ by the Free Software Foundation.
+
+ If the Program specifies that a proxy can decide which future
+ versions of the GNU General Public License can be used, that proxy's
+ public statement of acceptance of a version permanently authorizes you
+ to choose that version for the Program.
+
+ Later license versions may give you additional or different
+ permissions. However, no additional obligations are imposed on any
+ author or copyright holder as a result of your choosing to follow a
+ later version.
+
+ 15. Disclaimer of Warranty.
+
+ THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+ APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+ HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+ OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+ THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+ IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+ 16. Limitation of Liability.
+
+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+ WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+ THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+ GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+ USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+ DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+ PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+ EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+ SUCH DAMAGES.
+
+ 17. Interpretation of Sections 15 and 16.
+
+ If the disclaimer of warranty and limitation of liability provided
+ above cannot be given local legal effect according to their terms,
+ reviewing courts shall apply local law that most closely approximates
+ an absolute waiver of all civil liability in connection with the
+ Program, unless a warranty or assumption of liability accompanies a
+ copy of the Program in return for a fee.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+ possible use to the public, the best way to achieve this is to make it
+ free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+ to attach them to the start of each source file to most effectively
+ state the exclusion of warranty; and each file should have at least
+ the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+ Also add information on how to contact you by electronic and paper mail.
+
+ If the program does terminal interaction, make it output a short
+ notice like this when it starts in an interactive mode:
+
+ <program> Copyright (C) <year> <name of author>
+ This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+ The hypothetical commands `show w' and `show c' should show the appropriate
+ parts of the General Public License. Of course, your program's commands
+ might be different; for a GUI interface, you would use an "about box".
+
+ You should also get your employer (if you work as a programmer) or school,
+ if any, to sign a "copyright disclaimer" for the program, if necessary.
+ For more information on this, and how to apply and follow the GNU GPL, see
+ <https://www.gnu.org/licenses/>.
+
+ The GNU General Public License does not permit incorporating your program
+ into proprietary programs. If your program is a subroutine library, you
+ may consider it more useful to permit linking proprietary applications with
+ the library. If this is what you want to do, use the GNU Lesser General
+ Public License instead of this License. But first, please read
+ <https://www.gnu.org/licenses/why-not-lgpl.html>.
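The "How to Apply These Terms" section above is a how-to; rendered as a header for a Python source file in this repository, it would look roughly like the sketch below. The one-line description, `<year>`, and `<name of author>` are the license template's own placeholders, not values taken from this commit.

```
# Sketch of a per-file GPL-3.0 notice, assembled from the template in the
# LICENSE above; the description, <year>, and <name of author> are placeholders.
#
# YOLOR - implementation of "You Only Learn One Representation".
# Copyright (C) <year> <name of author>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
```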
README.md CHANGED
@@ -1,6 +1,6 @@
 ---
 title: YOLOR
- emoji: 📉
+ emoji: 🚀
 colorFrom: gray
 colorTo: purple
 sdk: gradio
@@ -35,3 +35,226 @@ Path is relative to the root of the repository.

 `pinned`: _boolean_
 Whether the Space stays on top of your list.
+
+
+ # YOLOR
+ Implementation of the paper [You Only Learn One Representation: Unified Network for Multiple Tasks](https://arxiv.org/abs/2105.04206).
+
+ [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/you-only-learn-one-representation-unified/real-time-object-detection-on-coco)](https://paperswithcode.com/sota/real-time-object-detection-on-coco?p=you-only-learn-one-representation-unified)
+
+ ![Unified Network](https://github.com/WongKinYiu/yolor/blob/main/figure/unifued_network.png)
+
+ <img src="https://github.com/WongKinYiu/yolor/blob/main/figure/performance.png" height="480">
+
+ To reproduce the results in the table below, please use [this branch](https://github.com/WongKinYiu/yolor/tree/paper).
+
+ | Model | Test Size | AP<sup>test</sup> | AP<sub>50</sub><sup>test</sup> | AP<sub>75</sub><sup>test</sup> | batch1 throughput | batch32 inference |
+ | :-- | :-: | :-: | :-: | :-: | :-: | :-: |
+ | **YOLOR-P6** | 1280 | **54.1%** | **71.8%** | **59.3%** | 49 *fps* | 8.3 *ms* |
+ | **YOLOR-W6** | 1280 | **55.5%** | **73.2%** | **60.6%** | 47 *fps* | 10.7 *ms* |
+ | **YOLOR-E6** | 1280 | **56.4%** | **74.1%** | **61.6%** | 37 *fps* | 17.1 *ms* |
+ | **YOLOR-D6** | 1280 | **57.3%** | **75.0%** | **62.7%** | 30 *fps* | 21.8 *ms* |
+ | **YOLOR-D6*** | 1280 | **57.8%** | **75.5%** | **63.3%** | 30 *fps* | 21.8 *ms* |
+ | | | | | | | |
+ | **YOLOv4-P5** | 896 | **51.8%** | **70.3%** | **56.6%** | 41 *fps* | - |
+ | **YOLOv4-P6** | 1280 | **54.5%** | **72.6%** | **59.8%** | 30 *fps* | - |
+ | **YOLOv4-P7** | 1536 | **55.5%** | **73.4%** | **60.8%** | 16 *fps* | - |
+
+ To reproduce the inference speed, please see [darknet](https://github.com/WongKinYiu/yolor/tree/main/darknet).
+
+ | Model | Test Size | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup> | AP<sub>75</sub><sup>val</sup> | AP<sub>S</sub><sup>val</sup> | AP<sub>M</sub><sup>val</sup> | AP<sub>L</sub><sup>val</sup> | batch1 throughput / weights |
+ | :-- | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
+ | [**YOLOv4-CSP**](/cfg/yolov4_csp.cfg) | 640 | **49.1%** | **67.7%** | **53.8%** | **32.1%** | **54.4%** | **63.2%** | 76 *fps* |
+ | [**YOLOR-CSP**](/cfg/yolor_csp.cfg) | 640 | **49.2%** | **67.6%** | **53.7%** | **32.9%** | **54.4%** | **63.0%** | [weights](https://drive.google.com/file/d/1ZEqGy4kmZyD-Cj3tEFJcLSZenZBDGiyg/view?usp=sharing) |
+ | [**YOLOR-CSP***](/cfg/yolor_csp.cfg) | 640 | **50.0%** | **68.7%** | **54.3%** | **34.2%** | **55.1%** | **64.3%** | [weights](https://drive.google.com/file/d/1OJKgIasELZYxkIjFoiqyn555bcmixUP2/view?usp=sharing) |
+ | | | | | | | | | |
+ | [**YOLOv4-CSP-X**](/cfg/yolov4_csp_x.cfg) | 640 | **50.9%** | **69.3%** | **55.4%** | **35.3%** | **55.8%** | **64.8%** | 53 *fps* |
+ | [**YOLOR-CSP-X**](/cfg/yolor_csp_x.cfg) | 640 | **51.1%** | **69.6%** | **55.7%** | **35.7%** | **56.0%** | **65.2%** | [weights](https://drive.google.com/file/d/1L29rfIPNH1n910qQClGftknWpTBgAv6c/view?usp=sharing) |
+ | [**YOLOR-CSP-X***](/cfg/yolor_csp_x.cfg) | 640 | **51.5%** | **69.9%** | **56.1%** | **35.8%** | **56.8%** | **66.1%** | [weights](https://drive.google.com/file/d/1NbMG3ivuBQ4S8kEhFJ0FIqOQXevGje_w/view?usp=sharing) |
+
+ Under development:
+
+ | Model | Test Size | AP<sup>test</sup> | AP<sub>50</sub><sup>test</sup> | AP<sub>75</sub><sup>test</sup> | AP<sub>S</sub><sup>test</sup> | AP<sub>M</sub><sup>test</sup> | AP<sub>L</sub><sup>test</sup> |
+ | :-- | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
+ | **YOLOR-CSP** | 640 | **51.1%** | **69.6%** | **55.7%** | **31.7%** | **55.3%** | **64.7%** |
+ | **YOLOR-CSP-X** | 640 | **53.0%** | **71.4%** | **57.9%** | **33.7%** | **57.1%** | **66.8%** |
+
+ Results when training from scratch for 300 epochs:
+
+ | Model | Info | Test Size | AP |
+ | :-- | :-- | :-: | :-: |
+ | **YOLOR-CSP** | [evolution](https://github.com/ultralytics/yolov3/issues/392) | 640 | **48.0%** |
+ | **YOLOR-CSP** | [strategy](https://openaccess.thecvf.com/content/ICCV2021W/LPCV/html/Wang_Exploring_the_Power_of_Lightweight_YOLOv4_ICCVW_2021_paper.html) | 640 | **50.0%** |
+ | **YOLOR-CSP** | [strategy](https://openaccess.thecvf.com/content/ICCV2021W/LPCV/html/Wang_Exploring_the_Power_of_Lightweight_YOLOv4_ICCVW_2021_paper.html) + [simOTA](https://arxiv.org/abs/2107.08430) | 640 | **51.1%** |
+ | | | | |
+ | **YOLOR-CSP-X** | [strategy](https://openaccess.thecvf.com/content/ICCV2021W/LPCV/html/Wang_Exploring_the_Power_of_Lightweight_YOLOv4_ICCVW_2021_paper.html) | 640 | **51.5%** |
+ | **YOLOR-CSP-X** | [strategy](https://openaccess.thecvf.com/content/ICCV2021W/LPCV/html/Wang_Exploring_the_Power_of_Lightweight_YOLOv4_ICCVW_2021_paper.html) + [simOTA](https://arxiv.org/abs/2107.08430) | 640 | **53.0%** |
+
+ ## Installation
+
+ Docker environment (recommended)
+ <details><summary> <b>Expand</b> </summary>
+
+ ```
+ # Create the docker container; you can change the shared memory size if you have more available.
+ nvidia-docker run --name yolor -it -v your_coco_path/:/coco/ -v your_code_path/:/yolor --shm-size=64g nvcr.io/nvidia/pytorch:20.11-py3
+
+ # apt install required packages
+ apt update
+ apt install -y zip htop screen libgl1-mesa-glx
+
+ # pip install required packages
+ pip install seaborn thop
+
+ # install mish-cuda if you want to use mish activation
+ # https://github.com/thomasbrandon/mish-cuda
+ # https://github.com/JunnYu/mish-cuda
+ cd /
+ git clone https://github.com/JunnYu/mish-cuda
+ cd mish-cuda
+ python setup.py build install
+
+ # install pytorch_wavelets if you want to use dwt down-sampling module
+ # https://github.com/fbcotter/pytorch_wavelets
+ cd /
+ git clone https://github.com/fbcotter/pytorch_wavelets
+ cd pytorch_wavelets
+ pip install .
+
+ # go to code folder
+ cd /yolor
+ ```
+
+ </details>
+
+ Colab environment
+ <details><summary> <b>Expand</b> </summary>
+
+ ```
+ git clone https://github.com/WongKinYiu/yolor
+ cd yolor
+
+ # pip install required packages
+ pip install -qr requirements.txt
+
+ # install mish-cuda if you want to use mish activation
+ # https://github.com/thomasbrandon/mish-cuda
+ # https://github.com/JunnYu/mish-cuda
+ git clone https://github.com/JunnYu/mish-cuda
+ cd mish-cuda
+ python setup.py build install
+ cd ..
+
+ # install pytorch_wavelets if you want to use dwt down-sampling module
+ # https://github.com/fbcotter/pytorch_wavelets
+ git clone https://github.com/fbcotter/pytorch_wavelets
+ cd pytorch_wavelets
+ pip install .
+ cd ..
+ ```
+
+ </details>
+
+ Prepare COCO dataset
+ <details><summary> <b>Expand</b> </summary>
+
+ ```
+ cd /yolor
+ bash scripts/get_coco.sh
+ ```
+
+ </details>
+
+ Prepare pretrained weight
+ <details><summary> <b>Expand</b> </summary>
+
+ ```
+ cd /yolor
+ bash scripts/get_pretrain.sh
+ ```
+
+ </details>
+
+ ## Testing
+
+ [`yolor_p6.pt`](https://drive.google.com/file/d/1Tdn3yqpZ79X7R1Ql0zNlNScB1Dv9Fp76/view?usp=sharing)
+
+ ```
+ python test.py --data data/coco.yaml --img 1280 --batch 32 --conf 0.001 --iou 0.65 --device 0 --cfg cfg/yolor_p6.cfg --weights yolor_p6.pt --name yolor_p6_val
+ ```
+
+ You should get the following results:
+
+ ```
+ Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.52510
+ Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.70718
+ Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.57520
+ Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.37058
+ Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.56878
+ Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.66102
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.39181
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.65229
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.71441
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.57755
+ Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.75337
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.84013
+ ```
+
+ ## Training
+
+ Single GPU training:
+
+ ```
+ python train.py --batch-size 8 --img 1280 1280 --data coco.yaml --cfg cfg/yolor_p6.cfg --weights '' --device 0 --name yolor_p6 --hyp hyp.scratch.1280.yaml --epochs 300
+ ```
+
+ Multiple GPU training:
+
+ ```
+ python -m torch.distributed.launch --nproc_per_node 2 --master_port 9527 train.py --batch-size 16 --img 1280 1280 --data coco.yaml --cfg cfg/yolor_p6.cfg --weights '' --device 0,1 --sync-bn --name yolor_p6 --hyp hyp.scratch.1280.yaml --epochs 300
+ ```
+
+ Training schedule used in the paper:
+
+ ```
+ python -m torch.distributed.launch --nproc_per_node 8 --master_port 9527 train.py --batch-size 64 --img 1280 1280 --data data/coco.yaml --cfg cfg/yolor_p6.cfg --weights '' --device 0,1,2,3,4,5,6,7 --sync-bn --name yolor_p6 --hyp hyp.scratch.1280.yaml --epochs 300
+ python -m torch.distributed.launch --nproc_per_node 8 --master_port 9527 tune.py --batch-size 64 --img 1280 1280 --data data/coco.yaml --cfg cfg/yolor_p6.cfg --weights 'runs/train/yolor_p6/weights/last_298.pt' --device 0,1,2,3,4,5,6,7 --sync-bn --name yolor_p6-tune --hyp hyp.finetune.1280.yaml --epochs 450
+ python -m torch.distributed.launch --nproc_per_node 8 --master_port 9527 train.py --batch-size 64 --img 1280 1280 --data data/coco.yaml --cfg cfg/yolor_p6.cfg --weights 'runs/train/yolor_p6-tune/weights/epoch_424.pt' --device 0,1,2,3,4,5,6,7 --sync-bn --name yolor_p6-fine --hyp hyp.finetune.1280.yaml --epochs 450
+ ```
+
+ ## Inference
+
+ [`yolor_p6.pt`](https://drive.google.com/file/d/1Tdn3yqpZ79X7R1Ql0zNlNScB1Dv9Fp76/view?usp=sharing)
+
+ ```
+ python detect.py --source inference/images/horses.jpg --cfg cfg/yolor_p6.cfg --weights yolor_p6.pt --conf 0.25 --img-size 1280 --device 0
+ ```
+
+ You should get the following result:
+
+ ![horses](https://github.com/WongKinYiu/yolor/blob/main/inference/output/horses.jpg)
+
+ ## Citation
+
+ ```
+ @article{wang2021you,
+   title={You Only Learn One Representation: Unified Network for Multiple Tasks},
+   author={Wang, Chien-Yao and Yeh, I-Hau and Liao, Hong-Yuan Mark},
+   journal={arXiv preprint arXiv:2105.04206},
+   year={2021}
+ }
+ ```
+
+ ## Acknowledgements
+
+ <details><summary> <b>Expand</b> </summary>
+
+ * [https://github.com/AlexeyAB/darknet](https://github.com/AlexeyAB/darknet)
+ * [https://github.com/WongKinYiu/PyTorch_YOLOv4](https://github.com/WongKinYiu/PyTorch_YOLOv4)
+ * [https://github.com/WongKinYiu/ScaledYOLOv4](https://github.com/WongKinYiu/ScaledYOLOv4)
+ * [https://github.com/ultralytics/yolov3](https://github.com/ultralytics/yolov3)
+ * [https://github.com/ultralytics/yolov5](https://github.com/ultralytics/yolov5)
+
+ </details>
app.py ADDED
@@ -0,0 +1,104 @@
+ from PIL import Image
+ import cv2
+ import torch
+ from numpy import random
+
+ from utils.general import (non_max_suppression, scale_coords)
+ from utils.plots import plot_one_box
+
+ from models.models import *
+ from utils.datasets import *
+ from utils.general import *
+
+ import gradio as gr
+ import requests
+
+ import gdown
+
+
+ # Download the pretrained YOLOR-P6 weights
+ url = 'https://drive.google.com/u/0/uc?id=1Tdn3yqpZ79X7R1Ql0zNlNScB1Dv9Fp76&export=download'
+ output = 'yolor_p6.pt'
+ gdown.download(url, output, quiet=False)
+
+ # Fetch two example images for the demo
+ url1 = 'https://cdn.pixabay.com/photo/2014/09/07/21/52/city-438393_1280.jpg'
+ r = requests.get(url1, allow_redirects=True)
+ open("city1.jpg", 'wb').write(r.content)
+ url2 = 'https://cdn.pixabay.com/photo/2016/02/19/11/36/canal-1209808_1280.jpg'
+ r = requests.get(url2, allow_redirects=True)
+ open("city2.jpg", 'wb').write(r.content)
+
+
+ conf_thres = 0.4
+ iou_thres = 0.5
+
+
+ def load_classes(path):
+     # Loads *.names file at 'path'
+     with open(path, 'r') as f:
+         names = f.read().split('\n')
+     return list(filter(None, names))  # filter removes empty strings (such as last line)
+
+ def detect(pil_img, names):
+     img_np = np.array(pil_img)
+     img = torch.from_numpy(img_np)
+     img = img.float()
+     img /= 255.0  # 0 - 255 to 0.0 - 1.0
+
+     # Inference (HWC -> CHW, add batch dimension)
+     pred = model(img.unsqueeze(0).permute(0, 3, 1, 2), augment=False)[0]
+
+     # Apply NMS
+     pred = non_max_suppression(pred, conf_thres, iou_thres, classes=None, agnostic=False)
+
+     # Process detections
+     for i, det in enumerate(pred):  # detections per image
+         if det is not None and len(det):
+             # Rescale boxes from img_size to im0 size
+             # (a no-op here, since source and target shapes are identical)
+             det[:, :4] = scale_coords(img_np.shape, det[:, :4], img_np.shape).round()
+
+             # Count detections per class (n is currently unused)
+             for c in det[:, -1].unique():
+                 n = (det[:, -1] == c).sum()  # detections per class
+
+             # Draw results
+             for *xyxy, conf, cls in det:
+                 label = '%s %.2f' % (names[int(cls)], conf)
+                 plot_one_box(xyxy, img_np, label=label, color=colors[int(cls)], line_thickness=3)
+     cv2.imwrite('/tmp/aaa.jpg', img_np[:, :, ::-1])  # debug dump (RGB -> BGR for OpenCV)
+     return Image.fromarray(img_np)
+
+
+ with torch.no_grad():
+     cfg = 'cfg/yolor_p6.cfg'
+     imgsz = 1280
+     names = 'data/coco.names'
+     weights = 'yolor_p6.pt'
+
+     # Load model
+     model = Darknet(cfg, imgsz)
+     model.load_state_dict(torch.load(weights)['model'])
+     model.eval()
+
+     # Get names and colors
+     names = load_classes(names)
+     colors = [[random.randint(0, 255) for _ in range(3)] for _ in range(len(names))]
+
+     def inference(image):
+         image = image.resize(size=(imgsz, imgsz))  # plain square resize; distorts aspect ratio
+         return detect(image, names)
+
+     title = "YOLOR P6"
+     description = "Demo for YOLOR. To use it, simply upload your image, or click one of the examples to load them. Read more at the links below.\nModel: YOLOR-P6"
+     article = "<p style='text-align: center'><a href='https://arxiv.org/abs/2105.04206'>You Only Learn One Representation: Unified Network for Multiple Tasks</a> | <a href='https://github.com/WongKinYiu/yolor'>Github Repo</a></p>"
+
+     gr.Interface(
+         inference,
+         [gr.inputs.Image(type="pil", label="Input")],
+         gr.outputs.Image(type="numpy", label="Output"),
+         title=title,
+         description=description,
+         article=article,
+         examples=[
+             ["city1.jpg"],
+             ["city2.jpg"]
+         ]).launch()
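
Note that `inference` above stretches every input to a fixed `imgsz`×`imgsz` square, which distorts non-square images. A minimal sketch of an aspect-ratio-preserving alternative ("letterbox" padding, as used in YOLO-family preprocessing) is below; it uses only numpy/OpenCV rather than the repo's own `letterbox` helper, and the function name and defaults are illustrative, not part of app.py:

```python
# Sketch: aspect-ratio-preserving resize with padding ("letterbox").
# Assumes only numpy + OpenCV; letterbox_sketch/new_shape/color are illustrative names.
import cv2
import numpy as np

def letterbox_sketch(img, new_shape=1280, color=(114, 114, 114)):
    h, w = img.shape[:2]
    r = min(new_shape / h, new_shape / w)            # scale so the long side fits
    nh, nw = int(round(h * r)), int(round(w * r))
    resized = cv2.resize(img, (nw, nh), interpolation=cv2.INTER_LINEAR)
    top = (new_shape - nh) // 2                      # split the padding evenly
    bottom = new_shape - nh - top
    left = (new_shape - nw) // 2
    right = new_shape - nw - left
    # Pad the short side with a constant border instead of stretching the image
    return cv2.copyMakeBorder(resized, top, bottom, left, right,
                              cv2.BORDER_CONSTANT, value=color)
```

With padding like this, `scale_coords` would then map the detected boxes back to the original resolution; the demo skips that step because it draws directly on the already-resized image.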
cfg/yolor_csp.cfg ADDED
@@ -0,0 +1,1376 @@
1
+ [net]
2
+ # Testing
3
+ #batch=1
4
+ #subdivisions=1
5
+ # Training
6
+ batch=64
7
+ subdivisions=8
8
+ width=512
9
+ height=512
10
+ channels=3
11
+ momentum=0.949
12
+ decay=0.0005
13
+ angle=0
14
+ saturation = 1.5
15
+ exposure = 1.5
16
+ hue=.1
17
+
18
+ learning_rate=0.00261
19
+ burn_in=1000
20
+ max_batches = 500500
21
+ policy=steps
22
+ steps=400000,450000
23
+ scales=.1,.1
24
+
25
+ #cutmix=1
26
+ mosaic=1
27
+
28
+
29
+ # ============ Backbone ============ #
30
+
31
+ # Stem
32
+
33
+ # 0
34
+ [convolutional]
35
+ batch_normalize=1
36
+ filters=32
37
+ size=3
38
+ stride=1
39
+ pad=1
40
+ activation=silu
41
+
42
+ # P1
43
+
44
+ # Downsample
45
+
46
+ [convolutional]
47
+ batch_normalize=1
48
+ filters=64
49
+ size=3
50
+ stride=2
51
+ pad=1
52
+ activation=silu
53
+
54
+ # Residual Block
55
+
56
+ [convolutional]
57
+ batch_normalize=1
58
+ filters=32
59
+ size=1
60
+ stride=1
61
+ pad=1
62
+ activation=silu
63
+
64
+ [convolutional]
65
+ batch_normalize=1
66
+ filters=64
67
+ size=3
68
+ stride=1
69
+ pad=1
70
+ activation=silu
71
+
72
+ # 4 (previous+1+3k)
73
+ [shortcut]
74
+ from=-3
75
+ activation=linear
76
+
77
+ # P2
78
+
79
+ # Downsample
80
+
81
+ [convolutional]
82
+ batch_normalize=1
83
+ filters=128
84
+ size=3
85
+ stride=2
86
+ pad=1
87
+ activation=silu
88
+
89
+ # Split
90
+
91
+ [convolutional]
92
+ batch_normalize=1
93
+ filters=64
94
+ size=1
95
+ stride=1
96
+ pad=1
97
+ activation=silu
98
+
99
+ [route]
100
+ layers = -2
101
+
102
+ [convolutional]
103
+ batch_normalize=1
104
+ filters=64
105
+ size=1
106
+ stride=1
107
+ pad=1
108
+ activation=silu
109
+
110
+ # Residual Block
111
+
112
+ [convolutional]
113
+ batch_normalize=1
114
+ filters=64
115
+ size=1
116
+ stride=1
117
+ pad=1
118
+ activation=silu
119
+
120
+ [convolutional]
121
+ batch_normalize=1
122
+ filters=64
123
+ size=3
124
+ stride=1
125
+ pad=1
126
+ activation=silu
127
+
128
+ [shortcut]
129
+ from=-3
130
+ activation=linear
131
+
132
+ [convolutional]
133
+ batch_normalize=1
134
+ filters=64
135
+ size=1
136
+ stride=1
137
+ pad=1
138
+ activation=silu
139
+
140
+ [convolutional]
141
+ batch_normalize=1
142
+ filters=64
143
+ size=3
144
+ stride=1
145
+ pad=1
146
+ activation=silu
147
+
148
+ [shortcut]
149
+ from=-3
150
+ activation=linear
151
+
152
+ # Transition first
153
+
154
+ [convolutional]
155
+ batch_normalize=1
156
+ filters=64
157
+ size=1
158
+ stride=1
159
+ pad=1
160
+ activation=silu
161
+
162
+ # Merge [-1, -(3k+4)]
163
+
164
+ [route]
165
+ layers = -1,-10
166
+
167
+ # Transition last
168
+
169
+ # 17 (previous+7+3k)
170
+ [convolutional]
171
+ batch_normalize=1
172
+ filters=128
173
+ size=1
174
+ stride=1
175
+ pad=1
176
+ activation=silu
177
+
178
+ # P3
179
+
180
+ # Downsample
181
+
182
+ [convolutional]
183
+ batch_normalize=1
184
+ filters=256
185
+ size=3
186
+ stride=2
187
+ pad=1
188
+ activation=silu
189
+
190
+ # Split
191
+
192
+ [convolutional]
193
+ batch_normalize=1
194
+ filters=128
195
+ size=1
196
+ stride=1
197
+ pad=1
198
+ activation=silu
199
+
200
+ [route]
201
+ layers = -2
202
+
203
+ [convolutional]
204
+ batch_normalize=1
205
+ filters=128
206
+ size=1
207
+ stride=1
208
+ pad=1
209
+ activation=silu
210
+
211
+ # Residual Block
212
+
213
+ [convolutional]
214
+ batch_normalize=1
215
+ filters=128
216
+ size=1
217
+ stride=1
218
+ pad=1
219
+ activation=silu
220
+
221
+ [convolutional]
222
+ batch_normalize=1
223
+ filters=128
224
+ size=3
225
+ stride=1
226
+ pad=1
227
+ activation=silu
228
+
229
+ [shortcut]
230
+ from=-3
231
+ activation=linear
232
+
233
+ [convolutional]
234
+ batch_normalize=1
235
+ filters=128
236
+ size=1
237
+ stride=1
238
+ pad=1
239
+ activation=silu
240
+
241
+ [convolutional]
242
+ batch_normalize=1
243
+ filters=128
244
+ size=3
245
+ stride=1
246
+ pad=1
247
+ activation=silu
248
+
249
+ [shortcut]
250
+ from=-3
251
+ activation=linear
252
+
253
+ [convolutional]
254
+ batch_normalize=1
255
+ filters=128
256
+ size=1
257
+ stride=1
258
+ pad=1
259
+ activation=silu
260
+
261
+ [convolutional]
262
+ batch_normalize=1
263
+ filters=128
264
+ size=3
265
+ stride=1
266
+ pad=1
267
+ activation=silu
268
+
269
+ [shortcut]
270
+ from=-3
271
+ activation=linear
272
+
273
+ [convolutional]
274
+ batch_normalize=1
275
+ filters=128
276
+ size=1
277
+ stride=1
278
+ pad=1
279
+ activation=silu
280
+
281
+ [convolutional]
282
+ batch_normalize=1
283
+ filters=128
284
+ size=3
285
+ stride=1
286
+ pad=1
287
+ activation=silu
288
+
289
+ [shortcut]
290
+ from=-3
291
+ activation=linear
292
+
293
+ [convolutional]
294
+ batch_normalize=1
295
+ filters=128
296
+ size=1
297
+ stride=1
298
+ pad=1
299
+ activation=silu
300
+
301
+ [convolutional]
302
+ batch_normalize=1
303
+ filters=128
304
+ size=3
305
+ stride=1
306
+ pad=1
307
+ activation=silu
308
+
309
+ [shortcut]
310
+ from=-3
311
+ activation=linear
312
+
313
+ [convolutional]
314
+ batch_normalize=1
315
+ filters=128
316
+ size=1
317
+ stride=1
318
+ pad=1
319
+ activation=silu
320
+
321
+ [convolutional]
322
+ batch_normalize=1
323
+ filters=128
324
+ size=3
325
+ stride=1
326
+ pad=1
327
+ activation=silu
328
+
329
+ [shortcut]
330
+ from=-3
331
+ activation=linear
332
+
333
+ [convolutional]
334
+ batch_normalize=1
335
+ filters=128
336
+ size=1
337
+ stride=1
338
+ pad=1
339
+ activation=silu
340
+
341
+ [convolutional]
342
+ batch_normalize=1
343
+ filters=128
344
+ size=3
345
+ stride=1
346
+ pad=1
347
+ activation=silu
348
+
349
+ [shortcut]
350
+ from=-3
351
+ activation=linear
352
+
353
+ [convolutional]
354
+ batch_normalize=1
355
+ filters=128
356
+ size=1
357
+ stride=1
358
+ pad=1
359
+ activation=silu
360
+
361
+ [convolutional]
362
+ batch_normalize=1
363
+ filters=128
364
+ size=3
365
+ stride=1
366
+ pad=1
367
+ activation=silu
368
+
369
+ [shortcut]
370
+ from=-3
371
+ activation=linear
372
+
373
+ # Transition first
374
+
375
+ [convolutional]
376
+ batch_normalize=1
377
+ filters=128
378
+ size=1
379
+ stride=1
380
+ pad=1
381
+ activation=silu
382
+
383
+ # Merge [-1 -(4+3k)]
384
+
385
+ [route]
386
+ layers = -1,-28
387
+
388
+ # Transition last
389
+
390
+ # 48 (previous+7+3k)
391
+ [convolutional]
392
+ batch_normalize=1
393
+ filters=256
394
+ size=1
395
+ stride=1
396
+ pad=1
397
+ activation=silu
398
+
399
+ # P4
400
+
401
+ # Downsample
402
+
403
+ [convolutional]
404
+ batch_normalize=1
405
+ filters=512
406
+ size=3
407
+ stride=2
408
+ pad=1
409
+ activation=silu
410
+
411
+ # Split
412
+
413
+ [convolutional]
414
+ batch_normalize=1
415
+ filters=256
416
+ size=1
417
+ stride=1
418
+ pad=1
419
+ activation=silu
420
+
421
+ [route]
422
+ layers = -2
423
+
424
+ [convolutional]
425
+ batch_normalize=1
426
+ filters=256
427
+ size=1
428
+ stride=1
429
+ pad=1
430
+ activation=silu
431
+
432
+ # Residual Block
433
+
434
+ [convolutional]
435
+ batch_normalize=1
436
+ filters=256
437
+ size=1
438
+ stride=1
439
+ pad=1
440
+ activation=silu
441
+
442
+ [convolutional]
443
+ batch_normalize=1
444
+ filters=256
445
+ size=3
446
+ stride=1
447
+ pad=1
448
+ activation=silu
449
+
450
+ [shortcut]
451
+ from=-3
452
+ activation=linear
453
+
454
+ [convolutional]
455
+ batch_normalize=1
456
+ filters=256
457
+ size=1
458
+ stride=1
459
+ pad=1
460
+ activation=silu
461
+
462
+ [convolutional]
463
+ batch_normalize=1
464
+ filters=256
465
+ size=3
466
+ stride=1
467
+ pad=1
468
+ activation=silu
469
+
470
+ [shortcut]
471
+ from=-3
472
+ activation=linear
473
+
474
+ [convolutional]
475
+ batch_normalize=1
476
+ filters=256
477
+ size=1
478
+ stride=1
479
+ pad=1
480
+ activation=silu
481
+
482
+ [convolutional]
483
+ batch_normalize=1
484
+ filters=256
485
+ size=3
486
+ stride=1
487
+ pad=1
488
+ activation=silu
489
+
490
+ [shortcut]
491
+ from=-3
492
+ activation=linear
493
+
494
+ [convolutional]
495
+ batch_normalize=1
496
+ filters=256
497
+ size=1
498
+ stride=1
499
+ pad=1
500
+ activation=silu
501
+
502
+ [convolutional]
503
+ batch_normalize=1
504
+ filters=256
505
+ size=3
506
+ stride=1
507
+ pad=1
508
+ activation=silu
509
+
510
+ [shortcut]
511
+ from=-3
512
+ activation=linear
513
+
514
+ [convolutional]
515
+ batch_normalize=1
516
+ filters=256
517
+ size=1
518
+ stride=1
519
+ pad=1
520
+ activation=silu
521
+
522
+ [convolutional]
523
+ batch_normalize=1
524
+ filters=256
525
+ size=3
526
+ stride=1
527
+ pad=1
528
+ activation=silu
529
+
530
+ [shortcut]
531
+ from=-3
532
+ activation=linear
533
+
534
+ [convolutional]
535
+ batch_normalize=1
536
+ filters=256
537
+ size=1
538
+ stride=1
539
+ pad=1
540
+ activation=silu
541
+
542
+ [convolutional]
543
+ batch_normalize=1
544
+ filters=256
545
+ size=3
546
+ stride=1
547
+ pad=1
548
+ activation=silu
549
+
550
+ [shortcut]
551
+ from=-3
552
+ activation=linear
553
+
554
+ [convolutional]
555
+ batch_normalize=1
556
+ filters=256
557
+ size=1
558
+ stride=1
559
+ pad=1
560
+ activation=silu
561
+
562
+ [convolutional]
563
+ batch_normalize=1
564
+ filters=256
565
+ size=3
566
+ stride=1
567
+ pad=1
568
+ activation=silu
569
+
570
+ [shortcut]
571
+ from=-3
572
+ activation=linear
573
+
574
+ [convolutional]
575
+ batch_normalize=1
576
+ filters=256
577
+ size=1
578
+ stride=1
579
+ pad=1
580
+ activation=silu
581
+
582
+ [convolutional]
583
+ batch_normalize=1
584
+ filters=256
585
+ size=3
586
+ stride=1
587
+ pad=1
588
+ activation=silu
589
+
590
+ [shortcut]
591
+ from=-3
592
+ activation=linear
593
+
594
+ # Transition first
595
+
596
+ [convolutional]
597
+ batch_normalize=1
598
+ filters=256
599
+ size=1
600
+ stride=1
601
+ pad=1
602
+ activation=silu
603
+
604
+ # Merge [-1 -(3k+4)]
605
+
606
+ [route]
607
+ layers = -1,-28
608
+
609
+ # Transition last
610
+
611
+ # 79 (previous+7+3k)
612
+ [convolutional]
613
+ batch_normalize=1
614
+ filters=512
615
+ size=1
616
+ stride=1
617
+ pad=1
618
+ activation=silu
619
+
620
+ # P5
621
+
622
+ # Downsample
623
+
624
+ [convolutional]
625
+ batch_normalize=1
626
+ filters=1024
627
+ size=3
628
+ stride=2
629
+ pad=1
630
+ activation=silu
631
+
632
+ # Split
633
+
634
+ [convolutional]
635
+ batch_normalize=1
636
+ filters=512
637
+ size=1
638
+ stride=1
639
+ pad=1
640
+ activation=silu
641
+
642
+ [route]
643
+ layers = -2
644
+
645
+ [convolutional]
646
+ batch_normalize=1
647
+ filters=512
648
+ size=1
649
+ stride=1
650
+ pad=1
651
+ activation=silu
652
+
653
+ # Residual Block
654
+
655
+ [convolutional]
656
+ batch_normalize=1
657
+ filters=512
658
+ size=1
659
+ stride=1
660
+ pad=1
661
+ activation=silu
662
+
663
+ [convolutional]
664
+ batch_normalize=1
665
+ filters=512
666
+ size=3
667
+ stride=1
668
+ pad=1
669
+ activation=silu
670
+
671
+ [shortcut]
672
+ from=-3
673
+ activation=linear
674
+
675
+ [convolutional]
676
+ batch_normalize=1
677
+ filters=512
678
+ size=1
679
+ stride=1
680
+ pad=1
681
+ activation=silu
682
+
683
+ [convolutional]
684
+ batch_normalize=1
685
+ filters=512
686
+ size=3
687
+ stride=1
688
+ pad=1
689
+ activation=silu
690
+
691
+ [shortcut]
692
+ from=-3
693
+ activation=linear
694
+
695
+ [convolutional]
696
+ batch_normalize=1
697
+ filters=512
698
+ size=1
699
+ stride=1
700
+ pad=1
701
+ activation=silu
702
+
703
+ [convolutional]
704
+ batch_normalize=1
705
+ filters=512
706
+ size=3
707
+ stride=1
708
+ pad=1
709
+ activation=silu
710
+
711
+ [shortcut]
712
+ from=-3
713
+ activation=linear
714
+
715
+ [convolutional]
716
+ batch_normalize=1
717
+ filters=512
718
+ size=1
719
+ stride=1
720
+ pad=1
721
+ activation=silu
722
+
723
+ [convolutional]
724
+ batch_normalize=1
725
+ filters=512
726
+ size=3
727
+ stride=1
728
+ pad=1
729
+ activation=silu
730
+
731
+ [shortcut]
732
+ from=-3
733
+ activation=linear
734
+
735
+ # Transition first
736
+
737
+ [convolutional]
738
+ batch_normalize=1
739
+ filters=512
740
+ size=1
741
+ stride=1
742
+ pad=1
743
+ activation=silu
744
+
745
+ # Merge [-1 -(3k+4)]
746
+
747
+ [route]
748
+ layers = -1,-16
749
+
750
+ # Transition last
751
+
752
+ # 98 (previous+7+3k)
753
+ [convolutional]
754
+ batch_normalize=1
755
+ filters=1024
756
+ size=1
757
+ stride=1
758
+ pad=1
759
+ activation=silu
760
+
761
+ # ============ End of Backbone ============ #
762
+
763
+ # ============ Neck ============ #
764
+
765
+ # CSPSPP
766
+
767
+ [convolutional]
768
+ batch_normalize=1
769
+ filters=512
770
+ size=1
771
+ stride=1
772
+ pad=1
773
+ activation=silu
774
+
775
+ [route]
776
+ layers = -2
777
+
778
+ [convolutional]
779
+ batch_normalize=1
780
+ filters=512
781
+ size=1
782
+ stride=1
783
+ pad=1
784
+ activation=silu
785
+
786
+ [convolutional]
787
+ batch_normalize=1
788
+ size=3
789
+ stride=1
790
+ pad=1
791
+ filters=512
792
+ activation=silu
793
+
794
+ [convolutional]
795
+ batch_normalize=1
796
+ filters=512
797
+ size=1
798
+ stride=1
799
+ pad=1
800
+ activation=silu
801
+
802
+ ### SPP ###
803
+ [maxpool]
804
+ stride=1
805
+ size=5
806
+
807
+ [route]
808
+ layers=-2
809
+
810
+ [maxpool]
811
+ stride=1
812
+ size=9
813
+
814
+ [route]
815
+ layers=-4
816
+
817
+ [maxpool]
818
+ stride=1
819
+ size=13
820
+
821
+ [route]
822
+ layers=-1,-3,-5,-6
823
+ ### End SPP ###
824
+
825
+ [convolutional]
826
+ batch_normalize=1
827
+ filters=512
828
+ size=1
829
+ stride=1
830
+ pad=1
831
+ activation=silu
832
+
833
+ [convolutional]
834
+ batch_normalize=1
835
+ size=3
836
+ stride=1
837
+ pad=1
838
+ filters=512
839
+ activation=silu
840
+
841
+ [route]
842
+ layers = -1, -13
843
+
844
+ # 113 (previous+6+5+2k)
845
+ [convolutional]
846
+ batch_normalize=1
847
+ filters=512
848
+ size=1
849
+ stride=1
850
+ pad=1
851
+ activation=silu
852
+
853
+ # End of CSPSPP
854
+
855
+
856
+ # FPN-4
857
+
858
+ [convolutional]
859
+ batch_normalize=1
860
+ filters=256
861
+ size=1
862
+ stride=1
863
+ pad=1
864
+ activation=silu
865
+
866
+ [upsample]
867
+ stride=2
868
+
869
+ [route]
870
+ layers = 79
871
+
872
+ [convolutional]
873
+ batch_normalize=1
874
+ filters=256
875
+ size=1
876
+ stride=1
877
+ pad=1
878
+ activation=silu
879
+
880
+ [route]
881
+ layers = -1, -3
882
+
883
+ [convolutional]
884
+ batch_normalize=1
885
+ filters=256
886
+ size=1
887
+ stride=1
888
+ pad=1
889
+ activation=silu
890
+
891
+ # Split
892
+
893
+ [convolutional]
894
+ batch_normalize=1
895
+ filters=256
896
+ size=1
897
+ stride=1
898
+ pad=1
899
+ activation=silu
900
+
901
+ [route]
902
+ layers = -2
903
+
904
+ # Plain Block
905
+
906
+ [convolutional]
907
+ batch_normalize=1
908
+ filters=256
909
+ size=1
910
+ stride=1
911
+ pad=1
912
+ activation=silu
913
+
914
+ [convolutional]
915
+ batch_normalize=1
916
+ size=3
917
+ stride=1
918
+ pad=1
919
+ filters=256
920
+ activation=silu
921
+
922
+ [convolutional]
923
+ batch_normalize=1
924
+ filters=256
925
+ size=1
926
+ stride=1
927
+ pad=1
928
+ activation=silu
929
+
930
+ [convolutional]
931
+ batch_normalize=1
932
+ size=3
933
+ stride=1
934
+ pad=1
935
+ filters=256
936
+ activation=silu
937
+
938
+ # Merge [-1, -(2k+2)]
939
+
940
+ [route]
941
+ layers = -1, -6
942
+
943
+ # Transition last
944
+
945
+ # 127 (previous+6+4+2k)
946
+ [convolutional]
947
+ batch_normalize=1
948
+ filters=256
949
+ size=1
950
+ stride=1
951
+ pad=1
952
+ activation=silu
953
+
954
+
955
+ # FPN-3
956
+
957
+ [convolutional]
958
+ batch_normalize=1
959
+ filters=128
960
+ size=1
961
+ stride=1
962
+ pad=1
963
+ activation=silu
964
+
965
+ [upsample]
966
+ stride=2
967
+
968
+ [route]
969
+ layers = 48
970
+
971
+ [convolutional]
972
+ batch_normalize=1
973
+ filters=128
974
+ size=1
975
+ stride=1
976
+ pad=1
977
+ activation=silu
978
+
979
+ [route]
980
+ layers = -1, -3
981
+
982
+ [convolutional]
983
+ batch_normalize=1
984
+ filters=128
985
+ size=1
986
+ stride=1
987
+ pad=1
988
+ activation=silu
989
+
990
+ # Split
991
+
992
+ [convolutional]
993
+ batch_normalize=1
994
+ filters=128
995
+ size=1
996
+ stride=1
997
+ pad=1
998
+ activation=silu
999
+
1000
+ [route]
1001
+ layers = -2
1002
+
1003
+ # Plain Block
1004
+
1005
+ [convolutional]
1006
+ batch_normalize=1
1007
+ filters=128
1008
+ size=1
1009
+ stride=1
1010
+ pad=1
1011
+ activation=silu
1012
+
1013
+ [convolutional]
1014
+ batch_normalize=1
1015
+ size=3
1016
+ stride=1
1017
+ pad=1
1018
+ filters=128
1019
+ activation=silu
1020
+
1021
+ [convolutional]
1022
+ batch_normalize=1
1023
+ filters=128
1024
+ size=1
1025
+ stride=1
1026
+ pad=1
1027
+ activation=silu
1028
+
1029
+ [convolutional]
1030
+ batch_normalize=1
1031
+ size=3
1032
+ stride=1
1033
+ pad=1
1034
+ filters=128
1035
+ activation=silu
1036
+
1037
+ # Merge [-1, -(2k+2)]
1038
+
1039
+ [route]
1040
+ layers = -1, -6
1041
+
1042
+ # Transition last
1043
+
1044
+ # 141 (previous+6+4+2k)
1045
+ [convolutional]
1046
+ batch_normalize=1
1047
+ filters=128
1048
+ size=1
1049
+ stride=1
1050
+ pad=1
1051
+ activation=silu
1052
+
1053
+
1054
+ # PAN-4
1055
+
1056
+ [convolutional]
1057
+ batch_normalize=1
1058
+ size=3
1059
+ stride=2
1060
+ pad=1
1061
+ filters=256
1062
+ activation=silu
1063
+
1064
+ [route]
1065
+ layers = -1, 127
1066
+
1067
+ [convolutional]
1068
+ batch_normalize=1
1069
+ filters=256
1070
+ size=1
1071
+ stride=1
1072
+ pad=1
1073
+ activation=silu
1074
+
1075
+ # Split
1076
+
1077
+ [convolutional]
1078
+ batch_normalize=1
1079
+ filters=256
1080
+ size=1
1081
+ stride=1
1082
+ pad=1
1083
+ activation=silu
1084
+
1085
+ [route]
1086
+ layers = -2
1087
+
1088
+ # Plain Block
1089
+
1090
+ [convolutional]
1091
+ batch_normalize=1
1092
+ filters=256
1093
+ size=1
1094
+ stride=1
1095
+ pad=1
1096
+ activation=silu
1097
+
1098
+ [convolutional]
1099
+ batch_normalize=1
1100
+ size=3
1101
+ stride=1
1102
+ pad=1
1103
+ filters=256
1104
+ activation=silu
1105
+
1106
+ [convolutional]
1107
+ batch_normalize=1
1108
+ filters=256
1109
+ size=1
1110
+ stride=1
1111
+ pad=1
1112
+ activation=silu
1113
+
1114
+ [convolutional]
1115
+ batch_normalize=1
1116
+ size=3
1117
+ stride=1
1118
+ pad=1
1119
+ filters=256
1120
+ activation=silu
1121
+
1122
+ [route]
1123
+ layers = -1,-6
1124
+
1125
+ # Transition last
1126
+
1127
+ # 152 (previous+3+4+2k)
1128
+ [convolutional]
1129
+ batch_normalize=1
1130
+ filters=256
1131
+ size=1
1132
+ stride=1
1133
+ pad=1
1134
+ activation=silu
1135
+
1136
+
1137
+ # PAN-5
1138
+
1139
+ [convolutional]
1140
+ batch_normalize=1
1141
+ size=3
1142
+ stride=2
1143
+ pad=1
1144
+ filters=512
1145
+ activation=silu
1146
+
1147
+ [route]
1148
+ layers = -1, 113
1149
+
1150
+ [convolutional]
1151
+ batch_normalize=1
1152
+ filters=512
1153
+ size=1
1154
+ stride=1
1155
+ pad=1
1156
+ activation=silu
1157
+
1158
+ # Split
1159
+
1160
+ [convolutional]
1161
+ batch_normalize=1
1162
+ filters=512
1163
+ size=1
1164
+ stride=1
1165
+ pad=1
1166
+ activation=silu
1167
+
1168
+ [route]
1169
+ layers = -2
1170
+
1171
+ # Plain Block
1172
+
1173
+ [convolutional]
1174
+ batch_normalize=1
1175
+ filters=512
1176
+ size=1
1177
+ stride=1
1178
+ pad=1
1179
+ activation=silu
1180
+
1181
+ [convolutional]
1182
+ batch_normalize=1
1183
+ size=3
1184
+ stride=1
1185
+ pad=1
1186
+ filters=512
1187
+ activation=silu
1188
+
1189
+ [convolutional]
1190
+ batch_normalize=1
1191
+ filters=512
1192
+ size=1
1193
+ stride=1
1194
+ pad=1
1195
+ activation=silu
1196
+
1197
+ [convolutional]
1198
+ batch_normalize=1
1199
+ size=3
1200
+ stride=1
1201
+ pad=1
1202
+ filters=512
1203
+ activation=silu
1204
+
1205
+ [route]
1206
+ layers = -1,-6
1207
+
1208
+ # Transition last
1209
+
1210
+ # 163 (previous+3+4+2k)
1211
+ [convolutional]
1212
+ batch_normalize=1
1213
+ filters=512
1214
+ size=1
1215
+ stride=1
1216
+ pad=1
1217
+ activation=silu
1218
+
1219
+ # ============ End of Neck ============ #
1220
+
1221
+ # 164
1222
+ [implicit_add]
1223
+ filters=256
1224
+
1225
+ # 165
1226
+ [implicit_add]
1227
+ filters=512
1228
+
1229
+ # 166
1230
+ [implicit_add]
1231
+ filters=1024
1232
+
1233
+ # 167
1234
+ [implicit_mul]
1235
+ filters=255
1236
+
1237
+ # 168
1238
+ [implicit_mul]
1239
+ filters=255
1240
+
1241
+ # 169
1242
+ [implicit_mul]
1243
+ filters=255
1244
+
1245
+ # ============ Head ============ #
1246
+
1247
+ # YOLO-3
1248
+
1249
+ [route]
1250
+ layers = 141
1251
+
1252
+ [convolutional]
1253
+ batch_normalize=1
1254
+ size=3
1255
+ stride=1
1256
+ pad=1
1257
+ filters=256
1258
+ activation=silu
1259
+
1260
+ [shift_channels]
1261
+ from=164
1262
+
1263
+ [convolutional]
1264
+ size=1
1265
+ stride=1
1266
+ pad=1
1267
+ filters=255
1268
+ activation=linear
1269
+
1270
+ [control_channels]
1271
+ from=167
1272
+
1273
+ [yolo]
1274
+ mask = 0,1,2
1275
+ anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
1276
+ classes=80
1277
+ num=9
1278
+ jitter=.3
1279
+ ignore_thresh = .7
1280
+ truth_thresh = 1
1281
+ random=1
1282
+ scale_x_y = 1.05
1283
+ iou_thresh=0.213
1284
+ cls_normalizer=1.0
1285
+ iou_normalizer=0.07
1286
+ iou_loss=ciou
1287
+ nms_kind=greedynms
1288
+ beta_nms=0.6
1289
+
1290
+
1291
+ # YOLO-4
1292
+
1293
+ [route]
1294
+ layers = 152
1295
+
1296
+ [convolutional]
1297
+ batch_normalize=1
1298
+ size=3
1299
+ stride=1
1300
+ pad=1
1301
+ filters=512
1302
+ activation=silu
1303
+
1304
+ [shift_channels]
1305
+ from=165
1306
+
1307
+ [convolutional]
1308
+ size=1
1309
+ stride=1
1310
+ pad=1
1311
+ filters=255
1312
+ activation=linear
1313
+
1314
+ [control_channels]
1315
+ from=168
1316
+
1317
+ [yolo]
1318
+ mask = 3,4,5
1319
+ anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
1320
+ classes=80
1321
+ num=9
1322
+ jitter=.3
1323
+ ignore_thresh = .7
1324
+ truth_thresh = 1
1325
+ random=1
1326
+ scale_x_y = 1.05
1327
+ iou_thresh=0.213
1328
+ cls_normalizer=1.0
1329
+ iou_normalizer=0.07
1330
+ iou_loss=ciou
1331
+ nms_kind=greedynms
1332
+ beta_nms=0.6
1333
+
1334
+
1335
+ # YOLO-5
1336
+
1337
+ [route]
1338
+ layers = 163
1339
+
1340
+ [convolutional]
1341
+ batch_normalize=1
1342
+ size=3
1343
+ stride=1
1344
+ pad=1
1345
+ filters=1024
1346
+ activation=silu
1347
+
1348
+ [shift_channels]
1349
+ from=166
1350
+
1351
+ [convolutional]
1352
+ size=1
1353
+ stride=1
1354
+ pad=1
1355
+ filters=255
1356
+ activation=linear
1357
+
1358
+ [control_channels]
1359
+ from=169
1360
+
1361
+ [yolo]
1362
+ mask = 6,7,8
1363
+ anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
1364
+ classes=80
1365
+ num=9
1366
+ jitter=.3
1367
+ ignore_thresh = .7
1368
+ truth_thresh = 1
1369
+ random=1
1370
+ scale_x_y = 1.05
1371
+ iou_thresh=0.213
1372
+ cls_normalizer=1.0
1373
+ iou_normalizer=0.07
1374
+ iou_loss=ciou
1375
+ nms_kind=greedynms
1376
+ beta_nms=0.6
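
The `[implicit_add]`/`[implicit_mul]` blocks above (layers 164-169) are YOLOR's "implicit knowledge": learned tensors that take no input, which `[shift_channels]` adds to and `[control_channels]` multiplies into the feature maps feeding each YOLO head. A minimal PyTorch sketch of the idea follows; the module names and zero/one initializations are illustrative assumptions, not the repo's exact classes:

```python
# Sketch of YOLOR-style implicit knowledge, assuming standard PyTorch only.
import torch
import torch.nn as nn

class ImplicitAdd(nn.Module):
    """Learned per-channel bias with no input ([implicit_add] used via [shift_channels])."""
    def __init__(self, channels):
        super().__init__()
        self.implicit = nn.Parameter(torch.zeros(1, channels, 1, 1))  # illustrative init
    def forward(self, x):
        return x + self.implicit  # broadcasts over batch and spatial dims

class ImplicitMul(nn.Module):
    """Learned per-channel scale with no input ([implicit_mul] used via [control_channels])."""
    def __init__(self, channels):
        super().__init__()
        self.implicit = nn.Parameter(torch.ones(1, channels, 1, 1))   # illustrative init
    def forward(self, x):
        return x * self.implicit

# Mirrors one head branch of the cfg above:
# 3x3 conv (256 ch) -> shift by implicit_add(256) -> 1x1 conv to 255 ch -> scale by implicit_mul(255)
feat = torch.randn(1, 256, 64, 64)
head = nn.Sequential(ImplicitAdd(256), nn.Conv2d(256, 255, 1), ImplicitMul(255))
out = head(feat)  # (1, 255, 64, 64): 3 anchors x (80 classes + 4 box + 1 obj)
```

The `from=164` style references in `[shift_channels]`/`[control_channels]` point at the absolute layer indices that the `# 164` comments in the cfg annotate.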
cfg/yolor_csp_x.cfg ADDED
@@ -0,0 +1,1576 @@
1
+ [net]
2
+ # Testing
3
+ #batch=1
4
+ #subdivisions=1
5
+ # Training
6
+ batch=64
7
+ subdivisions=8
8
+ width=512
9
+ height=512
10
+ channels=3
11
+ momentum=0.949
12
+ decay=0.0005
13
+ angle=0
14
+ saturation = 1.5
15
+ exposure = 1.5
16
+ hue=.1
17
+
18
+ learning_rate=0.00261
19
+ burn_in=1000
20
+ max_batches = 500500
21
+ policy=steps
22
+ steps=400000,450000
23
+ scales=.1,.1
24
+
25
+ #cutmix=1
26
+ mosaic=1
27
+
28
+
29
+ # ============ Backbone ============ #
30
+
31
+ # Stem
32
+
33
+ # 0
34
+ [convolutional]
35
+ batch_normalize=1
36
+ filters=32
37
+ size=3
38
+ stride=1
39
+ pad=1
40
+ activation=silu
41
+
42
+ # P1
43
+
44
+ # Downsample
45
+
46
+ [convolutional]
47
+ batch_normalize=1
48
+ filters=80
49
+ size=3
50
+ stride=2
51
+ pad=1
52
+ activation=silu
53
+
54
+ # Residual Block
55
+
56
+ [convolutional]
57
+ batch_normalize=1
58
+ filters=40
59
+ size=1
60
+ stride=1
61
+ pad=1
62
+ activation=silu
63
+
64
+ [convolutional]
65
+ batch_normalize=1
66
+ filters=80
67
+ size=3
68
+ stride=1
69
+ pad=1
70
+ activation=silu
71
+
72
+ # 4 (previous+1+3k)
73
+ [shortcut]
74
+ from=-3
75
+ activation=linear
76
+
77
+ # P2
78
+
79
+ # Downsample
80
+
81
+ [convolutional]
82
+ batch_normalize=1
83
+ filters=160
84
+ size=3
85
+ stride=2
86
+ pad=1
87
+ activation=silu
88
+
89
+ # Split
90
+
91
+ [convolutional]
92
+ batch_normalize=1
93
+ filters=80
94
+ size=1
95
+ stride=1
96
+ pad=1
97
+ activation=silu
98
+
99
+ [route]
100
+ layers = -2
101
+
102
+ [convolutional]
103
+ batch_normalize=1
104
+ filters=80
105
+ size=1
106
+ stride=1
107
+ pad=1
108
+ activation=silu
109
+
110
+ # Residual Block
111
+
112
+ [convolutional]
113
+ batch_normalize=1
114
+ filters=80
115
+ size=1
116
+ stride=1
117
+ pad=1
118
+ activation=silu
119
+
120
+ [convolutional]
121
+ batch_normalize=1
122
+ filters=80
123
+ size=3
124
+ stride=1
125
+ pad=1
126
+ activation=silu
127
+
128
+ [shortcut]
129
+ from=-3
130
+ activation=linear
131
+
132
+ [convolutional]
133
+ batch_normalize=1
134
+ filters=80
135
+ size=1
136
+ stride=1
137
+ pad=1
138
+ activation=silu
139
+
140
+ [convolutional]
141
+ batch_normalize=1
142
+ filters=80
143
+ size=3
144
+ stride=1
145
+ pad=1
146
+ activation=silu
147
+
148
+ [shortcut]
149
+ from=-3
150
+ activation=linear
151
+
152
+ [convolutional]
153
+ batch_normalize=1
154
+ filters=80
155
+ size=1
156
+ stride=1
157
+ pad=1
158
+ activation=silu
159
+
160
+ [convolutional]
161
+ batch_normalize=1
162
+ filters=80
163
+ size=3
164
+ stride=1
165
+ pad=1
166
+ activation=silu
167
+
168
+ [shortcut]
169
+ from=-3
170
+ activation=linear
171
+
172
+ # Transition first
173
+
174
+ [convolutional]
175
+ batch_normalize=1
176
+ filters=80
177
+ size=1
178
+ stride=1
179
+ pad=1
180
+ activation=silu
181
+
182
+ # Merge [-1, -(3k+4)]
183
+
184
+ [route]
185
+ layers = -1,-13
186
+
187
+ # Transition last
188
+
189
+ # 20 (previous+7+3k)
190
+ [convolutional]
191
+ batch_normalize=1
192
+ filters=160
193
+ size=1
194
+ stride=1
195
+ pad=1
196
+ activation=silu
197
+
198
+ # P3
199
+
200
+ # Downsample
201
+
202
+ [convolutional]
203
+ batch_normalize=1
204
+ filters=320
205
+ size=3
206
+ stride=2
207
+ pad=1
208
+ activation=silu
209
+
210
+ # Split
211
+
212
+ [convolutional]
213
+ batch_normalize=1
214
+ filters=160
215
+ size=1
216
+ stride=1
217
+ pad=1
218
+ activation=silu
219
+
220
+ [route]
221
+ layers = -2
222
+
223
+ [convolutional]
224
+ batch_normalize=1
225
+ filters=160
226
+ size=1
227
+ stride=1
228
+ pad=1
229
+ activation=silu
230
+
231
+ # Residual Block
232
+
233
+ [convolutional]
234
+ batch_normalize=1
235
+ filters=160
236
+ size=1
237
+ stride=1
238
+ pad=1
239
+ activation=silu
240
+
241
+ [convolutional]
242
+ batch_normalize=1
243
+ filters=160
244
+ size=3
245
+ stride=1
246
+ pad=1
247
+ activation=silu
248
+
249
+ [shortcut]
250
+ from=-3
251
+ activation=linear
252
+
253
+ [convolutional]
254
+ batch_normalize=1
255
+ filters=160
256
+ size=1
257
+ stride=1
258
+ pad=1
259
+ activation=silu
260
+
261
+ [convolutional]
262
+ batch_normalize=1
263
+ filters=160
264
+ size=3
265
+ stride=1
266
+ pad=1
267
+ activation=silu
268
+
269
+ [shortcut]
270
+ from=-3
271
+ activation=linear
272
+
273
+ [convolutional]
274
+ batch_normalize=1
275
+ filters=160
276
+ size=1
277
+ stride=1
278
+ pad=1
279
+ activation=silu
280
+
281
+ [convolutional]
282
+ batch_normalize=1
283
+ filters=160
284
+ size=3
285
+ stride=1
286
+ pad=1
287
+ activation=silu
288
+
289
+ [shortcut]
290
+ from=-3
291
+ activation=linear
292
+
293
+ [convolutional]
294
+ batch_normalize=1
295
+ filters=160
296
+ size=1
297
+ stride=1
298
+ pad=1
299
+ activation=silu
300
+
301
+ [convolutional]
302
+ batch_normalize=1
303
+ filters=160
304
+ size=3
305
+ stride=1
306
+ pad=1
307
+ activation=silu
308
+
309
+ [shortcut]
310
+ from=-3
311
+ activation=linear
312
+
313
+ [convolutional]
314
+ batch_normalize=1
315
+ filters=160
316
+ size=1
317
+ stride=1
318
+ pad=1
319
+ activation=silu
320
+
321
+ [convolutional]
322
+ batch_normalize=1
323
+ filters=160
324
+ size=3
325
+ stride=1
326
+ pad=1
327
+ activation=silu
328
+
329
+ [shortcut]
330
+ from=-3
331
+ activation=linear
332
+
333
+ [convolutional]
334
+ batch_normalize=1
335
+ filters=160
336
+ size=1
337
+ stride=1
338
+ pad=1
339
+ activation=silu
340
+
341
+ [convolutional]
342
+ batch_normalize=1
343
+ filters=160
344
+ size=3
345
+ stride=1
346
+ pad=1
347
+ activation=silu
348
+
349
+ [shortcut]
350
+ from=-3
351
+ activation=linear
352
+
353
+ [convolutional]
354
+ batch_normalize=1
355
+ filters=160
356
+ size=1
357
+ stride=1
358
+ pad=1
359
+ activation=silu
360
+
361
+ [convolutional]
362
+ batch_normalize=1
363
+ filters=160
364
+ size=3
365
+ stride=1
366
+ pad=1
367
+ activation=silu
368
+
369
+ [shortcut]
370
+ from=-3
371
+ activation=linear
372
+
373
+ [convolutional]
374
+ batch_normalize=1
375
+ filters=160
376
+ size=1
377
+ stride=1
378
+ pad=1
379
+ activation=silu
380
+
381
+ [convolutional]
382
+ batch_normalize=1
383
+ filters=160
384
+ size=3
385
+ stride=1
386
+ pad=1
387
+ activation=silu
388
+
389
+ [shortcut]
390
+ from=-3
391
+ activation=linear
392
+
393
+ [convolutional]
394
+ batch_normalize=1
395
+ filters=160
396
+ size=1
397
+ stride=1
398
+ pad=1
399
+ activation=silu
400
+
401
+ [convolutional]
402
+ batch_normalize=1
403
+ filters=160
404
+ size=3
405
+ stride=1
406
+ pad=1
407
+ activation=silu
408
+
409
+ [shortcut]
410
+ from=-3
411
+ activation=linear
412
+
413
+ [convolutional]
414
+ batch_normalize=1
415
+ filters=160
416
+ size=1
417
+ stride=1
418
+ pad=1
419
+ activation=silu
420
+
421
+ [convolutional]
422
+ batch_normalize=1
423
+ filters=160
424
+ size=3
425
+ stride=1
426
+ pad=1
427
+ activation=silu
428
+
429
+ [shortcut]
430
+ from=-3
431
+ activation=linear
432
+
433
+ # Transition first
434
+
435
+ [convolutional]
436
+ batch_normalize=1
437
+ filters=160
438
+ size=1
439
+ stride=1
440
+ pad=1
441
+ activation=silu
442
+
443
+ # Merge [-1 -(4+3k)]
444
+
445
+ [route]
446
+ layers = -1,-34
447
+
448
+ # Transition last
449
+
450
+ # 57 (previous+7+3k)
451
+ [convolutional]
452
+ batch_normalize=1
453
+ filters=320
454
+ size=1
455
+ stride=1
456
+ pad=1
457
+ activation=silu
458
+
459
+ # P4
460
+
461
+ # Downsample
462
+
463
+ [convolutional]
464
+ batch_normalize=1
465
+ filters=640
466
+ size=3
467
+ stride=2
468
+ pad=1
469
+ activation=silu
470
+
471
+ # Split
472
+
473
+ [convolutional]
474
+ batch_normalize=1
475
+ filters=320
476
+ size=1
477
+ stride=1
478
+ pad=1
479
+ activation=silu
480
+
481
+ [route]
482
+ layers = -2
483
+
484
+ [convolutional]
485
+ batch_normalize=1
486
+ filters=320
487
+ size=1
488
+ stride=1
489
+ pad=1
490
+ activation=silu
491
+
492
+ # Residual Block
493
+
494
+ [convolutional]
495
+ batch_normalize=1
496
+ filters=320
497
+ size=1
498
+ stride=1
499
+ pad=1
500
+ activation=silu
501
+
502
+ [convolutional]
503
+ batch_normalize=1
504
+ filters=320
505
+ size=3
506
+ stride=1
507
+ pad=1
508
+ activation=silu
509
+
510
+ [shortcut]
511
+ from=-3
512
+ activation=linear
513
+
514
+ [convolutional]
515
+ batch_normalize=1
516
+ filters=320
517
+ size=1
518
+ stride=1
519
+ pad=1
520
+ activation=silu
521
+
522
+ [convolutional]
523
+ batch_normalize=1
524
+ filters=320
525
+ size=3
526
+ stride=1
527
+ pad=1
528
+ activation=silu
529
+
530
+ [shortcut]
531
+ from=-3
532
+ activation=linear
533
+
534
+ [convolutional]
535
+ batch_normalize=1
536
+ filters=320
537
+ size=1
538
+ stride=1
539
+ pad=1
540
+ activation=silu
541
+
542
+ [convolutional]
543
+ batch_normalize=1
544
+ filters=320
545
+ size=3
546
+ stride=1
547
+ pad=1
548
+ activation=silu
549
+
550
+ [shortcut]
551
+ from=-3
552
+ activation=linear
553
+
554
+ [convolutional]
555
+ batch_normalize=1
556
+ filters=320
557
+ size=1
558
+ stride=1
559
+ pad=1
560
+ activation=silu
561
+
562
+ [convolutional]
563
+ batch_normalize=1
564
+ filters=320
565
+ size=3
566
+ stride=1
567
+ pad=1
568
+ activation=silu
569
+
570
+ [shortcut]
571
+ from=-3
572
+ activation=linear
573
+
574
+ [convolutional]
575
+ batch_normalize=1
576
+ filters=320
577
+ size=1
578
+ stride=1
579
+ pad=1
580
+ activation=silu
581
+
582
+ [convolutional]
583
+ batch_normalize=1
584
+ filters=320
585
+ size=3
586
+ stride=1
587
+ pad=1
588
+ activation=silu
589
+
590
+ [shortcut]
591
+ from=-3
592
+ activation=linear
593
+
594
+ [convolutional]
595
+ batch_normalize=1
596
+ filters=320
597
+ size=1
598
+ stride=1
599
+ pad=1
600
+ activation=silu
601
+
602
+ [convolutional]
603
+ batch_normalize=1
604
+ filters=320
605
+ size=3
606
+ stride=1
607
+ pad=1
608
+ activation=silu
609
+
610
+ [shortcut]
611
+ from=-3
612
+ activation=linear
613
+
614
+ [convolutional]
615
+ batch_normalize=1
616
+ filters=320
617
+ size=1
618
+ stride=1
619
+ pad=1
620
+ activation=silu
621
+
622
+ [convolutional]
623
+ batch_normalize=1
624
+ filters=320
625
+ size=3
626
+ stride=1
627
+ pad=1
628
+ activation=silu
629
+
630
+ [shortcut]
631
+ from=-3
632
+ activation=linear
633
+
634
+ [convolutional]
635
+ batch_normalize=1
636
+ filters=320
637
+ size=1
638
+ stride=1
639
+ pad=1
640
+ activation=silu
641
+
642
+ [convolutional]
643
+ batch_normalize=1
644
+ filters=320
645
+ size=3
646
+ stride=1
647
+ pad=1
648
+ activation=silu
649
+
650
+ [shortcut]
651
+ from=-3
652
+ activation=linear
653
+
654
+ [convolutional]
655
+ batch_normalize=1
656
+ filters=320
657
+ size=1
658
+ stride=1
659
+ pad=1
660
+ activation=silu
661
+
662
+ [convolutional]
663
+ batch_normalize=1
664
+ filters=320
665
+ size=3
666
+ stride=1
667
+ pad=1
668
+ activation=silu
669
+
670
+ [shortcut]
671
+ from=-3
672
+ activation=linear
673
+
674
+ [convolutional]
675
+ batch_normalize=1
676
+ filters=320
677
+ size=1
678
+ stride=1
679
+ pad=1
680
+ activation=silu
681
+
682
+ [convolutional]
683
+ batch_normalize=1
684
+ filters=320
685
+ size=3
686
+ stride=1
687
+ pad=1
688
+ activation=silu
689
+
690
+ [shortcut]
691
+ from=-3
692
+ activation=linear
693
+
694
+ # Transition first
695
+
696
+ [convolutional]
697
+ batch_normalize=1
698
+ filters=320
699
+ size=1
700
+ stride=1
701
+ pad=1
702
+ activation=silu
703
+
704
+ # Merge [-1 -(3k+4)]
705
+
706
+ [route]
707
+ layers = -1,-34
708
+
709
+ # Transition last
710
+
711
+ # 94 (previous+7+3k)
712
+ [convolutional]
713
+ batch_normalize=1
714
+ filters=640
715
+ size=1
716
+ stride=1
717
+ pad=1
718
+ activation=silu
719
+
720
+ # P5
721
+
722
+ # Downsample
723
+
724
+ [convolutional]
725
+ batch_normalize=1
726
+ filters=1280
727
+ size=3
728
+ stride=2
729
+ pad=1
730
+ activation=silu
731
+
732
+ # Split
733
+
734
+ [convolutional]
735
+ batch_normalize=1
736
+ filters=640
737
+ size=1
738
+ stride=1
739
+ pad=1
740
+ activation=silu
741
+
742
+ [route]
743
+ layers = -2
744
+
745
+ [convolutional]
746
+ batch_normalize=1
747
+ filters=640
748
+ size=1
749
+ stride=1
750
+ pad=1
751
+ activation=silu
752
+
753
+ # Residual Block
754
+
755
+ [convolutional]
756
+ batch_normalize=1
757
+ filters=640
758
+ size=1
759
+ stride=1
760
+ pad=1
761
+ activation=silu
762
+
763
+ [convolutional]
764
+ batch_normalize=1
765
+ filters=640
766
+ size=3
767
+ stride=1
768
+ pad=1
769
+ activation=silu
770
+
771
+ [shortcut]
772
+ from=-3
773
+ activation=linear
774
+
775
+ [convolutional]
776
+ batch_normalize=1
777
+ filters=640
778
+ size=1
779
+ stride=1
780
+ pad=1
781
+ activation=silu
782
+
783
+ [convolutional]
784
+ batch_normalize=1
785
+ filters=640
786
+ size=3
787
+ stride=1
788
+ pad=1
789
+ activation=silu
790
+
791
+ [shortcut]
792
+ from=-3
793
+ activation=linear
794
+
795
+ [convolutional]
796
+ batch_normalize=1
797
+ filters=640
798
+ size=1
799
+ stride=1
800
+ pad=1
801
+ activation=silu
802
+
803
+ [convolutional]
804
+ batch_normalize=1
805
+ filters=640
806
+ size=3
807
+ stride=1
808
+ pad=1
809
+ activation=silu
810
+
811
+ [shortcut]
812
+ from=-3
813
+ activation=linear
814
+
815
+ [convolutional]
816
+ batch_normalize=1
817
+ filters=640
818
+ size=1
819
+ stride=1
820
+ pad=1
821
+ activation=silu
822
+
823
+ [convolutional]
824
+ batch_normalize=1
825
+ filters=640
826
+ size=3
827
+ stride=1
828
+ pad=1
829
+ activation=silu
830
+
831
+ [shortcut]
832
+ from=-3
833
+ activation=linear
834
+
835
+ [convolutional]
836
+ batch_normalize=1
837
+ filters=640
838
+ size=1
839
+ stride=1
840
+ pad=1
841
+ activation=silu
842
+
843
+ [convolutional]
844
+ batch_normalize=1
845
+ filters=640
846
+ size=3
847
+ stride=1
848
+ pad=1
849
+ activation=silu
850
+
851
+ [shortcut]
852
+ from=-3
853
+ activation=linear
854
+
855
+ # Transition first
856
+
857
+ [convolutional]
858
+ batch_normalize=1
859
+ filters=640
860
+ size=1
861
+ stride=1
862
+ pad=1
863
+ activation=silu
864
+
865
+ # Merge [-1 -(3k+4)]
866
+
867
+ [route]
868
+ layers = -1,-19
869
+
870
+ # Transition last
871
+
872
+ # 116 (previous+7+3k)
873
+ [convolutional]
874
+ batch_normalize=1
875
+ filters=1280
876
+ size=1
877
+ stride=1
878
+ pad=1
879
+ activation=silu
880
+
881
+ # ============ End of Backbone ============ #
882
+
883
+ # ============ Neck ============ #
884
+
885
+ # CSPSPP
886
+
887
+ [convolutional]
888
+ batch_normalize=1
889
+ filters=640
890
+ size=1
891
+ stride=1
892
+ pad=1
893
+ activation=silu
894
+
895
+ [route]
896
+ layers = -2
897
+
898
+ [convolutional]
899
+ batch_normalize=1
900
+ filters=640
901
+ size=1
902
+ stride=1
903
+ pad=1
904
+ activation=silu
905
+
906
+ [convolutional]
907
+ batch_normalize=1
908
+ size=3
909
+ stride=1
910
+ pad=1
911
+ filters=640
912
+ activation=silu
913
+
914
+ [convolutional]
915
+ batch_normalize=1
916
+ filters=640
917
+ size=1
918
+ stride=1
919
+ pad=1
920
+ activation=silu
921
+
922
+ ### SPP ###
923
+ [maxpool]
924
+ stride=1
925
+ size=5
926
+
927
+ [route]
928
+ layers=-2
929
+
930
+ [maxpool]
931
+ stride=1
932
+ size=9
933
+
934
+ [route]
935
+ layers=-4
936
+
937
+ [maxpool]
938
+ stride=1
939
+ size=13
940
+
941
+ [route]
942
+ layers=-1,-3,-5,-6
943
+ ### End SPP ###
944
+
945
+ [convolutional]
946
+ batch_normalize=1
947
+ filters=640
948
+ size=1
949
+ stride=1
950
+ pad=1
951
+ activation=silu
952
+
953
+ [convolutional]
954
+ batch_normalize=1
955
+ size=3
956
+ stride=1
957
+ pad=1
958
+ filters=640
959
+ activation=silu
960
+
961
+ [convolutional]
962
+ batch_normalize=1
963
+ filters=640
964
+ size=1
965
+ stride=1
966
+ pad=1
967
+ activation=silu
968
+
969
+ [convolutional]
970
+ batch_normalize=1
971
+ size=3
972
+ stride=1
973
+ pad=1
974
+ filters=640
975
+ activation=silu
976
+
977
+ [route]
978
+ layers = -1, -15
979
+
980
+ # 133 (previous+6+5+2k)
981
+ [convolutional]
982
+ batch_normalize=1
983
+ filters=640
984
+ size=1
985
+ stride=1
986
+ pad=1
987
+ activation=silu
988
+
989
+ # End of CSPSPP
990
+
991
+
992
+ # FPN-4
993
+
994
+ [convolutional]
995
+ batch_normalize=1
996
+ filters=320
997
+ size=1
998
+ stride=1
999
+ pad=1
1000
+ activation=silu
1001
+
1002
+ [upsample]
1003
+ stride=2
1004
+
1005
+ [route]
1006
+ layers = 94
1007
+
1008
+ [convolutional]
1009
+ batch_normalize=1
1010
+ filters=320
1011
+ size=1
1012
+ stride=1
1013
+ pad=1
1014
+ activation=silu
1015
+
1016
+ [route]
1017
+ layers = -1, -3
1018
+
1019
+ [convolutional]
1020
+ batch_normalize=1
1021
+ filters=320
1022
+ size=1
1023
+ stride=1
1024
+ pad=1
1025
+ activation=silu
1026
+
1027
+ # Split
1028
+
1029
+ [convolutional]
1030
+ batch_normalize=1
1031
+ filters=320
1032
+ size=1
1033
+ stride=1
1034
+ pad=1
1035
+ activation=silu
1036
+
1037
+ [route]
1038
+ layers = -2
1039
+
1040
+ # Plain Block
1041
+
1042
+ [convolutional]
1043
+ batch_normalize=1
1044
+ filters=320
1045
+ size=1
1046
+ stride=1
1047
+ pad=1
1048
+ activation=silu
1049
+
1050
+ [convolutional]
1051
+ batch_normalize=1
1052
+ size=3
1053
+ stride=1
1054
+ pad=1
1055
+ filters=320
1056
+ activation=silu
1057
+
1058
+ [convolutional]
1059
+ batch_normalize=1
1060
+ filters=320
1061
+ size=1
1062
+ stride=1
1063
+ pad=1
1064
+ activation=silu
1065
+
1066
+ [convolutional]
1067
+ batch_normalize=1
1068
+ size=3
1069
+ stride=1
1070
+ pad=1
1071
+ filters=320
1072
+ activation=silu
1073
+
1074
+ [convolutional]
1075
+ batch_normalize=1
1076
+ filters=320
1077
+ size=1
1078
+ stride=1
1079
+ pad=1
1080
+ activation=silu
1081
+
1082
+ [convolutional]
1083
+ batch_normalize=1
1084
+ size=3
1085
+ stride=1
1086
+ pad=1
1087
+ filters=320
1088
+ activation=silu
1089
+
1090
+ # Merge [-1, -(2k+2)]
1091
+
1092
+ [route]
1093
+ layers = -1, -8
1094
+
1095
+ # Transition last
1096
+
1097
+ # 149 (previous+6+4+2k)
1098
+ [convolutional]
1099
+ batch_normalize=1
1100
+ filters=320
1101
+ size=1
1102
+ stride=1
1103
+ pad=1
1104
+ activation=silu
1105
+
1106
+
1107
+ # FPN-3
1108
+
1109
+ [convolutional]
1110
+ batch_normalize=1
1111
+ filters=160
1112
+ size=1
1113
+ stride=1
1114
+ pad=1
1115
+ activation=silu
1116
+
1117
+ [upsample]
1118
+ stride=2
1119
+
1120
+ [route]
1121
+ layers = 57
1122
+
1123
+ [convolutional]
1124
+ batch_normalize=1
1125
+ filters=160
1126
+ size=1
1127
+ stride=1
1128
+ pad=1
1129
+ activation=silu
1130
+
1131
+ [route]
1132
+ layers = -1, -3
1133
+
1134
+ [convolutional]
1135
+ batch_normalize=1
1136
+ filters=160
1137
+ size=1
1138
+ stride=1
1139
+ pad=1
1140
+ activation=silu
1141
+
1142
+ # Split
1143
+
1144
+ [convolutional]
1145
+ batch_normalize=1
1146
+ filters=160
1147
+ size=1
1148
+ stride=1
1149
+ pad=1
1150
+ activation=silu
1151
+
1152
+ [route]
1153
+ layers = -2
1154
+
1155
+ # Plain Block
1156
+
1157
+ [convolutional]
1158
+ batch_normalize=1
1159
+ filters=160
1160
+ size=1
1161
+ stride=1
1162
+ pad=1
1163
+ activation=silu
1164
+
1165
+ [convolutional]
1166
+ batch_normalize=1
1167
+ size=3
1168
+ stride=1
1169
+ pad=1
1170
+ filters=160
1171
+ activation=silu
1172
+
1173
+ [convolutional]
1174
+ batch_normalize=1
1175
+ filters=160
1176
+ size=1
1177
+ stride=1
1178
+ pad=1
1179
+ activation=silu
1180
+
1181
+ [convolutional]
1182
+ batch_normalize=1
1183
+ size=3
1184
+ stride=1
1185
+ pad=1
1186
+ filters=160
1187
+ activation=silu
1188
+
1189
+ [convolutional]
1190
+ batch_normalize=1
1191
+ filters=160
1192
+ size=1
1193
+ stride=1
1194
+ pad=1
1195
+ activation=silu
1196
+
1197
+ [convolutional]
1198
+ batch_normalize=1
1199
+ size=3
1200
+ stride=1
1201
+ pad=1
1202
+ filters=160
1203
+ activation=silu
1204
+
1205
+ # Merge [-1, -(2k+2)]
1206
+
1207
+ [route]
1208
+ layers = -1, -8
1209
+
1210
+ # Transition last
1211
+
1212
+ # 165 (previous+6+4+2k)
1213
+ [convolutional]
1214
+ batch_normalize=1
1215
+ filters=160
1216
+ size=1
1217
+ stride=1
1218
+ pad=1
1219
+ activation=silu
1220
+
1221
+
1222
+ # PAN-4
1223
+
1224
+ [convolutional]
1225
+ batch_normalize=1
1226
+ size=3
1227
+ stride=2
1228
+ pad=1
1229
+ filters=320
1230
+ activation=silu
1231
+
1232
+ [route]
1233
+ layers = -1, 149
1234
+
1235
+ [convolutional]
1236
+ batch_normalize=1
1237
+ filters=320
1238
+ size=1
1239
+ stride=1
1240
+ pad=1
1241
+ activation=silu
1242
+
1243
+ # Split
1244
+
1245
+ [convolutional]
1246
+ batch_normalize=1
1247
+ filters=320
1248
+ size=1
1249
+ stride=1
1250
+ pad=1
1251
+ activation=silu
1252
+
1253
+ [route]
1254
+ layers = -2
1255
+
1256
+ # Plain Block
1257
+
1258
+ [convolutional]
1259
+ batch_normalize=1
1260
+ filters=320
1261
+ size=1
1262
+ stride=1
1263
+ pad=1
1264
+ activation=silu
1265
+
1266
+ [convolutional]
1267
+ batch_normalize=1
1268
+ size=3
1269
+ stride=1
1270
+ pad=1
1271
+ filters=320
1272
+ activation=silu
1273
+
1274
+ [convolutional]
1275
+ batch_normalize=1
1276
+ filters=320
1277
+ size=1
1278
+ stride=1
1279
+ pad=1
1280
+ activation=silu
1281
+
1282
+ [convolutional]
1283
+ batch_normalize=1
1284
+ size=3
1285
+ stride=1
1286
+ pad=1
1287
+ filters=320
1288
+ activation=silu
1289
+
1290
+ [convolutional]
1291
+ batch_normalize=1
1292
+ filters=320
1293
+ size=1
1294
+ stride=1
1295
+ pad=1
1296
+ activation=silu
1297
+
1298
+ [convolutional]
1299
+ batch_normalize=1
1300
+ size=3
1301
+ stride=1
1302
+ pad=1
1303
+ filters=320
1304
+ activation=silu
1305
+
1306
+ [route]
1307
+ layers = -1,-8
1308
+
1309
+ # Transition last
1310
+
1311
+ # 178 (previous+3+4+2k)
1312
+ [convolutional]
1313
+ batch_normalize=1
1314
+ filters=320
1315
+ size=1
1316
+ stride=1
1317
+ pad=1
1318
+ activation=silu
1319
+
1320
+
1321
+ # PAN-5
1322
+
1323
+ [convolutional]
1324
+ batch_normalize=1
1325
+ size=3
1326
+ stride=2
1327
+ pad=1
1328
+ filters=640
1329
+ activation=silu
1330
+
1331
+ [route]
1332
+ layers = -1, 133
1333
+
1334
+ [convolutional]
1335
+ batch_normalize=1
1336
+ filters=640
1337
+ size=1
1338
+ stride=1
1339
+ pad=1
1340
+ activation=silu
1341
+
1342
+ # Split
1343
+
1344
+ [convolutional]
1345
+ batch_normalize=1
1346
+ filters=640
1347
+ size=1
1348
+ stride=1
1349
+ pad=1
1350
+ activation=silu
1351
+
1352
+ [route]
1353
+ layers = -2
1354
+
1355
+ # Plain Block
1356
+
1357
+ [convolutional]
1358
+ batch_normalize=1
1359
+ filters=640
1360
+ size=1
1361
+ stride=1
1362
+ pad=1
1363
+ activation=silu
1364
+
1365
+ [convolutional]
1366
+ batch_normalize=1
1367
+ size=3
1368
+ stride=1
1369
+ pad=1
1370
+ filters=640
1371
+ activation=silu
1372
+
1373
+ [convolutional]
1374
+ batch_normalize=1
1375
+ filters=640
1376
+ size=1
1377
+ stride=1
1378
+ pad=1
1379
+ activation=silu
1380
+
1381
+ [convolutional]
1382
+ batch_normalize=1
1383
+ size=3
1384
+ stride=1
1385
+ pad=1
1386
+ filters=640
1387
+ activation=silu
1388
+
1389
+ [convolutional]
1390
+ batch_normalize=1
1391
+ filters=640
1392
+ size=1
1393
+ stride=1
1394
+ pad=1
1395
+ activation=silu
1396
+
1397
+ [convolutional]
1398
+ batch_normalize=1
1399
+ size=3
1400
+ stride=1
1401
+ pad=1
1402
+ filters=640
1403
+ activation=silu
1404
+
1405
+ [route]
1406
+ layers = -1,-8
1407
+
1408
+ # Transition last
1409
+
1410
+ # 191 (previous+3+4+2k)
1411
+ [convolutional]
1412
+ batch_normalize=1
1413
+ filters=640
1414
+ size=1
1415
+ stride=1
1416
+ pad=1
1417
+ activation=silu
1418
+
1419
+ # ============ End of Neck ============ #
1420
+
1421
+ # 192
1422
+ [implicit_add]
1423
+ filters=320
1424
+
1425
+ # 193
1426
+ [implicit_add]
1427
+ filters=640
1428
+
1429
+ # 194
1430
+ [implicit_add]
1431
+ filters=1280
1432
+
1433
+ # 195
1434
+ [implicit_mul]
1435
+ filters=255
1436
+
1437
+ # 196
1438
+ [implicit_mul]
1439
+ filters=255
1440
+
1441
+ # 197
1442
+ [implicit_mul]
1443
+ filters=255
1444
+
1445
+ # ============ Head ============ #
1446
+
1447
+ # YOLO-3
1448
+
1449
+ [route]
1450
+ layers = 165
1451
+
1452
+ [convolutional]
1453
+ batch_normalize=1
1454
+ size=3
1455
+ stride=1
1456
+ pad=1
1457
+ filters=320
1458
+ activation=silu
1459
+
1460
+ [shift_channels]
1461
+ from=192
1462
+
1463
+ [convolutional]
1464
+ size=1
1465
+ stride=1
1466
+ pad=1
1467
+ filters=255
1468
+ activation=linear
1469
+
1470
+ [control_channels]
1471
+ from=195
1472
+
1473
+ [yolo]
1474
+ mask = 0,1,2
1475
+ anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
1476
+ classes=80
1477
+ num=9
1478
+ jitter=.3
1479
+ ignore_thresh = .7
1480
+ truth_thresh = 1
1481
+ random=1
1482
+ scale_x_y = 1.05
1483
+ iou_thresh=0.213
1484
+ cls_normalizer=1.0
1485
+ iou_normalizer=0.07
1486
+ iou_loss=ciou
1487
+ nms_kind=greedynms
1488
+ beta_nms=0.6
1489
+
1490
+
1491
+ # YOLO-4
1492
+
1493
+ [route]
1494
+ layers = 178
1495
+
1496
+ [convolutional]
1497
+ batch_normalize=1
1498
+ size=3
1499
+ stride=1
1500
+ pad=1
1501
+ filters=640
1502
+ activation=silu
1503
+
1504
+ [shift_channels]
1505
+ from=193
1506
+
1507
+ [convolutional]
1508
+ size=1
1509
+ stride=1
1510
+ pad=1
1511
+ filters=255
1512
+ activation=linear
1513
+
1514
+ [control_channels]
1515
+ from=196
1516
+
1517
+ [yolo]
1518
+ mask = 3,4,5
1519
+ anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
1520
+ classes=80
1521
+ num=9
1522
+ jitter=.3
1523
+ ignore_thresh = .7
1524
+ truth_thresh = 1
1525
+ random=1
1526
+ scale_x_y = 1.05
1527
+ iou_thresh=0.213
1528
+ cls_normalizer=1.0
1529
+ iou_normalizer=0.07
1530
+ iou_loss=ciou
1531
+ nms_kind=greedynms
1532
+ beta_nms=0.6
1533
+
1534
+
1535
+ # YOLO-5
1536
+
1537
+ [route]
1538
+ layers = 191
1539
+
1540
+ [convolutional]
1541
+ batch_normalize=1
1542
+ size=3
1543
+ stride=1
1544
+ pad=1
1545
+ filters=1280
1546
+ activation=silu
1547
+
1548
+ [shift_channels]
1549
+ from=194
1550
+
1551
+ [convolutional]
1552
+ size=1
1553
+ stride=1
1554
+ pad=1
1555
+ filters=255
1556
+ activation=linear
1557
+
1558
+ [control_channels]
1559
+ from=197
1560
+
1561
+ [yolo]
1562
+ mask = 6,7,8
1563
+ anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
1564
+ classes=80
1565
+ num=9
1566
+ jitter=.3
1567
+ ignore_thresh = .7
1568
+ truth_thresh = 1
1569
+ random=1
1570
+ scale_x_y = 1.05
1571
+ iou_thresh=0.213
1572
+ cls_normalizer=1.0
1573
+ iou_normalizer=0.07
1574
+ iou_loss=ciou
1575
+ nms_kind=greedynms
1576
+ beta_nms=0.6
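
Both CSP variants follow the same darknet INI-like grammar: repeated `[section]` headers with `key=value` lines, where the order of non-`[net]` sections defines the layer indices that `route`/`from` fields reference (negative values are relative offsets, non-negative values are absolute indices). A small stand-alone parser sketch makes the structure concrete; it is not the repo's `utils/parse_config.py`, just a minimal reading of the format under the assumption that the file is well-formed:

```python
# Sketch: parse a darknet-style .cfg into an ordered list of section dicts.
# Assumes only the Python standard library and a well-formed cfg
# (first non-comment line is a [section] header).
def parse_darknet_cfg(path):
    sections = []
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith('#'):
                continue  # skip blank lines and comments
            if line.startswith('[') and line.endswith(']'):
                sections.append({'type': line[1:-1]})  # start a new block
            else:
                key, _, value = line.partition('=')    # split on the first '=' only
                sections[-1][key.strip()] = value.strip()
    return sections

# Example usage (paths relative to the repo root):
# cfg = parse_darknet_cfg('cfg/yolor_csp_x.cfg')
# layers = [s for s in cfg if s['type'] != 'net']  # [net] holds hyperparameters, not a layer
# print(len(layers), 'layers;', sum(s['type'] == 'yolo' for s in cfg), 'YOLO heads')
```

Splitting on the first `=` keeps comma-separated values such as `anchors = 12, 16, ...` and `layers = -1,-8` intact as single strings for the caller to interpret.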
cfg/yolor_p6.cfg ADDED
@@ -0,0 +1,1760 @@
+ [net]
+ batch=64
+ subdivisions=8
+ width=1280
+ height=1280
+ channels=3
+ momentum=0.949
+ decay=0.0005
+ angle=0
+ saturation = 1.5
+ exposure = 1.5
+ hue=.1
+
+ learning_rate=0.00261
+ burn_in=1000
+ max_batches = 500500
+ policy=steps
+ steps=400000,450000
+ scales=.1,.1
+
+ mosaic=1
+
+
+ # ============ Backbone ============ #
+
+ # Stem
+
+ # P1
+
+ # Downsample
+
+ # 0
+ [reorg]
+
+ [convolutional]
+ batch_normalize=1
+ filters=64
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+
+ # P2
+
+ # Downsample
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=3
+ stride=2
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=64
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ [convolutional]
+ batch_normalize=1
+ filters=64
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Residual Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=64
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=64
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=64
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=64
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=64
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=64
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ # Transition first
+ #
+ #[convolutional]
+ #batch_normalize=1
+ #filters=64
+ #size=1
+ #stride=1
+ #pad=1
+ #activation=silu
+
+ # Merge [-1, -(3k+3)]
+
+ [route]
+ layers = -1,-12
+
+ # Transition last
+
+ # 16 (previous+6+3k)
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+
+ # P3
+
+ # Downsample
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=3
+ stride=2
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Residual Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ # Transition first
+ #
+ #[convolutional]
+ #batch_normalize=1
+ #filters=128
+ #size=1
+ #stride=1
+ #pad=1
+ #activation=silu
+
+ # Merge [-1, -(3k+3)]
+
+ [route]
+ layers = -1,-24
+
+ # Transition last
+
+ # 43 (previous+6+3k)
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+
+ # P4
+
+ # Downsample
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=3
+ stride=2
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Residual Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ # Transition first
+ #
+ #[convolutional]
+ #batch_normalize=1
+ #filters=192
+ #size=1
+ #stride=1
+ #pad=1
+ #activation=silu
+
+ # Merge [-1, -(3k+3)]
+
+ [route]
+ layers = -1,-24
+
+ # Transition last
+
+ # 70 (previous+6+3k)
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+
+ # P5
+
+ # Downsample
+
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=3
+ stride=2
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Residual Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ # Transition first
+ #
+ #[convolutional]
+ #batch_normalize=1
+ #filters=256
+ #size=1
+ #stride=1
+ #pad=1
+ #activation=silu
+
+ # Merge [-1, -(3k+3)]
+
+ [route]
+ layers = -1,-12
+
+ # Transition last
+
+ # 85 (previous+6+3k)
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+
+ # P6
+
+ # Downsample
+
+ [convolutional]
+ batch_normalize=1
+ filters=640
+ size=3
+ stride=2
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=320
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ [convolutional]
+ batch_normalize=1
+ filters=320
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Residual Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=320
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=320
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=320
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=320
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=320
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=320
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ # Transition first
+ #
+ #[convolutional]
+ #batch_normalize=1
+ #filters=320
+ #size=1
+ #stride=1
+ #pad=1
+ #activation=silu
+
+ # Merge [-1, -(3k+3)]
+
+ [route]
+ layers = -1,-12
+
+ # Transition last
+
+ # 100 (previous+6+3k)
+ [convolutional]
+ batch_normalize=1
+ filters=640
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # ============ End of Backbone ============ #
+
+ # ============ Neck ============ #
+
+ # CSPSPP
+
+ [convolutional]
+ batch_normalize=1
+ filters=320
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ [convolutional]
+ batch_normalize=1
+ filters=320
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=320
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=320
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ ### SPP ###
+ [maxpool]
+ stride=1
+ size=5
+
+ [route]
+ layers=-2
+
+ [maxpool]
+ stride=1
+ size=9
+
+ [route]
+ layers=-4
+
+ [maxpool]
+ stride=1
+ size=13
+
+ [route]
+ layers=-1,-3,-5,-6
+ ### End SPP ###
+
+ [convolutional]
+ batch_normalize=1
+ filters=320
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=320
+ activation=silu
+
+ [route]
+ layers = -1, -13
+
+ # 115 (previous+6+5+2k)
+ [convolutional]
+ batch_normalize=1
+ filters=320
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # End of CSPSPP
+
+
+ # FPN-5
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [upsample]
+ stride=2
+
+ [route]
+ layers = 85
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -1, -3
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ # Plain Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=256
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=256
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=256
+ activation=silu
+
+ # Merge [-1, -(2k+2)]
+
+ [route]
+ layers = -1, -8
+
+ # Transition last
+
+ # 131 (previous+6+4+2k)
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+
+ # FPN-4
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [upsample]
+ stride=2
+
+ [route]
+ layers = 70
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -1, -3
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ # Plain Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=192
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=192
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=192
+ activation=silu
+
+ # Merge [-1, -(2k+2)]
+
+ [route]
+ layers = -1, -8
+
+ # Transition last
+
+ # 147 (previous+6+4+2k)
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+
+ # FPN-3
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [upsample]
+ stride=2
+
+ [route]
+ layers = 43
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -1, -3
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ # Plain Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=128
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=128
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=128
+ activation=silu
+
+ # Merge [-1, -(2k+2)]
+
+ [route]
+ layers = -1, -8
+
+ # Transition last
+
+ # 163 (previous+6+4+2k)
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+
+ # PAN-4
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=2
+ pad=1
+ filters=192
+ activation=silu
+
+ [route]
+ layers = -1, 147
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ # Plain Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=192
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=192
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=192
+ activation=silu
+
+ [route]
+ layers = -1,-8
+
+ # Transition last
+
+ # 176 (previous+3+4+2k)
+ [convolutional]
+ batch_normalize=1
+ filters=192
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+
+ # PAN-5
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=2
+ pad=1
+ filters=256
+ activation=silu
+
+ [route]
+ layers = -1, 131
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ # Plain Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=256
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=256
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=256
+ activation=silu
+
+ [route]
+ layers = -1,-8
+
+ # Transition last
+
+ # 189 (previous+3+4+2k)
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+
+ # PAN-6
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=2
+ pad=1
+ filters=320
+ activation=silu
+
+ [route]
+ layers = -1, 115
+
+ [convolutional]
+ batch_normalize=1
+ filters=320
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=320
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ # Plain Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=320
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=320
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=320
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=320
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=320
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=320
+ activation=silu
+
+ [route]
+ layers = -1,-8
+
+ # Transition last
+
+ # 202 (previous+3+4+2k)
+ [convolutional]
+ batch_normalize=1
+ filters=320
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # ============ End of Neck ============ #
+
+ # 203
+ [implicit_add]
+ filters=256
+
+ # 204
+ [implicit_add]
+ filters=384
+
+ # 205
+ [implicit_add]
+ filters=512
+
+ # 206
+ [implicit_add]
+ filters=640
+
+ # 207
+ [implicit_mul]
+ filters=255
+
+ # 208
+ [implicit_mul]
+ filters=255
+
+ # 209
+ [implicit_mul]
+ filters=255
+
+ # 210
+ [implicit_mul]
+ filters=255
+
+ # ============ Head ============ #
+
+ # YOLO-3
+
+ [route]
+ layers = 163
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=256
+ activation=silu
+
+ [shift_channels]
+ from=203
+
+ [convolutional]
+ size=1
+ stride=1
+ pad=1
+ filters=255
+ activation=linear
+
+ [control_channels]
+ from=207
+
+ [yolo]
+ mask = 0,1,2
+ anchors = 19,27, 44,40, 38,94, 96,68, 86,152, 180,137, 140,301, 303,264, 238,542, 436,615, 739,380, 925,792
+ classes=80
+ num=12
+ jitter=.3
+ ignore_thresh = .7
+ truth_thresh = 1
+ random=1
+ scale_x_y = 1.05
+ iou_thresh=0.213
+ cls_normalizer=1.0
+ iou_normalizer=0.07
+ iou_loss=ciou
+ nms_kind=greedynms
+ beta_nms=0.6
+
+
+ # YOLO-4
+
+ [route]
+ layers = 176
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=384
+ activation=silu
+
+ [shift_channels]
+ from=204
+
+ [convolutional]
+ size=1
+ stride=1
+ pad=1
+ filters=255
+ activation=linear
+
+ [control_channels]
+ from=208
+
+ [yolo]
+ mask = 3,4,5
+ anchors = 19,27, 44,40, 38,94, 96,68, 86,152, 180,137, 140,301, 303,264, 238,542, 436,615, 739,380, 925,792
+ classes=80
+ num=12
+ jitter=.3
+ ignore_thresh = .7
+ truth_thresh = 1
+ random=1
+ scale_x_y = 1.05
+ iou_thresh=0.213
+ cls_normalizer=1.0
+ iou_normalizer=0.07
+ iou_loss=ciou
+ nms_kind=greedynms
+ beta_nms=0.6
+
+
+ # YOLO-5
+
+ [route]
+ layers = 189
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=512
+ activation=silu
+
+ [shift_channels]
+ from=205
+
+ [convolutional]
+ size=1
+ stride=1
+ pad=1
+ filters=255
+ activation=linear
+
+ [control_channels]
+ from=209
+
+ [yolo]
+ mask = 6,7,8
+ anchors = 19,27, 44,40, 38,94, 96,68, 86,152, 180,137, 140,301, 303,264, 238,542, 436,615, 739,380, 925,792
+ classes=80
+ num=12
+ jitter=.3
+ ignore_thresh = .7
+ truth_thresh = 1
+ random=1
+ scale_x_y = 1.05
+ iou_thresh=0.213
+ cls_normalizer=1.0
+ iou_normalizer=0.07
+ iou_loss=ciou
+ nms_kind=greedynms
+ beta_nms=0.6
+
+
+ # YOLO-6
+
+ [route]
+ layers = 202
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=640
+ activation=silu
+
+ [shift_channels]
+ from=206
+
+ [convolutional]
+ size=1
+ stride=1
+ pad=1
+ filters=255
+ activation=linear
+
+ [control_channels]
+ from=210
+
+ [yolo]
+ mask = 9,10,11
+ anchors = 19,27, 44,40, 38,94, 96,68, 86,152, 180,137, 140,301, 303,264, 238,542, 436,615, 739,380, 925,792
+ classes=80
+ num=12
+ jitter=.3
+ ignore_thresh = .7
+ truth_thresh = 1
+ random=1
+ scale_x_y = 1.05
+ iou_thresh=0.213
+ cls_normalizer=1.0
+ iou_normalizer=0.07
+ iou_loss=ciou
+ nms_kind=greedynms
+ beta_nms=0.6
+
+ # ============ End of Head ============ #
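
[Editor's illustration, not part of the commit] yolor_p6.cfg introduces the YOLOR implicit-knowledge layers: each `[implicit_add]`/`[implicit_mul]` block allocates a learned per-channel tensor, and the head branches fold them in via `[shift_channels]` (addition) and `[control_channels]` (multiplication), with `from=` naming the implicit layer by index. A minimal PyTorch sketch of that behavior, assuming the ImplicitA/ImplicitM semantics implemented in models/models.py; the initialization constants here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ImplicitAdd(nn.Module):
    # [implicit_add] filters=C: a learned 1xCx1x1 tensor, consumed by
    # [shift_channels] as x + implicit (init values assumed, not from the cfg)
    def __init__(self, channels):
        super().__init__()
        self.implicit = nn.Parameter(torch.zeros(1, channels, 1, 1))
        nn.init.normal_(self.implicit, std=0.02)
    def forward(self, x):
        return x + self.implicit

class ImplicitMul(nn.Module):
    # [implicit_mul] filters=C: a learned 1xCx1x1 tensor centered at 1,
    # consumed by [control_channels] as x * implicit
    def __init__(self, channels):
        super().__init__()
        self.implicit = nn.Parameter(torch.ones(1, channels, 1, 1))
        nn.init.normal_(self.implicit, mean=1.0, std=0.02)
    def forward(self, x):
        return x * self.implicit

# Mirroring the YOLO-3 branch above (1280px input, P3 stride 8 -> 160x160):
feat = torch.randn(2, 256, 160, 160)   # output of the 3x3 filters=256 conv
feat = ImplicitAdd(256)(feat)          # [shift_channels] from=203
head = nn.Conv2d(256, 255, 1)(feat)    # 1x1 filters=255, activation=linear
head = ImplicitMul(255)(head)          # [control_channels] from=207
print(head.shape)                      # torch.Size([2, 255, 160, 160])
```

The broadcast over batch and spatial dimensions is what makes these layers nearly free at inference: the addition can be folded into the conv bias and the multiplication into the conv weights.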
cfg/yolor_w6.cfg ADDED
@@ -0,0 +1,1760 @@
+ [net]
+ batch=64
+ subdivisions=8
+ width=1280
+ height=1280
+ channels=3
+ momentum=0.949
+ decay=0.0005
+ angle=0
+ saturation = 1.5
+ exposure = 1.5
+ hue=.1
+
+ learning_rate=0.00261
+ burn_in=1000
+ max_batches = 500500
+ policy=steps
+ steps=400000,450000
+ scales=.1,.1
+
+ mosaic=1
+
+
+ # ============ Backbone ============ #
+
+ # Stem
+
+ # P1
+
+ # Downsample
+
+ # 0
+ [reorg]
+
+ [convolutional]
+ batch_normalize=1
+ filters=64
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+
+ # P2
+
+ # Downsample
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=3
+ stride=2
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=64
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ [convolutional]
+ batch_normalize=1
+ filters=64
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Residual Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=64
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=64
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=64
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=64
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=64
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=64
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ # Transition first
+ #
+ #[convolutional]
+ #batch_normalize=1
+ #filters=64
+ #size=1
+ #stride=1
+ #pad=1
+ #activation=silu
+
+ # Merge [-1, -(3k+3)]
+
+ [route]
+ layers = -1,-12
+
+ # Transition last
+
+ # 16 (previous+6+3k)
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+
+ # P3
+
+ # Downsample
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=3
+ stride=2
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Residual Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ # Transition first
+ #
+ #[convolutional]
+ #batch_normalize=1
+ #filters=128
+ #size=1
+ #stride=1
+ #pad=1
+ #activation=silu
+
+ # Merge [-1, -(3k+3)]
+
+ [route]
+ layers = -1,-24
+
+ # Transition last
+
+ # 43 (previous+6+3k)
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+
+ # P4
+
+ # Downsample
+
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=3
+ stride=2
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Residual Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ # Transition first
+ #
+ #[convolutional]
+ #batch_normalize=1
+ #filters=256
+ #size=1
+ #stride=1
+ #pad=1
+ #activation=silu
+
+ # Merge [-1, -(3k+3)]
+
+ [route]
+ layers = -1,-24
+
+ # Transition last
+
+ # 70 (previous+6+3k)
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+
+ # P5
+
+ # Downsample
+
+ [convolutional]
+ batch_normalize=1
+ filters=768
+ size=3
+ stride=2
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Residual Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ # Transition first
+ #
+ #[convolutional]
+ #batch_normalize=1
+ #filters=384
+ #size=1
+ #stride=1
+ #pad=1
+ #activation=silu
+
+ # Merge [-1, -(3k+3)]
+
+ [route]
+ layers = -1,-12
+
+ # Transition last
+
+ # 85 (previous+6+3k)
+ [convolutional]
+ batch_normalize=1
+ filters=768
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+
+ # P6
+
+ # Downsample
+
+ [convolutional]
+ batch_normalize=1
+ filters=1024
+ size=3
+ stride=2
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Residual Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=3
+ stride=1
+ pad=1
+ activation=silu
+
+ [shortcut]
+ from=-3
+ activation=linear
+
+ # Transition first
+ #
+ #[convolutional]
+ #batch_normalize=1
+ #filters=512
+ #size=1
+ #stride=1
+ #pad=1
+ #activation=silu
+
+ # Merge [-1, -(3k+3)]
+
+ [route]
+ layers = -1,-12
+
+ # Transition last
+
+ # 100 (previous+6+3k)
+ [convolutional]
+ batch_normalize=1
+ filters=1024
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # ============ End of Backbone ============ #
+
+ # ============ Neck ============ #
+
+ # CSPSPP
+
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=512
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ ### SPP ###
+ [maxpool]
+ stride=1
+ size=5
+
+ [route]
+ layers=-2
+
+ [maxpool]
+ stride=1
+ size=9
+
+ [route]
+ layers=-4
+
+ [maxpool]
+ stride=1
+ size=13
+
+ [route]
+ layers=-1,-3,-5,-6
+ ### End SPP ###
+
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=512
+ activation=silu
+
+ [route]
+ layers = -1, -13
+
+ # 115 (previous+6+5+2k)
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # End of CSPSPP
+
+
+ # FPN-5
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [upsample]
+ stride=2
+
+ [route]
+ layers = 85
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -1, -3
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ # Plain Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=384
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=384
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=384
+ activation=silu
+
+ # Merge [-1, -(2k+2)]
+
+ [route]
+ layers = -1, -8
+
+ # Transition last
+
+ # 131 (previous+6+4+2k)
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+
+ # FPN-4
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [upsample]
+ stride=2
+
+ [route]
+ layers = 70
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -1, -3
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ # Plain Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=256
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=256
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=256
+ activation=silu
+
+ # Merge [-1, -(2k+2)]
+
+ [route]
+ layers = -1, -8
+
+ # Transition last
+
+ # 147 (previous+6+4+2k)
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+
+ # FPN-3
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [upsample]
+ stride=2
+
+ [route]
+ layers = 43
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -1, -3
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ # Plain Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=128
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=128
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=128
+ activation=silu
+
+ # Merge [-1, -(2k+2)]
+
+ [route]
+ layers = -1, -8
+
+ # Transition last
+
+ # 163 (previous+6+4+2k)
+ [convolutional]
+ batch_normalize=1
+ filters=128
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+
+ # PAN-4
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=2
+ pad=1
+ filters=256
+ activation=silu
+
+ [route]
+ layers = -1, 147
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ # Plain Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=256
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=256
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=256
+ activation=silu
+
+ [route]
+ layers = -1,-8
+
+ # Transition last
+
+ # 176 (previous+3+4+2k)
+ [convolutional]
+ batch_normalize=1
+ filters=256
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+
+ # PAN-5
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=2
+ pad=1
+ filters=384
+ activation=silu
+
+ [route]
+ layers = -1, 131
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ # Plain Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=384
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=384
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=384
+ activation=silu
+
+ [route]
+ layers = -1,-8
+
+ # Transition last
+
+ # 189 (previous+3+4+2k)
+ [convolutional]
+ batch_normalize=1
+ filters=384
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+
+ # PAN-6
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=2
+ pad=1
+ filters=512
+ activation=silu
+
+ [route]
+ layers = -1, 115
+
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # Split
+
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [route]
+ layers = -2
+
+ # Plain Block
+
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=512
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=512
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ [convolutional]
+ batch_normalize=1
+ size=3
+ stride=1
+ pad=1
+ filters=512
+ activation=silu
+
+ [route]
+ layers = -1,-8
+
+ # Transition last
+
+ # 202 (previous+3+4+2k)
+ [convolutional]
+ batch_normalize=1
+ filters=512
+ size=1
+ stride=1
+ pad=1
+ activation=silu
+
+ # ============ End of Neck ============ #
+
+ # 203
+ [implicit_add]
+ filters=256
+
+ # 204
+ [implicit_add]
+ filters=512
+
+ # 205
+ [implicit_add]
+ filters=768
+
+ # 206
+ [implicit_add]
+ filters=1024
+
+ # 207
+ [implicit_mul]
+ filters=255
+
+ # 208
+ [implicit_mul]
+ filters=255
+
+ # 209
+ [implicit_mul]
+ filters=255
+
+ # 210
+ [implicit_mul]
+ filters=255
+
+ # ============ Head ============ #
1584
+
1585
+ # YOLO-3
1586
+
1587
+ [route]
1588
+ layers = 163
1589
+
1590
+ [convolutional]
1591
+ batch_normalize=1
1592
+ size=3
1593
+ stride=1
1594
+ pad=1
1595
+ filters=256
1596
+ activation=silu
1597
+
1598
+ [shift_channels]
1599
+ from=203
1600
+
1601
+ [convolutional]
1602
+ size=1
1603
+ stride=1
1604
+ pad=1
1605
+ filters=255
1606
+ activation=linear
1607
+
1608
+ [control_channels]
1609
+ from=207
1610
+
1611
+ [yolo]
1612
+ mask = 0,1,2
1613
+ anchors = 19,27, 44,40, 38,94, 96,68, 86,152, 180,137, 140,301, 303,264, 238,542, 436,615, 739,380, 925,792
1614
+ classes=80
1615
+ num=12
1616
+ jitter=.3
1617
+ ignore_thresh = .7
1618
+ truth_thresh = 1
1619
+ random=1
1620
+ scale_x_y = 1.05
1621
+ iou_thresh=0.213
1622
+ cls_normalizer=1.0
1623
+ iou_normalizer=0.07
1624
+ iou_loss=ciou
1625
+ nms_kind=greedynms
1626
+ beta_nms=0.6
1627
+
1628
+
1629
+ # YOLO-4
1630
+
1631
+ [route]
1632
+ layers = 176
1633
+
1634
+ [convolutional]
1635
+ batch_normalize=1
1636
+ size=3
1637
+ stride=1
1638
+ pad=1
1639
+ filters=512
1640
+ activation=silu
1641
+
1642
+ [shift_channels]
1643
+ from=204
1644
+
1645
+ [convolutional]
1646
+ size=1
1647
+ stride=1
1648
+ pad=1
1649
+ filters=255
1650
+ activation=linear
1651
+
1652
+ [control_channels]
1653
+ from=208
1654
+
1655
+ [yolo]
1656
+ mask = 3,4,5
1657
+ anchors = 19,27, 44,40, 38,94, 96,68, 86,152, 180,137, 140,301, 303,264, 238,542, 436,615, 739,380, 925,792
1658
+ classes=80
1659
+ num=12
1660
+ jitter=.3
1661
+ ignore_thresh = .7
1662
+ truth_thresh = 1
1663
+ random=1
1664
+ scale_x_y = 1.05
1665
+ iou_thresh=0.213
1666
+ cls_normalizer=1.0
1667
+ iou_normalizer=0.07
1668
+ iou_loss=ciou
1669
+ nms_kind=greedynms
1670
+ beta_nms=0.6
1671
+
1672
+
1673
+ # YOLO-5
1674
+
1675
+ [route]
1676
+ layers = 189
1677
+
1678
+ [convolutional]
1679
+ batch_normalize=1
1680
+ size=3
1681
+ stride=1
1682
+ pad=1
1683
+ filters=768
1684
+ activation=silu
1685
+
1686
+ [shift_channels]
1687
+ from=205
1688
+
1689
+ [convolutional]
1690
+ size=1
1691
+ stride=1
1692
+ pad=1
1693
+ filters=255
1694
+ activation=linear
1695
+
1696
+ [control_channels]
1697
+ from=209
1698
+
1699
+ [yolo]
1700
+ mask = 6,7,8
1701
+ anchors = 19,27, 44,40, 38,94, 96,68, 86,152, 180,137, 140,301, 303,264, 238,542, 436,615, 739,380, 925,792
1702
+ classes=80
1703
+ num=12
1704
+ jitter=.3
1705
+ ignore_thresh = .7
1706
+ truth_thresh = 1
1707
+ random=1
1708
+ scale_x_y = 1.05
1709
+ iou_thresh=0.213
1710
+ cls_normalizer=1.0
1711
+ iou_normalizer=0.07
1712
+ iou_loss=ciou
1713
+ nms_kind=greedynms
1714
+ beta_nms=0.6
1715
+
1716
+
1717
+ # YOLO-6
1718
+
1719
+ [route]
1720
+ layers = 202
1721
+
1722
+ [convolutional]
1723
+ batch_normalize=1
1724
+ size=3
1725
+ stride=1
1726
+ pad=1
1727
+ filters=1024
1728
+ activation=silu
1729
+
1730
+ [shift_channels]
1731
+ from=206
1732
+
1733
+ [convolutional]
1734
+ size=1
1735
+ stride=1
1736
+ pad=1
1737
+ filters=255
1738
+ activation=linear
1739
+
1740
+ [control_channels]
1741
+ from=210
1742
+
1743
+ [yolo]
1744
+ mask = 9,10,11
1745
+ anchors = 19,27, 44,40, 38,94, 96,68, 86,152, 180,137, 140,301, 303,264, 238,542, 436,615, 739,380, 925,792
1746
+ classes=80
1747
+ num=12
1748
+ jitter=.3
1749
+ ignore_thresh = .7
1750
+ truth_thresh = 1
1751
+ random=1
1752
+ scale_x_y = 1.05
1753
+ iou_thresh=0.213
1754
+ cls_normalizer=1.0
1755
+ iou_normalizer=0.07
1756
+ iou_loss=ciou
1757
+ nms_kind=greedynms
1758
+ beta_nms=0.6
1759
+
1760
+ # ============ End of Head ============ #
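The `[implicit_add]`/`[implicit_mul]` blocks above declare learned per-channel vectors (YOLOR's implicit knowledge), which `[shift_channels]` and `[control_channels]` then apply to a feature map as an additive shift and a multiplicative gate: each `shift_channels from=203..206` adds the matching 256/512/768/1024-wide vector after a head's 3x3 conv, and each `control_channels from=207..210` rescales the 255-channel detection output. A minimal PyTorch sketch of that idea, assuming each implicit tensor is a `1 x C x 1 x 1` parameter (class names are illustrative, not the repo's):

```python
import torch
import torch.nn as nn

class ImplicitAdd(nn.Module):
    """Learned per-channel shift, broadcast spatially (used by [shift_channels])."""
    def __init__(self, channels):
        super().__init__()
        self.implicit = nn.Parameter(torch.zeros(1, channels, 1, 1))
        nn.init.normal_(self.implicit, std=0.02)

    def forward(self, x):
        return x + self.implicit

class ImplicitMul(nn.Module):
    """Learned per-channel gate (used by [control_channels])."""
    def __init__(self, channels):
        super().__init__()
        self.implicit = nn.Parameter(torch.ones(1, channels, 1, 1))
        nn.init.normal_(self.implicit, mean=1.0, std=0.02)

    def forward(self, x):
        return x * self.implicit

feat = torch.randn(1, 256, 32, 32)
out = ImplicitMul(256)(ImplicitAdd(256)(feat))
print(out.shape)  # torch.Size([1, 256, 32, 32]) - shape is unchanged
```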
cfg/yolov4_csp.cfg ADDED
@@ -0,0 +1,1334 @@
1
+ [net]
2
+ # Testing
3
+ #batch=1
4
+ #subdivisions=1
5
+ # Training
6
+ batch=64
7
+ subdivisions=8
8
+ width=512
9
+ height=512
10
+ channels=3
11
+ momentum=0.949
12
+ decay=0.0005
13
+ angle=0
14
+ saturation = 1.5
15
+ exposure = 1.5
16
+ hue=.1
17
+
18
+ learning_rate=0.00261
19
+ burn_in=1000
20
+ max_batches = 500500
21
+ policy=steps
22
+ steps=400000,450000
23
+ scales=.1,.1
24
+
25
+ #cutmix=1
26
+ mosaic=1
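For reference, the `[net]` section above uses darknet's `steps` learning-rate policy: the rate warms up over the first `burn_in` iterations, then is multiplied by each entry of `scales` once the matching `steps` threshold is passed (here, x0.1 at 400k and again at 450k of the 500,500 `max_batches`). A small sketch of that rule (the function name is illustrative; darknet's warm-up exponent defaults to 4):

```python
def lr_at(it, base_lr=0.00261, burn_in=1000,
          steps=(400000, 450000), scales=(0.1, 0.1), power=4):
    # Polynomial warm-up over the first burn_in iterations.
    if it < burn_in:
        return base_lr * (it / burn_in) ** power
    # 'steps' policy: apply each scale once its threshold is passed.
    lr = base_lr
    for step, scale in zip(steps, scales):
        if it >= step:
            lr *= scale
    return lr

print(lr_at(500), lr_at(100000), lr_at(500000))
# ~0.000163, 0.00261, 0.0000261
```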
27
+
28
+
29
+ # ============ Backbone ============ #
30
+
31
+ # Stem
32
+
33
+ # 0
34
+ [convolutional]
35
+ batch_normalize=1
36
+ filters=32
37
+ size=3
38
+ stride=1
39
+ pad=1
40
+ activation=silu
41
+
42
+ # P1
43
+
44
+ # Downsample
45
+
46
+ [convolutional]
47
+ batch_normalize=1
48
+ filters=64
49
+ size=3
50
+ stride=2
51
+ pad=1
52
+ activation=silu
53
+
54
+ # Residual Block
55
+
56
+ [convolutional]
57
+ batch_normalize=1
58
+ filters=32
59
+ size=1
60
+ stride=1
61
+ pad=1
62
+ activation=silu
63
+
64
+ [convolutional]
65
+ batch_normalize=1
66
+ filters=64
67
+ size=3
68
+ stride=1
69
+ pad=1
70
+ activation=silu
71
+
72
+ # 4 (previous+1+3k)
73
+ [shortcut]
74
+ from=-3
75
+ activation=linear
76
+
77
+ # P2
78
+
79
+ # Downsample
80
+
81
+ [convolutional]
82
+ batch_normalize=1
83
+ filters=128
84
+ size=3
85
+ stride=2
86
+ pad=1
87
+ activation=silu
88
+
89
+ # Split
90
+
91
+ [convolutional]
92
+ batch_normalize=1
93
+ filters=64
94
+ size=1
95
+ stride=1
96
+ pad=1
97
+ activation=silu
98
+
99
+ [route]
100
+ layers = -2
101
+
102
+ [convolutional]
103
+ batch_normalize=1
104
+ filters=64
105
+ size=1
106
+ stride=1
107
+ pad=1
108
+ activation=silu
109
+
110
+ # Residual Block
111
+
112
+ [convolutional]
113
+ batch_normalize=1
114
+ filters=64
115
+ size=1
116
+ stride=1
117
+ pad=1
118
+ activation=silu
119
+
120
+ [convolutional]
121
+ batch_normalize=1
122
+ filters=64
123
+ size=3
124
+ stride=1
125
+ pad=1
126
+ activation=silu
127
+
128
+ [shortcut]
129
+ from=-3
130
+ activation=linear
131
+
132
+ [convolutional]
133
+ batch_normalize=1
134
+ filters=64
135
+ size=1
136
+ stride=1
137
+ pad=1
138
+ activation=silu
139
+
140
+ [convolutional]
141
+ batch_normalize=1
142
+ filters=64
143
+ size=3
144
+ stride=1
145
+ pad=1
146
+ activation=silu
147
+
148
+ [shortcut]
149
+ from=-3
150
+ activation=linear
151
+
152
+ # Transition first
153
+
154
+ [convolutional]
155
+ batch_normalize=1
156
+ filters=64
157
+ size=1
158
+ stride=1
159
+ pad=1
160
+ activation=silu
161
+
162
+ # Merge [-1, -(3k+4)]
163
+
164
+ [route]
165
+ layers = -1,-10
166
+
167
+ # Transition last
168
+
169
+ # 17 (previous+7+3k)
170
+ [convolutional]
171
+ batch_normalize=1
172
+ filters=128
173
+ size=1
174
+ stride=1
175
+ pad=1
176
+ activation=silu
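The stage just completed (P2) shows the CSP pattern repeated throughout this backbone: a strided downsample conv, a split into two 1x1 branches, residual blocks on one branch only, a `[route]` merge of the two paths, and a final transition conv. A compact PyTorch sketch of that wiring (module and argument names are illustrative, not the repo's model builder):

```python
import torch
import torch.nn as nn

def conv(c_in, c_out, k=1, s=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )

class Residual(nn.Module):                       # [convolutional] x2 + [shortcut]
    def __init__(self, c):
        super().__init__()
        self.block = nn.Sequential(conv(c, c, 1), conv(c, c, 3))

    def forward(self, x):
        return x + self.block(x)

class CSPStage(nn.Module):
    def __init__(self, c_in, c_out, n_blocks):
        super().__init__()
        c = c_out // 2
        self.down = conv(c_in, c_out, 3, 2)      # Downsample
        self.split_a = conv(c_out, c, 1)         # bypass branch
        self.split_b = conv(c_out, c, 1)         # residual branch ([route] -2)
        self.blocks = nn.Sequential(*[Residual(c) for _ in range(n_blocks)])
        self.trans_first = conv(c, c, 1)         # Transition first
        self.trans_last = conv(2 * c, c_out, 1)  # Transition last, after merge

    def forward(self, x):
        x = self.down(x)
        a = self.split_a(x)
        b = self.trans_first(self.blocks(self.split_b(x)))
        return self.trans_last(torch.cat([b, a], dim=1))   # [route] merge

print(CSPStage(64, 128, n_blocks=2)(torch.randn(1, 64, 64, 64)).shape)
# torch.Size([1, 128, 32, 32])
```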
177
+
178
+ # P3
179
+
180
+ # Downsample
181
+
182
+ [convolutional]
183
+ batch_normalize=1
184
+ filters=256
185
+ size=3
186
+ stride=2
187
+ pad=1
188
+ activation=silu
189
+
190
+ # Split
191
+
192
+ [convolutional]
193
+ batch_normalize=1
194
+ filters=128
195
+ size=1
196
+ stride=1
197
+ pad=1
198
+ activation=silu
199
+
200
+ [route]
201
+ layers = -2
202
+
203
+ [convolutional]
204
+ batch_normalize=1
205
+ filters=128
206
+ size=1
207
+ stride=1
208
+ pad=1
209
+ activation=silu
210
+
211
+ # Residual Block
212
+
213
+ [convolutional]
214
+ batch_normalize=1
215
+ filters=128
216
+ size=1
217
+ stride=1
218
+ pad=1
219
+ activation=silu
220
+
221
+ [convolutional]
222
+ batch_normalize=1
223
+ filters=128
224
+ size=3
225
+ stride=1
226
+ pad=1
227
+ activation=silu
228
+
229
+ [shortcut]
230
+ from=-3
231
+ activation=linear
232
+
233
+ [convolutional]
234
+ batch_normalize=1
235
+ filters=128
236
+ size=1
237
+ stride=1
238
+ pad=1
239
+ activation=silu
240
+
241
+ [convolutional]
242
+ batch_normalize=1
243
+ filters=128
244
+ size=3
245
+ stride=1
246
+ pad=1
247
+ activation=silu
248
+
249
+ [shortcut]
250
+ from=-3
251
+ activation=linear
252
+
253
+ [convolutional]
254
+ batch_normalize=1
255
+ filters=128
256
+ size=1
257
+ stride=1
258
+ pad=1
259
+ activation=silu
260
+
261
+ [convolutional]
262
+ batch_normalize=1
263
+ filters=128
264
+ size=3
265
+ stride=1
266
+ pad=1
267
+ activation=silu
268
+
269
+ [shortcut]
270
+ from=-3
271
+ activation=linear
272
+
273
+ [convolutional]
274
+ batch_normalize=1
275
+ filters=128
276
+ size=1
277
+ stride=1
278
+ pad=1
279
+ activation=silu
280
+
281
+ [convolutional]
282
+ batch_normalize=1
283
+ filters=128
284
+ size=3
285
+ stride=1
286
+ pad=1
287
+ activation=silu
288
+
289
+ [shortcut]
290
+ from=-3
291
+ activation=linear
292
+
293
+ [convolutional]
294
+ batch_normalize=1
295
+ filters=128
296
+ size=1
297
+ stride=1
298
+ pad=1
299
+ activation=silu
300
+
301
+ [convolutional]
302
+ batch_normalize=1
303
+ filters=128
304
+ size=3
305
+ stride=1
306
+ pad=1
307
+ activation=silu
308
+
309
+ [shortcut]
310
+ from=-3
311
+ activation=linear
312
+
313
+ [convolutional]
314
+ batch_normalize=1
315
+ filters=128
316
+ size=1
317
+ stride=1
318
+ pad=1
319
+ activation=silu
320
+
321
+ [convolutional]
322
+ batch_normalize=1
323
+ filters=128
324
+ size=3
325
+ stride=1
326
+ pad=1
327
+ activation=silu
328
+
329
+ [shortcut]
330
+ from=-3
331
+ activation=linear
332
+
333
+ [convolutional]
334
+ batch_normalize=1
335
+ filters=128
336
+ size=1
337
+ stride=1
338
+ pad=1
339
+ activation=silu
340
+
341
+ [convolutional]
342
+ batch_normalize=1
343
+ filters=128
344
+ size=3
345
+ stride=1
346
+ pad=1
347
+ activation=silu
348
+
349
+ [shortcut]
350
+ from=-3
351
+ activation=linear
352
+
353
+ [convolutional]
354
+ batch_normalize=1
355
+ filters=128
356
+ size=1
357
+ stride=1
358
+ pad=1
359
+ activation=silu
360
+
361
+ [convolutional]
362
+ batch_normalize=1
363
+ filters=128
364
+ size=3
365
+ stride=1
366
+ pad=1
367
+ activation=silu
368
+
369
+ [shortcut]
370
+ from=-3
371
+ activation=linear
372
+
373
+ # Transition first
374
+
375
+ [convolutional]
376
+ batch_normalize=1
377
+ filters=128
378
+ size=1
379
+ stride=1
380
+ pad=1
381
+ activation=silu
382
+
383
+ # Merge [-1 -(4+3k)]
384
+
385
+ [route]
386
+ layers = -1,-28
387
+
388
+ # Transition last
389
+
390
+ # 48 (previous+7+3k)
391
+ [convolutional]
392
+ batch_normalize=1
393
+ filters=256
394
+ size=1
395
+ stride=1
396
+ pad=1
397
+ activation=silu
398
+
399
+ # P4
400
+
401
+ # Downsample
402
+
403
+ [convolutional]
404
+ batch_normalize=1
405
+ filters=512
406
+ size=3
407
+ stride=2
408
+ pad=1
409
+ activation=silu
410
+
411
+ # Split
412
+
413
+ [convolutional]
414
+ batch_normalize=1
415
+ filters=256
416
+ size=1
417
+ stride=1
418
+ pad=1
419
+ activation=silu
420
+
421
+ [route]
422
+ layers = -2
423
+
424
+ [convolutional]
425
+ batch_normalize=1
426
+ filters=256
427
+ size=1
428
+ stride=1
429
+ pad=1
430
+ activation=silu
431
+
432
+ # Residual Block
433
+
434
+ [convolutional]
435
+ batch_normalize=1
436
+ filters=256
437
+ size=1
438
+ stride=1
439
+ pad=1
440
+ activation=silu
441
+
442
+ [convolutional]
443
+ batch_normalize=1
444
+ filters=256
445
+ size=3
446
+ stride=1
447
+ pad=1
448
+ activation=silu
449
+
450
+ [shortcut]
451
+ from=-3
452
+ activation=linear
453
+
454
+ [convolutional]
455
+ batch_normalize=1
456
+ filters=256
457
+ size=1
458
+ stride=1
459
+ pad=1
460
+ activation=silu
461
+
462
+ [convolutional]
463
+ batch_normalize=1
464
+ filters=256
465
+ size=3
466
+ stride=1
467
+ pad=1
468
+ activation=silu
469
+
470
+ [shortcut]
471
+ from=-3
472
+ activation=linear
473
+
474
+ [convolutional]
475
+ batch_normalize=1
476
+ filters=256
477
+ size=1
478
+ stride=1
479
+ pad=1
480
+ activation=silu
481
+
482
+ [convolutional]
483
+ batch_normalize=1
484
+ filters=256
485
+ size=3
486
+ stride=1
487
+ pad=1
488
+ activation=silu
489
+
490
+ [shortcut]
491
+ from=-3
492
+ activation=linear
493
+
494
+ [convolutional]
495
+ batch_normalize=1
496
+ filters=256
497
+ size=1
498
+ stride=1
499
+ pad=1
500
+ activation=silu
501
+
502
+ [convolutional]
503
+ batch_normalize=1
504
+ filters=256
505
+ size=3
506
+ stride=1
507
+ pad=1
508
+ activation=silu
509
+
510
+ [shortcut]
511
+ from=-3
512
+ activation=linear
513
+
514
+ [convolutional]
515
+ batch_normalize=1
516
+ filters=256
517
+ size=1
518
+ stride=1
519
+ pad=1
520
+ activation=silu
521
+
522
+ [convolutional]
523
+ batch_normalize=1
524
+ filters=256
525
+ size=3
526
+ stride=1
527
+ pad=1
528
+ activation=silu
529
+
530
+ [shortcut]
531
+ from=-3
532
+ activation=linear
533
+
534
+ [convolutional]
535
+ batch_normalize=1
536
+ filters=256
537
+ size=1
538
+ stride=1
539
+ pad=1
540
+ activation=silu
541
+
542
+ [convolutional]
543
+ batch_normalize=1
544
+ filters=256
545
+ size=3
546
+ stride=1
547
+ pad=1
548
+ activation=silu
549
+
550
+ [shortcut]
551
+ from=-3
552
+ activation=linear
553
+
554
+ [convolutional]
555
+ batch_normalize=1
556
+ filters=256
557
+ size=1
558
+ stride=1
559
+ pad=1
560
+ activation=silu
561
+
562
+ [convolutional]
563
+ batch_normalize=1
564
+ filters=256
565
+ size=3
566
+ stride=1
567
+ pad=1
568
+ activation=silu
569
+
570
+ [shortcut]
571
+ from=-3
572
+ activation=linear
573
+
574
+ [convolutional]
575
+ batch_normalize=1
576
+ filters=256
577
+ size=1
578
+ stride=1
579
+ pad=1
580
+ activation=silu
581
+
582
+ [convolutional]
583
+ batch_normalize=1
584
+ filters=256
585
+ size=3
586
+ stride=1
587
+ pad=1
588
+ activation=silu
589
+
590
+ [shortcut]
591
+ from=-3
592
+ activation=linear
593
+
594
+ # Transition first
595
+
596
+ [convolutional]
597
+ batch_normalize=1
598
+ filters=256
599
+ size=1
600
+ stride=1
601
+ pad=1
602
+ activation=silu
603
+
604
+ # Merge [-1 -(3k+4)]
605
+
606
+ [route]
607
+ layers = -1,-28
608
+
609
+ # Transition last
610
+
611
+ # 79 (previous+7+3k)
612
+ [convolutional]
613
+ batch_normalize=1
614
+ filters=512
615
+ size=1
616
+ stride=1
617
+ pad=1
618
+ activation=silu
619
+
620
+ # P5
621
+
622
+ # Downsample
623
+
624
+ [convolutional]
625
+ batch_normalize=1
626
+ filters=1024
627
+ size=3
628
+ stride=2
629
+ pad=1
630
+ activation=silu
631
+
632
+ # Split
633
+
634
+ [convolutional]
635
+ batch_normalize=1
636
+ filters=512
637
+ size=1
638
+ stride=1
639
+ pad=1
640
+ activation=silu
641
+
642
+ [route]
643
+ layers = -2
644
+
645
+ [convolutional]
646
+ batch_normalize=1
647
+ filters=512
648
+ size=1
649
+ stride=1
650
+ pad=1
651
+ activation=silu
652
+
653
+ # Residual Block
654
+
655
+ [convolutional]
656
+ batch_normalize=1
657
+ filters=512
658
+ size=1
659
+ stride=1
660
+ pad=1
661
+ activation=silu
662
+
663
+ [convolutional]
664
+ batch_normalize=1
665
+ filters=512
666
+ size=3
667
+ stride=1
668
+ pad=1
669
+ activation=silu
670
+
671
+ [shortcut]
672
+ from=-3
673
+ activation=linear
674
+
675
+ [convolutional]
676
+ batch_normalize=1
677
+ filters=512
678
+ size=1
679
+ stride=1
680
+ pad=1
681
+ activation=silu
682
+
683
+ [convolutional]
684
+ batch_normalize=1
685
+ filters=512
686
+ size=3
687
+ stride=1
688
+ pad=1
689
+ activation=silu
690
+
691
+ [shortcut]
692
+ from=-3
693
+ activation=linear
694
+
695
+ [convolutional]
696
+ batch_normalize=1
697
+ filters=512
698
+ size=1
699
+ stride=1
700
+ pad=1
701
+ activation=silu
702
+
703
+ [convolutional]
704
+ batch_normalize=1
705
+ filters=512
706
+ size=3
707
+ stride=1
708
+ pad=1
709
+ activation=silu
710
+
711
+ [shortcut]
712
+ from=-3
713
+ activation=linear
714
+
715
+ [convolutional]
716
+ batch_normalize=1
717
+ filters=512
718
+ size=1
719
+ stride=1
720
+ pad=1
721
+ activation=silu
722
+
723
+ [convolutional]
724
+ batch_normalize=1
725
+ filters=512
726
+ size=3
727
+ stride=1
728
+ pad=1
729
+ activation=silu
730
+
731
+ [shortcut]
732
+ from=-3
733
+ activation=linear
734
+
735
+ # Transition first
736
+
737
+ [convolutional]
738
+ batch_normalize=1
739
+ filters=512
740
+ size=1
741
+ stride=1
742
+ pad=1
743
+ activation=silu
744
+
745
+ # Merge [-1 -(3k+4)]
746
+
747
+ [route]
748
+ layers = -1,-16
749
+
750
+ # Transition last
751
+
752
+ # 98 (previous+7+3k)
753
+ [convolutional]
754
+ batch_normalize=1
755
+ filters=1024
756
+ size=1
757
+ stride=1
758
+ pad=1
759
+ activation=silu
760
+
761
+ # ============ End of Backbone ============ #
762
+
763
+ # ============ Neck ============ #
764
+
765
+ # CSPSPP
766
+
767
+ [convolutional]
768
+ batch_normalize=1
769
+ filters=512
770
+ size=1
771
+ stride=1
772
+ pad=1
773
+ activation=silu
774
+
775
+ [route]
776
+ layers = -2
777
+
778
+ [convolutional]
779
+ batch_normalize=1
780
+ filters=512
781
+ size=1
782
+ stride=1
783
+ pad=1
784
+ activation=silu
785
+
786
+ [convolutional]
787
+ batch_normalize=1
788
+ size=3
789
+ stride=1
790
+ pad=1
791
+ filters=512
792
+ activation=silu
793
+
794
+ [convolutional]
795
+ batch_normalize=1
796
+ filters=512
797
+ size=1
798
+ stride=1
799
+ pad=1
800
+ activation=silu
801
+
802
+ ### SPP ###
803
+ [maxpool]
804
+ stride=1
805
+ size=5
806
+
807
+ [route]
808
+ layers=-2
809
+
810
+ [maxpool]
811
+ stride=1
812
+ size=9
813
+
814
+ [route]
815
+ layers=-4
816
+
817
+ [maxpool]
818
+ stride=1
819
+ size=13
820
+
821
+ [route]
822
+ layers=-1,-3,-5,-6
823
+ ### End SPP ###
824
+
825
+ [convolutional]
826
+ batch_normalize=1
827
+ filters=512
828
+ size=1
829
+ stride=1
830
+ pad=1
831
+ activation=silu
832
+
833
+ [convolutional]
834
+ batch_normalize=1
835
+ size=3
836
+ stride=1
837
+ pad=1
838
+ filters=512
839
+ activation=silu
840
+
841
+ [route]
842
+ layers = -1, -13
843
+
844
+ # 113 (previous+6+5+2k)
845
+ [convolutional]
846
+ batch_normalize=1
847
+ filters=512
848
+ size=1
849
+ stride=1
850
+ pad=1
851
+ activation=silu
852
+
853
+ # End of CSPSPP
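The `### SPP ###` block above is spatial pyramid pooling: three stride-1 max-pools (kernels 5, 9, 13) whose outputs are concatenated with the un-pooled feature by `[route] layers=-1,-3,-5,-6`, quadrupling the channel count before the following 1x1 conv squeezes it back down. A minimal PyTorch sketch of the same computation (the module name is illustrative):

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Stride-1 max-pools at several kernel sizes, concatenated with the input."""
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        # padding = k // 2 keeps the spatial size unchanged at stride 1
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
            for k in kernel_sizes
        )

    def forward(self, x):
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)

x = torch.randn(1, 512, 16, 16)
print(SPP()(x).shape)  # torch.Size([1, 2048, 16, 16])
```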
854
+
855
+
856
+ # FPN-4
857
+
858
+ [convolutional]
859
+ batch_normalize=1
860
+ filters=256
861
+ size=1
862
+ stride=1
863
+ pad=1
864
+ activation=silu
865
+
866
+ [upsample]
867
+ stride=2
868
+
869
+ [route]
870
+ layers = 79
871
+
872
+ [convolutional]
873
+ batch_normalize=1
874
+ filters=256
875
+ size=1
876
+ stride=1
877
+ pad=1
878
+ activation=silu
879
+
880
+ [route]
881
+ layers = -1, -3
882
+
883
+ [convolutional]
884
+ batch_normalize=1
885
+ filters=256
886
+ size=1
887
+ stride=1
888
+ pad=1
889
+ activation=silu
890
+
891
+ # Split
892
+
893
+ [convolutional]
894
+ batch_normalize=1
895
+ filters=256
896
+ size=1
897
+ stride=1
898
+ pad=1
899
+ activation=silu
900
+
901
+ [route]
902
+ layers = -2
903
+
904
+ # Plain Block
905
+
906
+ [convolutional]
907
+ batch_normalize=1
908
+ filters=256
909
+ size=1
910
+ stride=1
911
+ pad=1
912
+ activation=silu
913
+
914
+ [convolutional]
915
+ batch_normalize=1
916
+ size=3
917
+ stride=1
918
+ pad=1
919
+ filters=256
920
+ activation=silu
921
+
922
+ [convolutional]
923
+ batch_normalize=1
924
+ filters=256
925
+ size=1
926
+ stride=1
927
+ pad=1
928
+ activation=silu
929
+
930
+ [convolutional]
931
+ batch_normalize=1
932
+ size=3
933
+ stride=1
934
+ pad=1
935
+ filters=256
936
+ activation=silu
937
+
938
+ # Merge [-1, -(2k+2)]
939
+
940
+ [route]
941
+ layers = -1, -6
942
+
943
+ # Transition last
944
+
945
+ # 127 (previous+6+4+2k)
946
+ [convolutional]
947
+ batch_normalize=1
948
+ filters=256
949
+ size=1
950
+ stride=1
951
+ pad=1
952
+ activation=silu
953
+
954
+
955
+ # FPN-3
956
+
957
+ [convolutional]
958
+ batch_normalize=1
959
+ filters=128
960
+ size=1
961
+ stride=1
962
+ pad=1
963
+ activation=silu
964
+
965
+ [upsample]
966
+ stride=2
967
+
968
+ [route]
969
+ layers = 48
970
+
971
+ [convolutional]
972
+ batch_normalize=1
973
+ filters=128
974
+ size=1
975
+ stride=1
976
+ pad=1
977
+ activation=silu
978
+
979
+ [route]
980
+ layers = -1, -3
981
+
982
+ [convolutional]
983
+ batch_normalize=1
984
+ filters=128
985
+ size=1
986
+ stride=1
987
+ pad=1
988
+ activation=silu
989
+
990
+ # Split
991
+
992
+ [convolutional]
993
+ batch_normalize=1
994
+ filters=128
995
+ size=1
996
+ stride=1
997
+ pad=1
998
+ activation=silu
999
+
1000
+ [route]
1001
+ layers = -2
1002
+
1003
+ # Plain Block
1004
+
1005
+ [convolutional]
1006
+ batch_normalize=1
1007
+ filters=128
1008
+ size=1
1009
+ stride=1
1010
+ pad=1
1011
+ activation=silu
1012
+
1013
+ [convolutional]
1014
+ batch_normalize=1
1015
+ size=3
1016
+ stride=1
1017
+ pad=1
1018
+ filters=128
1019
+ activation=silu
1020
+
1021
+ [convolutional]
1022
+ batch_normalize=1
1023
+ filters=128
1024
+ size=1
1025
+ stride=1
1026
+ pad=1
1027
+ activation=silu
1028
+
1029
+ [convolutional]
1030
+ batch_normalize=1
1031
+ size=3
1032
+ stride=1
1033
+ pad=1
1034
+ filters=128
1035
+ activation=silu
1036
+
1037
+ # Merge [-1, -(2k+2)]
1038
+
1039
+ [route]
1040
+ layers = -1, -6
1041
+
1042
+ # Transition last
1043
+
1044
+ # 141 (previous+6+4+2k)
1045
+ [convolutional]
1046
+ batch_normalize=1
1047
+ filters=128
1048
+ size=1
1049
+ stride=1
1050
+ pad=1
1051
+ activation=silu
1052
+
1053
+
1054
+ # PAN-4
1055
+
1056
+ [convolutional]
1057
+ batch_normalize=1
1058
+ size=3
1059
+ stride=2
1060
+ pad=1
1061
+ filters=256
1062
+ activation=silu
1063
+
1064
+ [route]
1065
+ layers = -1, 127
1066
+
1067
+ [convolutional]
1068
+ batch_normalize=1
1069
+ filters=256
1070
+ size=1
1071
+ stride=1
1072
+ pad=1
1073
+ activation=silu
1074
+
1075
+ # Split
1076
+
1077
+ [convolutional]
1078
+ batch_normalize=1
1079
+ filters=256
1080
+ size=1
1081
+ stride=1
1082
+ pad=1
1083
+ activation=silu
1084
+
1085
+ [route]
1086
+ layers = -2
1087
+
1088
+ # Plain Block
1089
+
1090
+ [convolutional]
1091
+ batch_normalize=1
1092
+ filters=256
1093
+ size=1
1094
+ stride=1
1095
+ pad=1
1096
+ activation=silu
1097
+
1098
+ [convolutional]
1099
+ batch_normalize=1
1100
+ size=3
1101
+ stride=1
1102
+ pad=1
1103
+ filters=256
1104
+ activation=silu
1105
+
1106
+ [convolutional]
1107
+ batch_normalize=1
1108
+ filters=256
1109
+ size=1
1110
+ stride=1
1111
+ pad=1
1112
+ activation=silu
1113
+
1114
+ [convolutional]
1115
+ batch_normalize=1
1116
+ size=3
1117
+ stride=1
1118
+ pad=1
1119
+ filters=256
1120
+ activation=silu
1121
+
1122
+ [route]
1123
+ layers = -1,-6
1124
+
1125
+ # Transition last
1126
+
1127
+ # 152 (previous+3+4+2k)
1128
+ [convolutional]
1129
+ batch_normalize=1
1130
+ filters=256
1131
+ size=1
1132
+ stride=1
1133
+ pad=1
1134
+ activation=silu
1135
+
1136
+
1137
+ # PAN-5
1138
+
1139
+ [convolutional]
1140
+ batch_normalize=1
1141
+ size=3
1142
+ stride=2
1143
+ pad=1
1144
+ filters=512
1145
+ activation=silu
1146
+
1147
+ [route]
1148
+ layers = -1, 113
1149
+
1150
+ [convolutional]
1151
+ batch_normalize=1
1152
+ filters=512
1153
+ size=1
1154
+ stride=1
1155
+ pad=1
1156
+ activation=silu
1157
+
1158
+ # Split
1159
+
1160
+ [convolutional]
1161
+ batch_normalize=1
1162
+ filters=512
1163
+ size=1
1164
+ stride=1
1165
+ pad=1
1166
+ activation=silu
1167
+
1168
+ [route]
1169
+ layers = -2
1170
+
1171
+ # Plain Block
1172
+
1173
+ [convolutional]
1174
+ batch_normalize=1
1175
+ filters=512
1176
+ size=1
1177
+ stride=1
1178
+ pad=1
1179
+ activation=silu
1180
+
1181
+ [convolutional]
1182
+ batch_normalize=1
1183
+ size=3
1184
+ stride=1
1185
+ pad=1
1186
+ filters=512
1187
+ activation=silu
1188
+
1189
+ [convolutional]
1190
+ batch_normalize=1
1191
+ filters=512
1192
+ size=1
1193
+ stride=1
1194
+ pad=1
1195
+ activation=silu
1196
+
1197
+ [convolutional]
1198
+ batch_normalize=1
1199
+ size=3
1200
+ stride=1
1201
+ pad=1
1202
+ filters=512
1203
+ activation=silu
1204
+
1205
+ [route]
1206
+ layers = -1,-6
1207
+
1208
+ # Transition last
1209
+
1210
+ # 163 (previous+3+4+2k)
1211
+ [convolutional]
1212
+ batch_normalize=1
1213
+ filters=512
1214
+ size=1
1215
+ stride=1
1216
+ pad=1
1217
+ activation=silu
1218
+
1219
+ # ============ End of Neck ============ #
1220
+
1221
+ # ============ Head ============ #
1222
+
1223
+ # YOLO-3
1224
+
1225
+ [route]
1226
+ layers = 141
1227
+
1228
+ [convolutional]
1229
+ batch_normalize=1
1230
+ size=3
1231
+ stride=1
1232
+ pad=1
1233
+ filters=256
1234
+ activation=silu
1235
+
1236
+ [convolutional]
1237
+ size=1
1238
+ stride=1
1239
+ pad=1
1240
+ filters=255
1241
+ activation=linear
1242
+
1243
+ [yolo]
1244
+ mask = 0,1,2
1245
+ anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
1246
+ classes=80
1247
+ num=9
1248
+ jitter=.3
1249
+ ignore_thresh = .7
1250
+ truth_thresh = 1
1251
+ random=1
1252
+ scale_x_y = 1.05
1253
+ iou_thresh=0.213
1254
+ cls_normalizer=1.0
1255
+ iou_normalizer=0.07
1256
+ iou_loss=ciou
1257
+ nms_kind=greedynms
1258
+ beta_nms=0.6
1259
+
1260
+
1261
+ # YOLO-4
1262
+
1263
+ [route]
1264
+ layers = 152
1265
+
1266
+ [convolutional]
1267
+ batch_normalize=1
1268
+ size=3
1269
+ stride=1
1270
+ pad=1
1271
+ filters=512
1272
+ activation=silu
1273
+
1274
+ [convolutional]
1275
+ size=1
1276
+ stride=1
1277
+ pad=1
1278
+ filters=255
1279
+ activation=linear
1280
+
1281
+ [yolo]
1282
+ mask = 3,4,5
1283
+ anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
1284
+ classes=80
1285
+ num=9
1286
+ jitter=.3
1287
+ ignore_thresh = .7
1288
+ truth_thresh = 1
1289
+ random=1
1290
+ scale_x_y = 1.05
1291
+ iou_thresh=0.213
1292
+ cls_normalizer=1.0
1293
+ iou_normalizer=0.07
1294
+ iou_loss=ciou
1295
+ nms_kind=greedynms
1296
+ beta_nms=0.6
1297
+
1298
+
1299
+ # YOLO-5
1300
+
1301
+ [route]
1302
+ layers = 163
1303
+
1304
+ [convolutional]
1305
+ batch_normalize=1
1306
+ size=3
1307
+ stride=1
1308
+ pad=1
1309
+ filters=1024
1310
+ activation=silu
1311
+
1312
+ [convolutional]
1313
+ size=1
1314
+ stride=1
1315
+ pad=1
1316
+ filters=255
1317
+ activation=linear
1318
+
1319
+ [yolo]
1320
+ mask = 6,7,8
1321
+ anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
1322
+ classes=80
1323
+ num=9
1324
+ jitter=.3
1325
+ ignore_thresh = .7
1326
+ truth_thresh = 1
1327
+ random=1
1328
+ scale_x_y = 1.05
1329
+ iou_thresh=0.213
1330
+ cls_normalizer=1.0
1331
+ iou_normalizer=0.07
1332
+ iou_loss=ciou
1333
+ nms_kind=greedynms
1334
+ beta_nms=0.6
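All three `[yolo]` heads in this file share one nine-anchor list; `mask` selects which three anchors each scale predicts (small anchors on the high-resolution head, large ones on the low-resolution head), and the 1x1 conv feeding each head has `filters = len(mask) * (classes + 5) = 3 * 85 = 255`. A quick illustrative check of those relationships:

```python
anchors = [(12, 16), (19, 36), (40, 28), (36, 75), (76, 55),
           (72, 146), (142, 110), (192, 243), (459, 401)]
classes = 80

for name, mask in [("YOLO-3", (0, 1, 2)), ("YOLO-4", (3, 4, 5)), ("YOLO-5", (6, 7, 8))]:
    head_anchors = [anchors[i] for i in mask]
    filters = len(mask) * (classes + 5)  # 255, matching the cfg above
    print(name, head_anchors, filters)
```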
cfg/yolov4_csp_x.cfg ADDED
@@ -0,0 +1,1534 @@
1
+ [net]
2
+ # Testing
3
+ #batch=1
4
+ #subdivisions=1
5
+ # Training
6
+ batch=64
7
+ subdivisions=8
8
+ width=512
9
+ height=512
10
+ channels=3
11
+ momentum=0.949
12
+ decay=0.0005
13
+ angle=0
14
+ saturation = 1.5
15
+ exposure = 1.5
16
+ hue=.1
17
+
18
+ learning_rate=0.00261
19
+ burn_in=1000
20
+ max_batches = 500500
21
+ policy=steps
22
+ steps=400000,450000
23
+ scales=.1,.1
24
+
25
+ #cutmix=1
26
+ mosaic=1
27
+
28
+
29
+ # ============ Backbone ============ #
30
+
31
+ # Stem
32
+
33
+ # 0
34
+ [convolutional]
35
+ batch_normalize=1
36
+ filters=32
37
+ size=3
38
+ stride=1
39
+ pad=1
40
+ activation=silu
41
+
42
+ # P1
43
+
44
+ # Downsample
45
+
46
+ [convolutional]
47
+ batch_normalize=1
48
+ filters=80
49
+ size=3
50
+ stride=2
51
+ pad=1
52
+ activation=silu
53
+
54
+ # Residual Block
55
+
56
+ [convolutional]
57
+ batch_normalize=1
58
+ filters=40
59
+ size=1
60
+ stride=1
61
+ pad=1
62
+ activation=silu
63
+
64
+ [convolutional]
65
+ batch_normalize=1
66
+ filters=80
67
+ size=3
68
+ stride=1
69
+ pad=1
70
+ activation=silu
71
+
72
+ # 4 (previous+1+3k)
73
+ [shortcut]
74
+ from=-3
75
+ activation=linear
76
+
77
+ # P2
78
+
79
+ # Downsample
80
+
81
+ [convolutional]
82
+ batch_normalize=1
83
+ filters=160
84
+ size=3
85
+ stride=2
86
+ pad=1
87
+ activation=silu
88
+
89
+ # Split
90
+
91
+ [convolutional]
92
+ batch_normalize=1
93
+ filters=80
94
+ size=1
95
+ stride=1
96
+ pad=1
97
+ activation=silu
98
+
99
+ [route]
100
+ layers = -2
101
+
102
+ [convolutional]
103
+ batch_normalize=1
104
+ filters=80
105
+ size=1
106
+ stride=1
107
+ pad=1
108
+ activation=silu
109
+
110
+ # Residual Block
111
+
112
+ [convolutional]
113
+ batch_normalize=1
114
+ filters=80
115
+ size=1
116
+ stride=1
117
+ pad=1
118
+ activation=silu
119
+
120
+ [convolutional]
121
+ batch_normalize=1
122
+ filters=80
123
+ size=3
124
+ stride=1
125
+ pad=1
126
+ activation=silu
127
+
128
+ [shortcut]
129
+ from=-3
130
+ activation=linear
131
+
132
+ [convolutional]
133
+ batch_normalize=1
134
+ filters=80
135
+ size=1
136
+ stride=1
137
+ pad=1
138
+ activation=silu
139
+
140
+ [convolutional]
141
+ batch_normalize=1
142
+ filters=80
143
+ size=3
144
+ stride=1
145
+ pad=1
146
+ activation=silu
147
+
148
+ [shortcut]
149
+ from=-3
150
+ activation=linear
151
+
152
+ [convolutional]
153
+ batch_normalize=1
154
+ filters=80
155
+ size=1
156
+ stride=1
157
+ pad=1
158
+ activation=silu
159
+
160
+ [convolutional]
161
+ batch_normalize=1
162
+ filters=80
163
+ size=3
164
+ stride=1
165
+ pad=1
166
+ activation=silu
167
+
168
+ [shortcut]
169
+ from=-3
170
+ activation=linear
171
+
172
+ # Transition first
173
+
174
+ [convolutional]
175
+ batch_normalize=1
176
+ filters=80
177
+ size=1
178
+ stride=1
179
+ pad=1
180
+ activation=silu
181
+
182
+ # Merge [-1, -(3k+4)]
183
+
184
+ [route]
185
+ layers = -1,-13
186
+
187
+ # Transition last
188
+
189
+ # 20 (previous+7+3k)
190
+ [convolutional]
191
+ batch_normalize=1
192
+ filters=160
193
+ size=1
194
+ stride=1
195
+ pad=1
196
+ activation=silu
197
+
198
+ # P3
199
+
200
+ # Downsample
201
+
202
+ [convolutional]
203
+ batch_normalize=1
204
+ filters=320
205
+ size=3
206
+ stride=2
207
+ pad=1
208
+ activation=silu
209
+
210
+ # Split
211
+
212
+ [convolutional]
213
+ batch_normalize=1
214
+ filters=160
215
+ size=1
216
+ stride=1
217
+ pad=1
218
+ activation=silu
219
+
220
+ [route]
221
+ layers = -2
222
+
223
+ [convolutional]
224
+ batch_normalize=1
225
+ filters=160
226
+ size=1
227
+ stride=1
228
+ pad=1
229
+ activation=silu
230
+
231
+ # Residual Block
232
+
233
+ [convolutional]
234
+ batch_normalize=1
235
+ filters=160
236
+ size=1
237
+ stride=1
238
+ pad=1
239
+ activation=silu
240
+
241
+ [convolutional]
242
+ batch_normalize=1
243
+ filters=160
244
+ size=3
245
+ stride=1
246
+ pad=1
247
+ activation=silu
248
+
249
+ [shortcut]
250
+ from=-3
251
+ activation=linear
252
+
253
+ [convolutional]
254
+ batch_normalize=1
255
+ filters=160
256
+ size=1
257
+ stride=1
258
+ pad=1
259
+ activation=silu
260
+
261
+ [convolutional]
262
+ batch_normalize=1
263
+ filters=160
264
+ size=3
265
+ stride=1
266
+ pad=1
267
+ activation=silu
268
+
269
+ [shortcut]
270
+ from=-3
271
+ activation=linear
272
+
273
+ [convolutional]
274
+ batch_normalize=1
275
+ filters=160
276
+ size=1
277
+ stride=1
278
+ pad=1
279
+ activation=silu
280
+
281
+ [convolutional]
282
+ batch_normalize=1
283
+ filters=160
284
+ size=3
285
+ stride=1
286
+ pad=1
287
+ activation=silu
288
+
289
+ [shortcut]
290
+ from=-3
291
+ activation=linear
292
+
293
+ [convolutional]
294
+ batch_normalize=1
295
+ filters=160
296
+ size=1
297
+ stride=1
298
+ pad=1
299
+ activation=silu
300
+
301
+ [convolutional]
302
+ batch_normalize=1
303
+ filters=160
304
+ size=3
305
+ stride=1
306
+ pad=1
307
+ activation=silu
308
+
309
+ [shortcut]
310
+ from=-3
311
+ activation=linear
312
+
313
+ [convolutional]
314
+ batch_normalize=1
315
+ filters=160
316
+ size=1
317
+ stride=1
318
+ pad=1
319
+ activation=silu
320
+
321
+ [convolutional]
322
+ batch_normalize=1
323
+ filters=160
324
+ size=3
325
+ stride=1
326
+ pad=1
327
+ activation=silu
328
+
329
+ [shortcut]
330
+ from=-3
331
+ activation=linear
332
+
333
+ [convolutional]
334
+ batch_normalize=1
335
+ filters=160
336
+ size=1
337
+ stride=1
338
+ pad=1
339
+ activation=silu
340
+
341
+ [convolutional]
342
+ batch_normalize=1
343
+ filters=160
344
+ size=3
345
+ stride=1
346
+ pad=1
347
+ activation=silu
348
+
349
+ [shortcut]
350
+ from=-3
351
+ activation=linear
352
+
353
+ [convolutional]
354
+ batch_normalize=1
355
+ filters=160
356
+ size=1
357
+ stride=1
358
+ pad=1
359
+ activation=silu
360
+
361
+ [convolutional]
362
+ batch_normalize=1
363
+ filters=160
364
+ size=3
365
+ stride=1
366
+ pad=1
367
+ activation=silu
368
+
369
+ [shortcut]
370
+ from=-3
371
+ activation=linear
372
+
373
+ [convolutional]
374
+ batch_normalize=1
375
+ filters=160
376
+ size=1
377
+ stride=1
378
+ pad=1
379
+ activation=silu
380
+
381
+ [convolutional]
382
+ batch_normalize=1
383
+ filters=160
384
+ size=3
385
+ stride=1
386
+ pad=1
387
+ activation=silu
388
+
389
+ [shortcut]
390
+ from=-3
391
+ activation=linear
392
+
393
+ [convolutional]
394
+ batch_normalize=1
395
+ filters=160
396
+ size=1
397
+ stride=1
398
+ pad=1
399
+ activation=silu
400
+
401
+ [convolutional]
402
+ batch_normalize=1
403
+ filters=160
404
+ size=3
405
+ stride=1
406
+ pad=1
407
+ activation=silu
408
+
409
+ [shortcut]
410
+ from=-3
411
+ activation=linear
412
+
413
+ [convolutional]
414
+ batch_normalize=1
415
+ filters=160
416
+ size=1
417
+ stride=1
418
+ pad=1
419
+ activation=silu
420
+
421
+ [convolutional]
422
+ batch_normalize=1
423
+ filters=160
424
+ size=3
425
+ stride=1
426
+ pad=1
427
+ activation=silu
428
+
429
+ [shortcut]
430
+ from=-3
431
+ activation=linear
432
+
433
+ # Transition first
434
+
435
+ [convolutional]
436
+ batch_normalize=1
437
+ filters=160
438
+ size=1
439
+ stride=1
440
+ pad=1
441
+ activation=silu
442
+
443
+ # Merge [-1 -(4+3k)]
444
+
445
+ [route]
446
+ layers = -1,-34
447
+
448
+ # Transition last
449
+
450
+ # 57 (previous+7+3k)
451
+ [convolutional]
452
+ batch_normalize=1
453
+ filters=320
454
+ size=1
455
+ stride=1
456
+ pad=1
457
+ activation=silu
458
+
459
+ # P4
460
+
461
+ # Downsample
462
+
463
+ [convolutional]
464
+ batch_normalize=1
465
+ filters=640
466
+ size=3
467
+ stride=2
468
+ pad=1
469
+ activation=silu
470
+
471
+ # Split
472
+
473
+ [convolutional]
474
+ batch_normalize=1
475
+ filters=320
476
+ size=1
477
+ stride=1
478
+ pad=1
479
+ activation=silu
480
+
481
+ [route]
482
+ layers = -2
483
+
484
+ [convolutional]
485
+ batch_normalize=1
486
+ filters=320
487
+ size=1
488
+ stride=1
489
+ pad=1
490
+ activation=silu
491
+
492
+ # Residual Block
493
+
494
+ [convolutional]
495
+ batch_normalize=1
496
+ filters=320
497
+ size=1
498
+ stride=1
499
+ pad=1
500
+ activation=silu
501
+
502
+ [convolutional]
503
+ batch_normalize=1
504
+ filters=320
505
+ size=3
506
+ stride=1
507
+ pad=1
508
+ activation=silu
509
+
510
+ [shortcut]
511
+ from=-3
512
+ activation=linear
513
+
514
+ [convolutional]
515
+ batch_normalize=1
516
+ filters=320
517
+ size=1
518
+ stride=1
519
+ pad=1
520
+ activation=silu
521
+
522
+ [convolutional]
523
+ batch_normalize=1
524
+ filters=320
525
+ size=3
526
+ stride=1
527
+ pad=1
528
+ activation=silu
529
+
530
+ [shortcut]
531
+ from=-3
532
+ activation=linear
533
+
534
+ [convolutional]
535
+ batch_normalize=1
536
+ filters=320
537
+ size=1
538
+ stride=1
539
+ pad=1
540
+ activation=silu
541
+
542
+ [convolutional]
543
+ batch_normalize=1
544
+ filters=320
545
+ size=3
546
+ stride=1
547
+ pad=1
548
+ activation=silu
549
+
550
+ [shortcut]
551
+ from=-3
552
+ activation=linear
553
+
554
+ [convolutional]
555
+ batch_normalize=1
556
+ filters=320
557
+ size=1
558
+ stride=1
559
+ pad=1
560
+ activation=silu
561
+
562
+ [convolutional]
563
+ batch_normalize=1
564
+ filters=320
565
+ size=3
566
+ stride=1
567
+ pad=1
568
+ activation=silu
569
+
570
+ [shortcut]
571
+ from=-3
572
+ activation=linear
573
+
574
+ [convolutional]
575
+ batch_normalize=1
576
+ filters=320
577
+ size=1
578
+ stride=1
579
+ pad=1
580
+ activation=silu
581
+
582
+ [convolutional]
583
+ batch_normalize=1
584
+ filters=320
585
+ size=3
586
+ stride=1
587
+ pad=1
588
+ activation=silu
589
+
590
+ [shortcut]
591
+ from=-3
592
+ activation=linear
593
+
594
+ [convolutional]
595
+ batch_normalize=1
596
+ filters=320
597
+ size=1
598
+ stride=1
599
+ pad=1
600
+ activation=silu
601
+
602
+ [convolutional]
603
+ batch_normalize=1
604
+ filters=320
605
+ size=3
606
+ stride=1
607
+ pad=1
608
+ activation=silu
609
+
610
+ [shortcut]
611
+ from=-3
612
+ activation=linear
613
+
614
+ [convolutional]
615
+ batch_normalize=1
616
+ filters=320
617
+ size=1
618
+ stride=1
619
+ pad=1
620
+ activation=silu
621
+
622
+ [convolutional]
623
+ batch_normalize=1
624
+ filters=320
625
+ size=3
626
+ stride=1
627
+ pad=1
628
+ activation=silu
629
+
630
+ [shortcut]
631
+ from=-3
632
+ activation=linear
633
+
634
+ [convolutional]
635
+ batch_normalize=1
636
+ filters=320
637
+ size=1
638
+ stride=1
639
+ pad=1
640
+ activation=silu
641
+
642
+ [convolutional]
643
+ batch_normalize=1
644
+ filters=320
645
+ size=3
646
+ stride=1
647
+ pad=1
648
+ activation=silu
649
+
650
+ [shortcut]
651
+ from=-3
652
+ activation=linear
653
+
654
+ [convolutional]
655
+ batch_normalize=1
656
+ filters=320
657
+ size=1
658
+ stride=1
659
+ pad=1
660
+ activation=silu
661
+
662
+ [convolutional]
663
+ batch_normalize=1
664
+ filters=320
665
+ size=3
666
+ stride=1
667
+ pad=1
668
+ activation=silu
669
+
670
+ [shortcut]
671
+ from=-3
672
+ activation=linear
673
+
674
+ [convolutional]
675
+ batch_normalize=1
676
+ filters=320
677
+ size=1
678
+ stride=1
679
+ pad=1
680
+ activation=silu
681
+
682
+ [convolutional]
683
+ batch_normalize=1
684
+ filters=320
685
+ size=3
686
+ stride=1
687
+ pad=1
688
+ activation=silu
689
+
690
+ [shortcut]
691
+ from=-3
692
+ activation=linear
693
+
694
+ # Transition first
695
+
696
+ [convolutional]
697
+ batch_normalize=1
698
+ filters=320
699
+ size=1
700
+ stride=1
701
+ pad=1
702
+ activation=silu
703
+
704
+ # Merge [-1 -(3k+4)]
705
+
706
+ [route]
707
+ layers = -1,-34
708
+
709
+ # Transition last
710
+
711
+ # 94 (previous+7+3k)
712
+ [convolutional]
713
+ batch_normalize=1
714
+ filters=640
715
+ size=1
716
+ stride=1
717
+ pad=1
718
+ activation=silu
719
+
720
+ # P5
721
+
722
+ # Downsample
723
+
724
+ [convolutional]
725
+ batch_normalize=1
726
+ filters=1280
727
+ size=3
728
+ stride=2
729
+ pad=1
730
+ activation=silu
731
+
732
+ # Split
733
+
734
+ [convolutional]
735
+ batch_normalize=1
736
+ filters=640
737
+ size=1
738
+ stride=1
739
+ pad=1
740
+ activation=silu
741
+
742
+ [route]
743
+ layers = -2
744
+
745
+ [convolutional]
746
+ batch_normalize=1
747
+ filters=640
748
+ size=1
749
+ stride=1
750
+ pad=1
751
+ activation=silu
752
+
753
+ # Residual Block
754
+
755
+ [convolutional]
756
+ batch_normalize=1
757
+ filters=640
758
+ size=1
759
+ stride=1
760
+ pad=1
761
+ activation=silu
762
+
763
+ [convolutional]
764
+ batch_normalize=1
765
+ filters=640
766
+ size=3
767
+ stride=1
768
+ pad=1
769
+ activation=silu
770
+
771
+ [shortcut]
772
+ from=-3
773
+ activation=linear
774
+
775
+ [convolutional]
776
+ batch_normalize=1
777
+ filters=640
778
+ size=1
779
+ stride=1
780
+ pad=1
781
+ activation=silu
782
+
783
+ [convolutional]
784
+ batch_normalize=1
785
+ filters=640
786
+ size=3
787
+ stride=1
788
+ pad=1
789
+ activation=silu
790
+
791
+ [shortcut]
792
+ from=-3
793
+ activation=linear
794
+
795
+ [convolutional]
796
+ batch_normalize=1
797
+ filters=640
798
+ size=1
799
+ stride=1
800
+ pad=1
801
+ activation=silu
802
+
803
+ [convolutional]
804
+ batch_normalize=1
805
+ filters=640
806
+ size=3
807
+ stride=1
808
+ pad=1
809
+ activation=silu
810
+
811
+ [shortcut]
812
+ from=-3
813
+ activation=linear
814
+
815
+ [convolutional]
816
+ batch_normalize=1
817
+ filters=640
818
+ size=1
819
+ stride=1
820
+ pad=1
821
+ activation=silu
822
+
823
+ [convolutional]
824
+ batch_normalize=1
825
+ filters=640
826
+ size=3
827
+ stride=1
828
+ pad=1
829
+ activation=silu
830
+
831
+ [shortcut]
832
+ from=-3
833
+ activation=linear
834
+
835
+ [convolutional]
836
+ batch_normalize=1
837
+ filters=640
838
+ size=1
839
+ stride=1
840
+ pad=1
841
+ activation=silu
842
+
843
+ [convolutional]
844
+ batch_normalize=1
845
+ filters=640
846
+ size=3
847
+ stride=1
848
+ pad=1
849
+ activation=silu
850
+
851
+ [shortcut]
852
+ from=-3
853
+ activation=linear
854
+
855
+ # Transition first
856
+
857
+ [convolutional]
858
+ batch_normalize=1
859
+ filters=640
860
+ size=1
861
+ stride=1
862
+ pad=1
863
+ activation=silu
864
+
865
+ # Merge [-1 -(3k+4)]
866
+
867
+ [route]
868
+ layers = -1,-19
869
+
870
+ # Transition last
871
+
872
+ # 116 (previous+7+3k)
873
+ [convolutional]
874
+ batch_normalize=1
875
+ filters=1280
876
+ size=1
877
+ stride=1
878
+ pad=1
879
+ activation=silu
880
+
881
+ # ============ End of Backbone ============ #
882
+
883
+ # ============ Neck ============ #
884
+
885
+ # CSPSPP
886
+
887
+ [convolutional]
888
+ batch_normalize=1
889
+ filters=640
890
+ size=1
891
+ stride=1
892
+ pad=1
893
+ activation=silu
894
+
895
+ [route]
896
+ layers = -2
897
+
898
+ [convolutional]
899
+ batch_normalize=1
900
+ filters=640
901
+ size=1
902
+ stride=1
903
+ pad=1
904
+ activation=silu
905
+
906
+ [convolutional]
907
+ batch_normalize=1
908
+ size=3
909
+ stride=1
910
+ pad=1
911
+ filters=640
912
+ activation=silu
913
+
914
+ [convolutional]
915
+ batch_normalize=1
916
+ filters=640
917
+ size=1
918
+ stride=1
919
+ pad=1
920
+ activation=silu
921
+
922
+ ### SPP ###
923
+ [maxpool]
924
+ stride=1
925
+ size=5
926
+
927
+ [route]
928
+ layers=-2
929
+
930
+ [maxpool]
931
+ stride=1
932
+ size=9
933
+
934
+ [route]
935
+ layers=-4
936
+
937
+ [maxpool]
938
+ stride=1
939
+ size=13
940
+
941
+ [route]
942
+ layers=-1,-3,-5,-6
943
+ ### End SPP ###
944
+
945
+ [convolutional]
946
+ batch_normalize=1
947
+ filters=640
948
+ size=1
949
+ stride=1
950
+ pad=1
951
+ activation=silu
952
+
953
+ [convolutional]
954
+ batch_normalize=1
955
+ size=3
956
+ stride=1
957
+ pad=1
958
+ filters=640
959
+ activation=silu
960
+
961
+ [convolutional]
962
+ batch_normalize=1
963
+ filters=640
964
+ size=1
965
+ stride=1
966
+ pad=1
967
+ activation=silu
968
+
969
+ [convolutional]
970
+ batch_normalize=1
971
+ size=3
972
+ stride=1
973
+ pad=1
974
+ filters=640
975
+ activation=silu
976
+
977
+ [route]
978
+ layers = -1, -15
979
+
980
+ # 133 (previous+6+5+2k)
981
+ [convolutional]
982
+ batch_normalize=1
983
+ filters=640
984
+ size=1
985
+ stride=1
986
+ pad=1
987
+ activation=silu
988
+
989
+ # End of CSPSPP
990
+
991
+
992
+ # FPN-4
993
+
994
+ [convolutional]
995
+ batch_normalize=1
996
+ filters=320
997
+ size=1
998
+ stride=1
999
+ pad=1
1000
+ activation=silu
1001
+
1002
+ [upsample]
1003
+ stride=2
1004
+
1005
+ [route]
1006
+ layers = 94
1007
+
1008
+ [convolutional]
1009
+ batch_normalize=1
1010
+ filters=320
1011
+ size=1
1012
+ stride=1
1013
+ pad=1
1014
+ activation=silu
1015
+
1016
+ [route]
1017
+ layers = -1, -3
1018
+
1019
+ [convolutional]
1020
+ batch_normalize=1
1021
+ filters=320
1022
+ size=1
1023
+ stride=1
1024
+ pad=1
1025
+ activation=silu
1026
+
1027
+ # Split
1028
+
1029
+ [convolutional]
1030
+ batch_normalize=1
1031
+ filters=320
1032
+ size=1
1033
+ stride=1
1034
+ pad=1
1035
+ activation=silu
1036
+
1037
+ [route]
1038
+ layers = -2
1039
+
1040
+ # Plain Block
1041
+
1042
+ [convolutional]
1043
+ batch_normalize=1
1044
+ filters=320
1045
+ size=1
1046
+ stride=1
1047
+ pad=1
1048
+ activation=silu
1049
+
1050
+ [convolutional]
1051
+ batch_normalize=1
1052
+ size=3
1053
+ stride=1
1054
+ pad=1
1055
+ filters=320
1056
+ activation=silu
1057
+
1058
+ [convolutional]
1059
+ batch_normalize=1
1060
+ filters=320
1061
+ size=1
1062
+ stride=1
1063
+ pad=1
1064
+ activation=silu
1065
+
1066
+ [convolutional]
1067
+ batch_normalize=1
1068
+ size=3
1069
+ stride=1
1070
+ pad=1
1071
+ filters=320
1072
+ activation=silu
1073
+
1074
+ [convolutional]
1075
+ batch_normalize=1
1076
+ filters=320
1077
+ size=1
1078
+ stride=1
1079
+ pad=1
1080
+ activation=silu
1081
+
1082
+ [convolutional]
1083
+ batch_normalize=1
1084
+ size=3
1085
+ stride=1
1086
+ pad=1
1087
+ filters=320
1088
+ activation=silu
1089
+
1090
+ # Merge [-1, -(2k+2)]
1091
+
1092
+ [route]
1093
+ layers = -1, -8
1094
+
1095
+ # Transition last
1096
+
1097
+ # 149 (previous+6+4+2k)
1098
+ [convolutional]
1099
+ batch_normalize=1
1100
+ filters=320
1101
+ size=1
1102
+ stride=1
1103
+ pad=1
1104
+ activation=silu
1105
+
1106
+
1107
+ # FPN-3
1108
+
1109
+ [convolutional]
1110
+ batch_normalize=1
1111
+ filters=160
1112
+ size=1
1113
+ stride=1
1114
+ pad=1
1115
+ activation=silu
1116
+
1117
+ [upsample]
1118
+ stride=2
1119
+
1120
+ [route]
1121
+ layers = 57
1122
+
1123
+ [convolutional]
1124
+ batch_normalize=1
1125
+ filters=160
1126
+ size=1
1127
+ stride=1
1128
+ pad=1
1129
+ activation=silu
1130
+
1131
+ [route]
1132
+ layers = -1, -3
1133
+
1134
+ [convolutional]
1135
+ batch_normalize=1
1136
+ filters=160
1137
+ size=1
1138
+ stride=1
1139
+ pad=1
1140
+ activation=silu
1141
+
1142
+ # Split
1143
+
1144
+ [convolutional]
1145
+ batch_normalize=1
1146
+ filters=160
1147
+ size=1
1148
+ stride=1
1149
+ pad=1
1150
+ activation=silu
1151
+
1152
+ [route]
1153
+ layers = -2
1154
+
1155
+ # Plain Block
1156
+
1157
+ [convolutional]
1158
+ batch_normalize=1
1159
+ filters=160
1160
+ size=1
1161
+ stride=1
1162
+ pad=1
1163
+ activation=silu
1164
+
1165
+ [convolutional]
1166
+ batch_normalize=1
1167
+ size=3
1168
+ stride=1
1169
+ pad=1
1170
+ filters=160
1171
+ activation=silu
1172
+
1173
+ [convolutional]
1174
+ batch_normalize=1
1175
+ filters=160
1176
+ size=1
1177
+ stride=1
1178
+ pad=1
1179
+ activation=silu
1180
+
1181
+ [convolutional]
1182
+ batch_normalize=1
1183
+ size=3
1184
+ stride=1
1185
+ pad=1
1186
+ filters=160
1187
+ activation=silu
1188
+
1189
+ [convolutional]
1190
+ batch_normalize=1
1191
+ filters=160
1192
+ size=1
1193
+ stride=1
1194
+ pad=1
1195
+ activation=silu
1196
+
1197
+ [convolutional]
1198
+ batch_normalize=1
1199
+ size=3
1200
+ stride=1
1201
+ pad=1
1202
+ filters=160
1203
+ activation=silu
1204
+
1205
+ # Merge [-1, -(2k+2)]
1206
+
1207
+ [route]
1208
+ layers = -1, -8
1209
+
1210
+ # Transition last
1211
+
1212
+ # 165 (previous+6+4+2k)
1213
+ [convolutional]
1214
+ batch_normalize=1
1215
+ filters=160
1216
+ size=1
1217
+ stride=1
1218
+ pad=1
1219
+ activation=silu
1220
+
1221
+
1222
+ # PAN-4
1223
+
1224
+ [convolutional]
1225
+ batch_normalize=1
1226
+ size=3
1227
+ stride=2
1228
+ pad=1
1229
+ filters=320
1230
+ activation=silu
1231
+
1232
+ [route]
1233
+ layers = -1, 149
1234
+
1235
+ [convolutional]
1236
+ batch_normalize=1
1237
+ filters=320
1238
+ size=1
1239
+ stride=1
1240
+ pad=1
1241
+ activation=silu
1242
+
1243
+ # Split
1244
+
1245
+ [convolutional]
1246
+ batch_normalize=1
1247
+ filters=320
1248
+ size=1
1249
+ stride=1
1250
+ pad=1
1251
+ activation=silu
1252
+
1253
+ [route]
1254
+ layers = -2
1255
+
1256
+ # Plain Block
1257
+
1258
+ [convolutional]
1259
+ batch_normalize=1
1260
+ filters=320
1261
+ size=1
1262
+ stride=1
1263
+ pad=1
1264
+ activation=silu
1265
+
1266
+ [convolutional]
1267
+ batch_normalize=1
1268
+ size=3
1269
+ stride=1
1270
+ pad=1
1271
+ filters=320
1272
+ activation=silu
1273
+
1274
+ [convolutional]
1275
+ batch_normalize=1
1276
+ filters=320
1277
+ size=1
1278
+ stride=1
1279
+ pad=1
1280
+ activation=silu
1281
+
1282
+ [convolutional]
1283
+ batch_normalize=1
1284
+ size=3
1285
+ stride=1
1286
+ pad=1
1287
+ filters=320
1288
+ activation=silu
1289
+
1290
+ [convolutional]
1291
+ batch_normalize=1
1292
+ filters=320
1293
+ size=1
1294
+ stride=1
1295
+ pad=1
1296
+ activation=silu
1297
+
1298
+ [convolutional]
1299
+ batch_normalize=1
1300
+ size=3
1301
+ stride=1
1302
+ pad=1
1303
+ filters=320
1304
+ activation=silu
1305
+
1306
+ [route]
1307
+ layers = -1,-8
1308
+
1309
+ # Transition last
1310
+
1311
+ # 178 (previous+3+4+2k)
1312
+ [convolutional]
1313
+ batch_normalize=1
1314
+ filters=320
1315
+ size=1
1316
+ stride=1
1317
+ pad=1
1318
+ activation=silu
1319
+
1320
+
1321
+ # PAN-5
1322
+
1323
+ [convolutional]
1324
+ batch_normalize=1
1325
+ size=3
1326
+ stride=2
1327
+ pad=1
1328
+ filters=640
1329
+ activation=silu
1330
+
1331
+ [route]
1332
+ layers = -1, 133
1333
+
1334
+ [convolutional]
1335
+ batch_normalize=1
1336
+ filters=640
1337
+ size=1
1338
+ stride=1
1339
+ pad=1
1340
+ activation=silu
1341
+
1342
+ # Split
1343
+
1344
+ [convolutional]
1345
+ batch_normalize=1
1346
+ filters=640
1347
+ size=1
1348
+ stride=1
1349
+ pad=1
1350
+ activation=silu
1351
+
1352
+ [route]
1353
+ layers = -2
1354
+
1355
+ # Plain Block
1356
+
1357
+ [convolutional]
1358
+ batch_normalize=1
1359
+ filters=640
1360
+ size=1
1361
+ stride=1
1362
+ pad=1
1363
+ activation=silu
1364
+
1365
+ [convolutional]
1366
+ batch_normalize=1
1367
+ size=3
1368
+ stride=1
1369
+ pad=1
1370
+ filters=640
1371
+ activation=silu
1372
+
1373
+ [convolutional]
1374
+ batch_normalize=1
1375
+ filters=640
1376
+ size=1
1377
+ stride=1
1378
+ pad=1
1379
+ activation=silu
1380
+
1381
+ [convolutional]
1382
+ batch_normalize=1
1383
+ size=3
1384
+ stride=1
1385
+ pad=1
1386
+ filters=640
1387
+ activation=silu
1388
+
1389
+ [convolutional]
1390
+ batch_normalize=1
1391
+ filters=640
1392
+ size=1
1393
+ stride=1
1394
+ pad=1
1395
+ activation=silu
1396
+
1397
+ [convolutional]
1398
+ batch_normalize=1
1399
+ size=3
1400
+ stride=1
1401
+ pad=1
1402
+ filters=640
1403
+ activation=silu
1404
+
1405
+ [route]
1406
+ layers = -1,-8
1407
+
1408
+ # Transition last
1409
+
1410
+ # 191 (previous+3+4+2k)
1411
+ [convolutional]
1412
+ batch_normalize=1
1413
+ filters=640
1414
+ size=1
1415
+ stride=1
1416
+ pad=1
1417
+ activation=silu
1418
+
1419
+ # ============ End of Neck ============ #
1420
+
1421
+ # ============ Head ============ #
1422
+
1423
+ # YOLO-3
1424
+
1425
+ [route]
1426
+ layers = 165
1427
+
1428
+ [convolutional]
1429
+ batch_normalize=1
1430
+ size=3
1431
+ stride=1
1432
+ pad=1
1433
+ filters=320
1434
+ activation=silu
1435
+
1436
+ [convolutional]
1437
+ size=1
1438
+ stride=1
1439
+ pad=1
1440
+ filters=255
1441
+ activation=linear
1442
+
1443
+ [yolo]
1444
+ mask = 0,1,2
1445
+ anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
1446
+ classes=80
1447
+ num=9
1448
+ jitter=.3
1449
+ ignore_thresh = .7
1450
+ truth_thresh = 1
1451
+ random=1
1452
+ scale_x_y = 1.05
1453
+ iou_thresh=0.213
1454
+ cls_normalizer=1.0
1455
+ iou_normalizer=0.07
1456
+ iou_loss=ciou
1457
+ nms_kind=greedynms
1458
+ beta_nms=0.6
1459
+
1460
+
1461
+ # YOLO-4
1462
+
1463
+ [route]
1464
+ layers = 178
1465
+
1466
+ [convolutional]
1467
+ batch_normalize=1
1468
+ size=3
1469
+ stride=1
1470
+ pad=1
1471
+ filters=640
1472
+ activation=silu
1473
+
1474
+ [convolutional]
1475
+ size=1
1476
+ stride=1
1477
+ pad=1
1478
+ filters=255
1479
+ activation=linear
1480
+
1481
+ [yolo]
1482
+ mask = 3,4,5
1483
+ anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
1484
+ classes=80
1485
+ num=9
1486
+ jitter=.3
1487
+ ignore_thresh = .7
1488
+ truth_thresh = 1
1489
+ random=1
1490
+ scale_x_y = 1.05
1491
+ iou_thresh=0.213
1492
+ cls_normalizer=1.0
1493
+ iou_normalizer=0.07
1494
+ iou_loss=ciou
1495
+ nms_kind=greedynms
1496
+ beta_nms=0.6
1497
+
1498
+
1499
+ # YOLO-5
1500
+
1501
+ [route]
1502
+ layers = 191
1503
+
1504
+ [convolutional]
1505
+ batch_normalize=1
1506
+ size=3
1507
+ stride=1
1508
+ pad=1
1509
+ filters=1280
1510
+ activation=silu
1511
+
1512
+ [convolutional]
1513
+ size=1
1514
+ stride=1
1515
+ pad=1
1516
+ filters=255
1517
+ activation=linear
1518
+
1519
+ [yolo]
1520
+ mask = 6,7,8
1521
+ anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
1522
+ classes=80
1523
+ num=9
1524
+ jitter=.3
1525
+ ignore_thresh = .7
1526
+ truth_thresh = 1
1527
+ random=1
1528
+ scale_x_y = 1.05
1529
+ iou_thresh=0.213
1530
+ cls_normalizer=1.0
1531
+ iou_normalizer=0.07
1532
+ iou_loss=ciou
1533
+ nms_kind=greedynms
1534
+ beta_nms=0.6
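
The yolov4_csp_x.cfg diff ends with its third detection head above. For orientation, these `.cfg` files are a flat INI-like format of `[section]` blocks that the repo parses into an ordered block list (a compiled `utils/parse_config` module appears in the commit's file listing). A minimal reader sketch follows; `parse_darknet_cfg` is a hypothetical helper name, not the repo's own API:

```python
# Minimal sketch of a darknet .cfg reader (hypothetical helper, not the
# repo's parse_config): each "[section]" starts a new block and each
# "key=value" line populates the current block.
def parse_darknet_cfg(path):
    blocks = []
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith('#'):
                continue  # skip blanks and comments such as "# PAN-5"
            if line.startswith('['):
                blocks.append({'type': line[1:-1].strip()})
            else:
                key, value = line.split('=', 1)
                blocks[-1][key.strip()] = value.strip()
    return blocks

# Example: count the YOLO heads in the file above and sanity-check the
# detection conv width: filters = (classes + 5) * len(mask)
#                               = (80 + 5) * 3 = 255, as configured.
blocks = parse_darknet_cfg('cfg/yolov4_csp_x.cfg')
yolos = [b for b in blocks if b['type'] == 'yolo']
assert len(yolos) == 3 and all(int(b['classes']) == 80 for b in yolos)
```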
cfg/yolov4_p6.cfg ADDED
@@ -0,0 +1,2260 @@
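
Before the rendered file body begins, a note on the training schedule its `[net]` block configures (learning_rate=0.00261, burn_in=1000, steps=400000,450000, scales=.1,.1). A sketch of darknet's `policy=steps` behavior; the quartic burn-in exponent is darknet's default (`power=4`) and is an assumption here, since this cfg does not set it:

```python
# Sketch of darknet's "policy=steps" LR schedule as configured below.
def lr_at(batch_num, base_lr=0.00261, burn_in=1000,
          steps=(400000, 450000), scales=(0.1, 0.1)):
    if batch_num < burn_in:
        return base_lr * (batch_num / burn_in) ** 4  # warm-up ramp
    lr = base_lr
    for step, scale in zip(steps, scales):
        if batch_num >= step:
            lr *= scale  # drop 10x at 400k and again at 450k batches
    return lr

print(lr_at(500), lr_at(100000), lr_at(420000), lr_at(460000))
# ~0.000163, 0.00261, 0.000261, 0.0000261
```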
1
+ [net]
2
+ batch=64
3
+ subdivisions=8
4
+ width=1280
5
+ height=1280
6
+ channels=3
7
+ momentum=0.949
8
+ decay=0.0005
9
+ angle=0
10
+ saturation = 1.5
11
+ exposure = 1.5
12
+ hue=.1
13
+
14
+ learning_rate=0.00261
15
+ burn_in=1000
16
+ max_batches = 500500
17
+ policy=steps
18
+ steps=400000,450000
19
+ scales=.1,.1
20
+
21
+ mosaic=1
22
+
23
+
24
+ # ============ Backbone ============ #
25
+
26
+ # Stem
27
+
28
+ # 0
29
+ [convolutional]
30
+ batch_normalize=1
31
+ filters=32
32
+ size=3
33
+ stride=1
34
+ pad=1
35
+ activation=mish
36
+
37
+
38
+ # P1
39
+
40
+ # Downsample
41
+
42
+ [convolutional]
43
+ batch_normalize=1
44
+ filters=64
45
+ size=3
46
+ stride=2
47
+ pad=1
48
+ activation=mish
49
+
50
+ # Split
51
+
52
+ [convolutional]
53
+ batch_normalize=1
54
+ filters=32
55
+ size=1
56
+ stride=1
57
+ pad=1
58
+ activation=mish
59
+
60
+ [route]
61
+ layers = -2
62
+
63
+ [convolutional]
64
+ batch_normalize=1
65
+ filters=32
66
+ size=1
67
+ stride=1
68
+ pad=1
69
+ activation=mish
70
+
71
+ # Residual Block
72
+
73
+ [convolutional]
74
+ batch_normalize=1
75
+ filters=32
76
+ size=1
77
+ stride=1
78
+ pad=1
79
+ activation=mish
80
+
81
+ [convolutional]
82
+ batch_normalize=1
83
+ filters=32
84
+ size=3
85
+ stride=1
86
+ pad=1
87
+ activation=mish
88
+
89
+ [shortcut]
90
+ from=-3
91
+ activation=linear
92
+
93
+ # Transition first
94
+
95
+ [convolutional]
96
+ batch_normalize=1
97
+ filters=32
98
+ size=1
99
+ stride=1
100
+ pad=1
101
+ activation=mish
102
+
103
+ # Merge [-1, -(3k+4)]
104
+
105
+ [route]
106
+ layers = -1,-7
107
+
108
+ # Transition last
109
+
110
+ # 10 (previous+7+3k)
111
+ [convolutional]
112
+ batch_normalize=1
113
+ filters=64
114
+ size=1
115
+ stride=1
116
+ pad=1
117
+ activation=mish
118
+
119
+
120
+ # P2
121
+
122
+ # Downsample
123
+
124
+ [convolutional]
125
+ batch_normalize=1
126
+ filters=128
127
+ size=3
128
+ stride=2
129
+ pad=1
130
+ activation=mish
131
+
132
+ # Split
133
+
134
+ [convolutional]
135
+ batch_normalize=1
136
+ filters=64
137
+ size=1
138
+ stride=1
139
+ pad=1
140
+ activation=mish
141
+
142
+ [route]
143
+ layers = -2
144
+
145
+ [convolutional]
146
+ batch_normalize=1
147
+ filters=64
148
+ size=1
149
+ stride=1
150
+ pad=1
151
+ activation=mish
152
+
153
+ # Residual Block
154
+
155
+ [convolutional]
156
+ batch_normalize=1
157
+ filters=64
158
+ size=1
159
+ stride=1
160
+ pad=1
161
+ activation=mish
162
+
163
+ [convolutional]
164
+ batch_normalize=1
165
+ filters=64
166
+ size=3
167
+ stride=1
168
+ pad=1
169
+ activation=mish
170
+
171
+ [shortcut]
172
+ from=-3
173
+ activation=linear
174
+
175
+ [convolutional]
176
+ batch_normalize=1
177
+ filters=64
178
+ size=1
179
+ stride=1
180
+ pad=1
181
+ activation=mish
182
+
183
+ [convolutional]
184
+ batch_normalize=1
185
+ filters=64
186
+ size=3
187
+ stride=1
188
+ pad=1
189
+ activation=mish
190
+
191
+ [shortcut]
192
+ from=-3
193
+ activation=linear
194
+
195
+ [convolutional]
196
+ batch_normalize=1
197
+ filters=64
198
+ size=1
199
+ stride=1
200
+ pad=1
201
+ activation=mish
202
+
203
+ [convolutional]
204
+ batch_normalize=1
205
+ filters=64
206
+ size=3
207
+ stride=1
208
+ pad=1
209
+ activation=mish
210
+
211
+ [shortcut]
212
+ from=-3
213
+ activation=linear
214
+
215
+ # Transition first
216
+
217
+ [convolutional]
218
+ batch_normalize=1
219
+ filters=64
220
+ size=1
221
+ stride=1
222
+ pad=1
223
+ activation=mish
224
+
225
+ # Merge [-1, -(3k+4)]
226
+
227
+ [route]
228
+ layers = -1,-13
229
+
230
+ # Transition last
231
+
232
+ # 26 (previous+7+3k)
233
+ [convolutional]
234
+ batch_normalize=1
235
+ filters=128
236
+ size=1
237
+ stride=1
238
+ pad=1
239
+ activation=mish
240
+
241
+
242
+ # P3
243
+
244
+ # Downsample
245
+
246
+ [convolutional]
247
+ batch_normalize=1
248
+ filters=256
249
+ size=3
250
+ stride=2
251
+ pad=1
252
+ activation=mish
253
+
254
+ # Split
255
+
256
+ [convolutional]
257
+ batch_normalize=1
258
+ filters=128
259
+ size=1
260
+ stride=1
261
+ pad=1
262
+ activation=mish
263
+
264
+ [route]
265
+ layers = -2
266
+
267
+ [convolutional]
268
+ batch_normalize=1
269
+ filters=128
270
+ size=1
271
+ stride=1
272
+ pad=1
273
+ activation=mish
274
+
275
+ # Residual Block
276
+
277
+ [convolutional]
278
+ batch_normalize=1
279
+ filters=128
280
+ size=1
281
+ stride=1
282
+ pad=1
283
+ activation=mish
284
+
285
+ [convolutional]
286
+ batch_normalize=1
287
+ filters=128
288
+ size=3
289
+ stride=1
290
+ pad=1
291
+ activation=mish
292
+
293
+ [shortcut]
294
+ from=-3
295
+ activation=linear
296
+
297
+ [convolutional]
298
+ batch_normalize=1
299
+ filters=128
300
+ size=1
301
+ stride=1
302
+ pad=1
303
+ activation=mish
304
+
305
+ [convolutional]
306
+ batch_normalize=1
307
+ filters=128
308
+ size=3
309
+ stride=1
310
+ pad=1
311
+ activation=mish
312
+
313
+ [shortcut]
314
+ from=-3
315
+ activation=linear
316
+
317
+ [convolutional]
318
+ batch_normalize=1
319
+ filters=128
320
+ size=1
321
+ stride=1
322
+ pad=1
323
+ activation=mish
324
+
325
+ [convolutional]
326
+ batch_normalize=1
327
+ filters=128
328
+ size=3
329
+ stride=1
330
+ pad=1
331
+ activation=mish
332
+
333
+ [shortcut]
334
+ from=-3
335
+ activation=linear
336
+
337
+ [convolutional]
338
+ batch_normalize=1
339
+ filters=128
340
+ size=1
341
+ stride=1
342
+ pad=1
343
+ activation=mish
344
+
345
+ [convolutional]
346
+ batch_normalize=1
347
+ filters=128
348
+ size=3
349
+ stride=1
350
+ pad=1
351
+ activation=mish
352
+
353
+ [shortcut]
354
+ from=-3
355
+ activation=linear
356
+
357
+ [convolutional]
358
+ batch_normalize=1
359
+ filters=128
360
+ size=1
361
+ stride=1
362
+ pad=1
363
+ activation=mish
364
+
365
+ [convolutional]
366
+ batch_normalize=1
367
+ filters=128
368
+ size=3
369
+ stride=1
370
+ pad=1
371
+ activation=mish
372
+
373
+ [shortcut]
374
+ from=-3
375
+ activation=linear
376
+
377
+ [convolutional]
378
+ batch_normalize=1
379
+ filters=128
380
+ size=1
381
+ stride=1
382
+ pad=1
383
+ activation=mish
384
+
385
+ [convolutional]
386
+ batch_normalize=1
387
+ filters=128
388
+ size=3
389
+ stride=1
390
+ pad=1
391
+ activation=mish
392
+
393
+ [shortcut]
394
+ from=-3
395
+ activation=linear
396
+
397
+ [convolutional]
398
+ batch_normalize=1
399
+ filters=128
400
+ size=1
401
+ stride=1
402
+ pad=1
403
+ activation=mish
404
+
405
+ [convolutional]
406
+ batch_normalize=1
407
+ filters=128
408
+ size=3
409
+ stride=1
410
+ pad=1
411
+ activation=mish
412
+
413
+ [shortcut]
414
+ from=-3
415
+ activation=linear
416
+
417
+ [convolutional]
418
+ batch_normalize=1
419
+ filters=128
420
+ size=1
421
+ stride=1
422
+ pad=1
423
+ activation=mish
424
+
425
+ [convolutional]
426
+ batch_normalize=1
427
+ filters=128
428
+ size=3
429
+ stride=1
430
+ pad=1
431
+ activation=mish
432
+
433
+ [shortcut]
434
+ from=-3
435
+ activation=linear
436
+
437
+ [convolutional]
438
+ batch_normalize=1
439
+ filters=128
440
+ size=1
441
+ stride=1
442
+ pad=1
443
+ activation=mish
444
+
445
+ [convolutional]
446
+ batch_normalize=1
447
+ filters=128
448
+ size=3
449
+ stride=1
450
+ pad=1
451
+ activation=mish
452
+
453
+ [shortcut]
454
+ from=-3
455
+ activation=linear
456
+
457
+ [convolutional]
458
+ batch_normalize=1
459
+ filters=128
460
+ size=1
461
+ stride=1
462
+ pad=1
463
+ activation=mish
464
+
465
+ [convolutional]
466
+ batch_normalize=1
467
+ filters=128
468
+ size=3
469
+ stride=1
470
+ pad=1
471
+ activation=mish
472
+
473
+ [shortcut]
474
+ from=-3
475
+ activation=linear
476
+
477
+ [convolutional]
478
+ batch_normalize=1
479
+ filters=128
480
+ size=1
481
+ stride=1
482
+ pad=1
483
+ activation=mish
484
+
485
+ [convolutional]
486
+ batch_normalize=1
487
+ filters=128
488
+ size=3
489
+ stride=1
490
+ pad=1
491
+ activation=mish
492
+
493
+ [shortcut]
494
+ from=-3
495
+ activation=linear
496
+
497
+ [convolutional]
498
+ batch_normalize=1
499
+ filters=128
500
+ size=1
501
+ stride=1
502
+ pad=1
503
+ activation=mish
504
+
505
+ [convolutional]
506
+ batch_normalize=1
507
+ filters=128
508
+ size=3
509
+ stride=1
510
+ pad=1
511
+ activation=mish
512
+
513
+ [shortcut]
514
+ from=-3
515
+ activation=linear
516
+
517
+ [convolutional]
518
+ batch_normalize=1
519
+ filters=128
520
+ size=1
521
+ stride=1
522
+ pad=1
523
+ activation=mish
524
+
525
+ [convolutional]
526
+ batch_normalize=1
527
+ filters=128
528
+ size=3
529
+ stride=1
530
+ pad=1
531
+ activation=mish
532
+
533
+ [shortcut]
534
+ from=-3
535
+ activation=linear
536
+
537
+ [convolutional]
538
+ batch_normalize=1
539
+ filters=128
540
+ size=1
541
+ stride=1
542
+ pad=1
543
+ activation=mish
544
+
545
+ [convolutional]
546
+ batch_normalize=1
547
+ filters=128
548
+ size=3
549
+ stride=1
550
+ pad=1
551
+ activation=mish
552
+
553
+ [shortcut]
554
+ from=-3
555
+ activation=linear
556
+
557
+ [convolutional]
558
+ batch_normalize=1
559
+ filters=128
560
+ size=1
561
+ stride=1
562
+ pad=1
563
+ activation=mish
564
+
565
+ [convolutional]
566
+ batch_normalize=1
567
+ filters=128
568
+ size=3
569
+ stride=1
570
+ pad=1
571
+ activation=mish
572
+
573
+ [shortcut]
574
+ from=-3
575
+ activation=linear
576
+
577
+ # Transition first
578
+
579
+ [convolutional]
580
+ batch_normalize=1
581
+ filters=128
582
+ size=1
583
+ stride=1
584
+ pad=1
585
+ activation=mish
586
+
587
+ # Merge [-1, -(3k+4)]
588
+
589
+ [route]
590
+ layers = -1,-49
591
+
592
+ # Transition last
593
+
594
+ # 78 (previous+7+3k)
595
+ [convolutional]
596
+ batch_normalize=1
597
+ filters=256
598
+ size=1
599
+ stride=1
600
+ pad=1
601
+ activation=mish
602
+
603
+
604
+ # P4
605
+
606
+ # Downsample
607
+
608
+ [convolutional]
609
+ batch_normalize=1
610
+ filters=512
611
+ size=3
612
+ stride=2
613
+ pad=1
614
+ activation=mish
615
+
616
+ # Split
617
+
618
+ [convolutional]
619
+ batch_normalize=1
620
+ filters=256
621
+ size=1
622
+ stride=1
623
+ pad=1
624
+ activation=mish
625
+
626
+ [route]
627
+ layers = -2
628
+
629
+ [convolutional]
630
+ batch_normalize=1
631
+ filters=256
632
+ size=1
633
+ stride=1
634
+ pad=1
635
+ activation=mish
636
+
637
+ # Residual Block
638
+
639
+ [convolutional]
640
+ batch_normalize=1
641
+ filters=256
642
+ size=1
643
+ stride=1
644
+ pad=1
645
+ activation=mish
646
+
647
+ [convolutional]
648
+ batch_normalize=1
649
+ filters=256
650
+ size=3
651
+ stride=1
652
+ pad=1
653
+ activation=mish
654
+
655
+ [shortcut]
656
+ from=-3
657
+ activation=linear
658
+
659
+ [convolutional]
660
+ batch_normalize=1
661
+ filters=256
662
+ size=1
663
+ stride=1
664
+ pad=1
665
+ activation=mish
666
+
667
+ [convolutional]
668
+ batch_normalize=1
669
+ filters=256
670
+ size=3
671
+ stride=1
672
+ pad=1
673
+ activation=mish
674
+
675
+ [shortcut]
676
+ from=-3
677
+ activation=linear
678
+
679
+ [convolutional]
680
+ batch_normalize=1
681
+ filters=256
682
+ size=1
683
+ stride=1
684
+ pad=1
685
+ activation=mish
686
+
687
+ [convolutional]
688
+ batch_normalize=1
689
+ filters=256
690
+ size=3
691
+ stride=1
692
+ pad=1
693
+ activation=mish
694
+
695
+ [shortcut]
696
+ from=-3
697
+ activation=linear
698
+
699
+ [convolutional]
700
+ batch_normalize=1
701
+ filters=256
702
+ size=1
703
+ stride=1
704
+ pad=1
705
+ activation=mish
706
+
707
+ [convolutional]
708
+ batch_normalize=1
709
+ filters=256
710
+ size=3
711
+ stride=1
712
+ pad=1
713
+ activation=mish
714
+
715
+ [shortcut]
716
+ from=-3
717
+ activation=linear
718
+
719
+ [convolutional]
720
+ batch_normalize=1
721
+ filters=256
722
+ size=1
723
+ stride=1
724
+ pad=1
725
+ activation=mish
726
+
727
+ [convolutional]
728
+ batch_normalize=1
729
+ filters=256
730
+ size=3
731
+ stride=1
732
+ pad=1
733
+ activation=mish
734
+
735
+ [shortcut]
736
+ from=-3
737
+ activation=linear
738
+
739
+ [convolutional]
740
+ batch_normalize=1
741
+ filters=256
742
+ size=1
743
+ stride=1
744
+ pad=1
745
+ activation=mish
746
+
747
+ [convolutional]
748
+ batch_normalize=1
749
+ filters=256
750
+ size=3
751
+ stride=1
752
+ pad=1
753
+ activation=mish
754
+
755
+ [shortcut]
756
+ from=-3
757
+ activation=linear
758
+
759
+ [convolutional]
760
+ batch_normalize=1
761
+ filters=256
762
+ size=1
763
+ stride=1
764
+ pad=1
765
+ activation=mish
766
+
767
+ [convolutional]
768
+ batch_normalize=1
769
+ filters=256
770
+ size=3
771
+ stride=1
772
+ pad=1
773
+ activation=mish
774
+
775
+ [shortcut]
776
+ from=-3
777
+ activation=linear
778
+
779
+ [convolutional]
780
+ batch_normalize=1
781
+ filters=256
782
+ size=1
783
+ stride=1
784
+ pad=1
785
+ activation=mish
786
+
787
+ [convolutional]
788
+ batch_normalize=1
789
+ filters=256
790
+ size=3
791
+ stride=1
792
+ pad=1
793
+ activation=mish
794
+
795
+ [shortcut]
796
+ from=-3
797
+ activation=linear
798
+
799
+ [convolutional]
800
+ batch_normalize=1
801
+ filters=256
802
+ size=1
803
+ stride=1
804
+ pad=1
805
+ activation=mish
806
+
807
+ [convolutional]
808
+ batch_normalize=1
809
+ filters=256
810
+ size=3
811
+ stride=1
812
+ pad=1
813
+ activation=mish
814
+
815
+ [shortcut]
816
+ from=-3
817
+ activation=linear
818
+
819
+ [convolutional]
820
+ batch_normalize=1
821
+ filters=256
822
+ size=1
823
+ stride=1
824
+ pad=1
825
+ activation=mish
826
+
827
+ [convolutional]
828
+ batch_normalize=1
829
+ filters=256
830
+ size=3
831
+ stride=1
832
+ pad=1
833
+ activation=mish
834
+
835
+ [shortcut]
836
+ from=-3
837
+ activation=linear
838
+
839
+ [convolutional]
840
+ batch_normalize=1
841
+ filters=256
842
+ size=1
843
+ stride=1
844
+ pad=1
845
+ activation=mish
846
+
847
+ [convolutional]
848
+ batch_normalize=1
849
+ filters=256
850
+ size=3
851
+ stride=1
852
+ pad=1
853
+ activation=mish
854
+
855
+ [shortcut]
856
+ from=-3
857
+ activation=linear
858
+
859
+ [convolutional]
860
+ batch_normalize=1
861
+ filters=256
862
+ size=1
863
+ stride=1
864
+ pad=1
865
+ activation=mish
866
+
867
+ [convolutional]
868
+ batch_normalize=1
869
+ filters=256
870
+ size=3
871
+ stride=1
872
+ pad=1
873
+ activation=mish
874
+
875
+ [shortcut]
876
+ from=-3
877
+ activation=linear
878
+
879
+ [convolutional]
880
+ batch_normalize=1
881
+ filters=256
882
+ size=1
883
+ stride=1
884
+ pad=1
885
+ activation=mish
886
+
887
+ [convolutional]
888
+ batch_normalize=1
889
+ filters=256
890
+ size=3
891
+ stride=1
892
+ pad=1
893
+ activation=mish
894
+
895
+ [shortcut]
896
+ from=-3
897
+ activation=linear
898
+
899
+ [convolutional]
900
+ batch_normalize=1
901
+ filters=256
902
+ size=1
903
+ stride=1
904
+ pad=1
905
+ activation=mish
906
+
907
+ [convolutional]
908
+ batch_normalize=1
909
+ filters=256
910
+ size=3
911
+ stride=1
912
+ pad=1
913
+ activation=mish
914
+
915
+ [shortcut]
916
+ from=-3
917
+ activation=linear
918
+
919
+ [convolutional]
920
+ batch_normalize=1
921
+ filters=256
922
+ size=1
923
+ stride=1
924
+ pad=1
925
+ activation=mish
926
+
927
+ [convolutional]
928
+ batch_normalize=1
929
+ filters=256
930
+ size=3
931
+ stride=1
932
+ pad=1
933
+ activation=mish
934
+
935
+ [shortcut]
936
+ from=-3
937
+ activation=linear
938
+
939
+ # Transition first
940
+
941
+ [convolutional]
942
+ batch_normalize=1
943
+ filters=256
944
+ size=1
945
+ stride=1
946
+ pad=1
947
+ activation=mish
948
+
949
+ # Merge [-1, -(3k+4)]
950
+
951
+ [route]
952
+ layers = -1,-49
953
+
954
+ # Transition last
955
+
956
+ # 130 (previous+7+3k)
957
+ [convolutional]
958
+ batch_normalize=1
959
+ filters=512
960
+ size=1
961
+ stride=1
962
+ pad=1
963
+ activation=mish
964
+
965
+
966
+ # P5
967
+
968
+ # Downsample
969
+
970
+ [convolutional]
971
+ batch_normalize=1
972
+ filters=1024
973
+ size=3
974
+ stride=2
975
+ pad=1
976
+ activation=mish
977
+
978
+ # Split
979
+
980
+ [convolutional]
981
+ batch_normalize=1
982
+ filters=512
983
+ size=1
984
+ stride=1
985
+ pad=1
986
+ activation=mish
987
+
988
+ [route]
989
+ layers = -2
990
+
991
+ [convolutional]
992
+ batch_normalize=1
993
+ filters=512
994
+ size=1
995
+ stride=1
996
+ pad=1
997
+ activation=mish
998
+
999
+ # Residual Block
1000
+
1001
+ [convolutional]
1002
+ batch_normalize=1
1003
+ filters=512
1004
+ size=1
1005
+ stride=1
1006
+ pad=1
1007
+ activation=mish
1008
+
1009
+ [convolutional]
1010
+ batch_normalize=1
1011
+ filters=512
1012
+ size=3
1013
+ stride=1
1014
+ pad=1
1015
+ activation=mish
1016
+
1017
+ [shortcut]
1018
+ from=-3
1019
+ activation=linear
1020
+
1021
+ [convolutional]
1022
+ batch_normalize=1
1023
+ filters=512
1024
+ size=1
1025
+ stride=1
1026
+ pad=1
1027
+ activation=mish
1028
+
1029
+ [convolutional]
1030
+ batch_normalize=1
1031
+ filters=512
1032
+ size=3
1033
+ stride=1
1034
+ pad=1
1035
+ activation=mish
1036
+
1037
+ [shortcut]
1038
+ from=-3
1039
+ activation=linear
1040
+
1041
+ [convolutional]
1042
+ batch_normalize=1
1043
+ filters=512
1044
+ size=1
1045
+ stride=1
1046
+ pad=1
1047
+ activation=mish
1048
+
1049
+ [convolutional]
1050
+ batch_normalize=1
1051
+ filters=512
1052
+ size=3
1053
+ stride=1
1054
+ pad=1
1055
+ activation=mish
1056
+
1057
+ [shortcut]
1058
+ from=-3
1059
+ activation=linear
1060
+
1061
+ [convolutional]
1062
+ batch_normalize=1
1063
+ filters=512
1064
+ size=1
1065
+ stride=1
1066
+ pad=1
1067
+ activation=mish
1068
+
1069
+ [convolutional]
1070
+ batch_normalize=1
1071
+ filters=512
1072
+ size=3
1073
+ stride=1
1074
+ pad=1
1075
+ activation=mish
1076
+
1077
+ [shortcut]
1078
+ from=-3
1079
+ activation=linear
1080
+
1081
+ [convolutional]
1082
+ batch_normalize=1
1083
+ filters=512
1084
+ size=1
1085
+ stride=1
1086
+ pad=1
1087
+ activation=mish
1088
+
1089
+ [convolutional]
1090
+ batch_normalize=1
1091
+ filters=512
1092
+ size=3
1093
+ stride=1
1094
+ pad=1
1095
+ activation=mish
1096
+
1097
+ [shortcut]
1098
+ from=-3
1099
+ activation=linear
1100
+
1101
+ [convolutional]
1102
+ batch_normalize=1
1103
+ filters=512
1104
+ size=1
1105
+ stride=1
1106
+ pad=1
1107
+ activation=mish
1108
+
1109
+ [convolutional]
1110
+ batch_normalize=1
1111
+ filters=512
1112
+ size=3
1113
+ stride=1
1114
+ pad=1
1115
+ activation=mish
1116
+
1117
+ [shortcut]
1118
+ from=-3
1119
+ activation=linear
1120
+
1121
+ [convolutional]
1122
+ batch_normalize=1
1123
+ filters=512
1124
+ size=1
1125
+ stride=1
1126
+ pad=1
1127
+ activation=mish
1128
+
1129
+ [convolutional]
1130
+ batch_normalize=1
1131
+ filters=512
1132
+ size=3
1133
+ stride=1
1134
+ pad=1
1135
+ activation=mish
1136
+
1137
+ [shortcut]
1138
+ from=-3
1139
+ activation=linear
1140
+
1141
+ # Transition first
1142
+
1143
+ [convolutional]
1144
+ batch_normalize=1
1145
+ filters=512
1146
+ size=1
1147
+ stride=1
1148
+ pad=1
1149
+ activation=mish
1150
+
1151
+ # Merge [-1, -(3k+4)]
1152
+
1153
+ [route]
1154
+ layers = -1,-25
1155
+
1156
+ # Transition last
1157
+
1158
+ # 158 (previous+7+3k)
1159
+ [convolutional]
1160
+ batch_normalize=1
1161
+ filters=1024
1162
+ size=1
1163
+ stride=1
1164
+ pad=1
1165
+ activation=mish
1166
+
1167
+
1168
+ # P6
1169
+
1170
+ # Downsample
1171
+
1172
+ [convolutional]
1173
+ batch_normalize=1
1174
+ filters=1024
1175
+ size=3
1176
+ stride=2
1177
+ pad=1
1178
+ activation=mish
1179
+
1180
+ # Split
1181
+
1182
+ [convolutional]
1183
+ batch_normalize=1
1184
+ filters=512
1185
+ size=1
1186
+ stride=1
1187
+ pad=1
1188
+ activation=mish
1189
+
1190
+ [route]
1191
+ layers = -2
1192
+
1193
+ [convolutional]
1194
+ batch_normalize=1
1195
+ filters=512
1196
+ size=1
1197
+ stride=1
1198
+ pad=1
1199
+ activation=mish
1200
+
1201
+ # Residual Block
1202
+
1203
+ [convolutional]
1204
+ batch_normalize=1
1205
+ filters=512
1206
+ size=1
1207
+ stride=1
1208
+ pad=1
1209
+ activation=mish
1210
+
1211
+ [convolutional]
1212
+ batch_normalize=1
1213
+ filters=512
1214
+ size=3
1215
+ stride=1
1216
+ pad=1
1217
+ activation=mish
1218
+
1219
+ [shortcut]
1220
+ from=-3
1221
+ activation=linear
1222
+
1223
+ [convolutional]
1224
+ batch_normalize=1
1225
+ filters=512
1226
+ size=1
1227
+ stride=1
1228
+ pad=1
1229
+ activation=mish
1230
+
1231
+ [convolutional]
1232
+ batch_normalize=1
1233
+ filters=512
1234
+ size=3
1235
+ stride=1
1236
+ pad=1
1237
+ activation=mish
1238
+
1239
+ [shortcut]
1240
+ from=-3
1241
+ activation=linear
1242
+
1243
+ [convolutional]
1244
+ batch_normalize=1
1245
+ filters=512
1246
+ size=1
1247
+ stride=1
1248
+ pad=1
1249
+ activation=mish
1250
+
1251
+ [convolutional]
1252
+ batch_normalize=1
1253
+ filters=512
1254
+ size=3
1255
+ stride=1
1256
+ pad=1
1257
+ activation=mish
1258
+
1259
+ [shortcut]
1260
+ from=-3
1261
+ activation=linear
1262
+
1263
+ [convolutional]
1264
+ batch_normalize=1
1265
+ filters=512
1266
+ size=1
1267
+ stride=1
1268
+ pad=1
1269
+ activation=mish
1270
+
1271
+ [convolutional]
1272
+ batch_normalize=1
1273
+ filters=512
1274
+ size=3
1275
+ stride=1
1276
+ pad=1
1277
+ activation=mish
1278
+
1279
+ [shortcut]
1280
+ from=-3
1281
+ activation=linear
1282
+
1283
+ [convolutional]
1284
+ batch_normalize=1
1285
+ filters=512
1286
+ size=1
1287
+ stride=1
1288
+ pad=1
1289
+ activation=mish
1290
+
1291
+ [convolutional]
1292
+ batch_normalize=1
1293
+ filters=512
1294
+ size=3
1295
+ stride=1
1296
+ pad=1
1297
+ activation=mish
1298
+
1299
+ [shortcut]
1300
+ from=-3
1301
+ activation=linear
1302
+
1303
+ [convolutional]
1304
+ batch_normalize=1
1305
+ filters=512
1306
+ size=1
1307
+ stride=1
1308
+ pad=1
1309
+ activation=mish
1310
+
1311
+ [convolutional]
1312
+ batch_normalize=1
1313
+ filters=512
1314
+ size=3
1315
+ stride=1
1316
+ pad=1
1317
+ activation=mish
1318
+
1319
+ [shortcut]
1320
+ from=-3
1321
+ activation=linear
1322
+
1323
+ [convolutional]
1324
+ batch_normalize=1
1325
+ filters=512
1326
+ size=1
1327
+ stride=1
1328
+ pad=1
1329
+ activation=mish
1330
+
1331
+ [convolutional]
1332
+ batch_normalize=1
1333
+ filters=512
1334
+ size=3
1335
+ stride=1
1336
+ pad=1
1337
+ activation=mish
1338
+
1339
+ [shortcut]
1340
+ from=-3
1341
+ activation=linear
1342
+
1343
+ # Transition first
1344
+
1345
+ [convolutional]
1346
+ batch_normalize=1
1347
+ filters=512
1348
+ size=1
1349
+ stride=1
1350
+ pad=1
1351
+ activation=mish
1352
+
1353
+ # Merge [-1, -(3k+4)]
1354
+
1355
+ [route]
1356
+ layers = -1,-25
1357
+
1358
+ # Transition last
1359
+
1360
+ # 186 (previous+7+3k)
1361
+ [convolutional]
1362
+ batch_normalize=1
1363
+ filters=1024
1364
+ size=1
1365
+ stride=1
1366
+ pad=1
1367
+ activation=mish
1368
+
1369
+ # ============ End of Backbone ============ #
1370
+
1371
+ # ============ Neck ============ #
1372
+
1373
+ # CSPSPP
1374
+
1375
+ [convolutional]
1376
+ batch_normalize=1
1377
+ filters=512
1378
+ size=1
1379
+ stride=1
1380
+ pad=1
1381
+ activation=mish
1382
+
1383
+ [route]
1384
+ layers = -2
1385
+
1386
+ [convolutional]
1387
+ batch_normalize=1
1388
+ filters=512
1389
+ size=1
1390
+ stride=1
1391
+ pad=1
1392
+ activation=mish
1393
+
1394
+ [convolutional]
1395
+ batch_normalize=1
1396
+ size=3
1397
+ stride=1
1398
+ pad=1
1399
+ filters=512
1400
+ activation=mish
1401
+
1402
+ [convolutional]
1403
+ batch_normalize=1
1404
+ filters=512
1405
+ size=1
1406
+ stride=1
1407
+ pad=1
1408
+ activation=mish
1409
+
1410
+ ### SPP ###
1411
+ [maxpool]
1412
+ stride=1
1413
+ size=5
1414
+
1415
+ [route]
1416
+ layers=-2
1417
+
1418
+ [maxpool]
1419
+ stride=1
1420
+ size=9
1421
+
1422
+ [route]
1423
+ layers=-4
1424
+
1425
+ [maxpool]
1426
+ stride=1
1427
+ size=13
1428
+
1429
+ [route]
1430
+ layers=-1,-3,-5,-6
1431
+ ### End SPP ###
1432
+
1433
+ [convolutional]
1434
+ batch_normalize=1
1435
+ filters=512
1436
+ size=1
1437
+ stride=1
1438
+ pad=1
1439
+ activation=mish
1440
+
1441
+ [convolutional]
1442
+ batch_normalize=1
1443
+ size=3
1444
+ stride=1
1445
+ pad=1
1446
+ filters=512
1447
+ activation=mish
1448
+
1449
+ [route]
1450
+ layers = -1, -13
1451
+
1452
+ # 201 (previous+6+5+2k)
1453
+ [convolutional]
1454
+ batch_normalize=1
1455
+ filters=512
1456
+ size=1
1457
+ stride=1
1458
+ pad=1
1459
+ activation=mish
1460
+
1461
+ # End of CSPSPP
1462
+
1463
+
1464
+ # FPN-5
1465
+
1466
+ [convolutional]
1467
+ batch_normalize=1
1468
+ filters=512
1469
+ size=1
1470
+ stride=1
1471
+ pad=1
1472
+ activation=mish
1473
+
1474
+ [upsample]
1475
+ stride=2
1476
+
1477
+ [route]
1478
+ layers = 158
1479
+
1480
+ [convolutional]
1481
+ batch_normalize=1
1482
+ filters=512
1483
+ size=1
1484
+ stride=1
1485
+ pad=1
1486
+ activation=mish
1487
+
1488
+ [route]
1489
+ layers = -1, -3
1490
+
1491
+ [convolutional]
1492
+ batch_normalize=1
1493
+ filters=512
1494
+ size=1
1495
+ stride=1
1496
+ pad=1
1497
+ activation=mish
1498
+
1499
+ # Split
1500
+
1501
+ [convolutional]
1502
+ batch_normalize=1
1503
+ filters=512
1504
+ size=1
1505
+ stride=1
1506
+ pad=1
1507
+ activation=mish
1508
+
1509
+ [route]
1510
+ layers = -2
1511
+
1512
+ # Plain Block
1513
+
1514
+ [convolutional]
1515
+ batch_normalize=1
1516
+ filters=512
1517
+ size=1
1518
+ stride=1
1519
+ pad=1
1520
+ activation=mish
1521
+
1522
+ [convolutional]
1523
+ batch_normalize=1
1524
+ size=3
1525
+ stride=1
1526
+ pad=1
1527
+ filters=512
1528
+ activation=mish
1529
+
1530
+ [convolutional]
1531
+ batch_normalize=1
1532
+ filters=512
1533
+ size=1
1534
+ stride=1
1535
+ pad=1
1536
+ activation=mish
1537
+
1538
+ [convolutional]
1539
+ batch_normalize=1
1540
+ size=3
1541
+ stride=1
1542
+ pad=1
1543
+ filters=512
1544
+ activation=mish
1545
+
1546
+ [convolutional]
1547
+ batch_normalize=1
1548
+ filters=512
1549
+ size=1
1550
+ stride=1
1551
+ pad=1
1552
+ activation=mish
1553
+
1554
+ [convolutional]
1555
+ batch_normalize=1
1556
+ size=3
1557
+ stride=1
1558
+ pad=1
1559
+ filters=512
1560
+ activation=mish
1561
+
1562
+ # Merge [-1, -(2k+2)]
1563
+
1564
+ [route]
1565
+ layers = -1, -8
1566
+
1567
+ # Transition last
1568
+
1569
+ # 217 (previous+6+4+2k)
1570
+ [convolutional]
1571
+ batch_normalize=1
1572
+ filters=512
1573
+ size=1
1574
+ stride=1
1575
+ pad=1
1576
+ activation=mish
1577
+
1578
+
1579
+ # FPN-4
1580
+
1581
+ [convolutional]
1582
+ batch_normalize=1
1583
+ filters=256
1584
+ size=1
1585
+ stride=1
1586
+ pad=1
1587
+ activation=mish
1588
+
1589
+ [upsample]
1590
+ stride=2
1591
+
1592
+ [route]
1593
+ layers = 130
1594
+
1595
+ [convolutional]
1596
+ batch_normalize=1
1597
+ filters=256
1598
+ size=1
1599
+ stride=1
1600
+ pad=1
1601
+ activation=mish
1602
+
1603
+ [route]
1604
+ layers = -1, -3
1605
+
1606
+ [convolutional]
1607
+ batch_normalize=1
1608
+ filters=256
1609
+ size=1
1610
+ stride=1
1611
+ pad=1
1612
+ activation=mish
1613
+
1614
+ # Split
1615
+
1616
+ [convolutional]
1617
+ batch_normalize=1
1618
+ filters=256
1619
+ size=1
1620
+ stride=1
1621
+ pad=1
1622
+ activation=mish
1623
+
1624
+ [route]
1625
+ layers = -2
1626
+
1627
+ # Plain Block
1628
+
1629
+ [convolutional]
1630
+ batch_normalize=1
1631
+ filters=256
1632
+ size=1
1633
+ stride=1
1634
+ pad=1
1635
+ activation=mish
1636
+
1637
+ [convolutional]
1638
+ batch_normalize=1
1639
+ size=3
1640
+ stride=1
1641
+ pad=1
1642
+ filters=256
1643
+ activation=mish
1644
+
1645
+ [convolutional]
1646
+ batch_normalize=1
1647
+ filters=256
1648
+ size=1
1649
+ stride=1
1650
+ pad=1
1651
+ activation=mish
1652
+
1653
+ [convolutional]
1654
+ batch_normalize=1
1655
+ size=3
1656
+ stride=1
1657
+ pad=1
1658
+ filters=256
1659
+ activation=mish
1660
+
1661
+ [convolutional]
1662
+ batch_normalize=1
1663
+ filters=256
1664
+ size=1
1665
+ stride=1
1666
+ pad=1
1667
+ activation=mish
1668
+
1669
+ [convolutional]
1670
+ batch_normalize=1
1671
+ size=3
1672
+ stride=1
1673
+ pad=1
1674
+ filters=256
1675
+ activation=mish
1676
+
1677
+ # Merge [-1, -(2k+2)]
1678
+
1679
+ [route]
1680
+ layers = -1, -8
1681
+
1682
+ # Transition last
1683
+
1684
+ # 233 (previous+6+4+2k)
1685
+ [convolutional]
1686
+ batch_normalize=1
1687
+ filters=256
1688
+ size=1
1689
+ stride=1
1690
+ pad=1
1691
+ activation=mish
1692
+
1693
+
1694
+ # FPN-3
1695
+
1696
+ [convolutional]
1697
+ batch_normalize=1
1698
+ filters=128
1699
+ size=1
1700
+ stride=1
1701
+ pad=1
1702
+ activation=mish
1703
+
1704
+ [upsample]
1705
+ stride=2
1706
+
1707
+ [route]
1708
+ layers = 78
1709
+
1710
+ [convolutional]
1711
+ batch_normalize=1
1712
+ filters=128
1713
+ size=1
1714
+ stride=1
1715
+ pad=1
1716
+ activation=mish
1717
+
1718
+ [route]
1719
+ layers = -1, -3
1720
+
1721
+ [convolutional]
1722
+ batch_normalize=1
1723
+ filters=128
1724
+ size=1
1725
+ stride=1
1726
+ pad=1
1727
+ activation=mish
1728
+
1729
+ # Split
1730
+
1731
+ [convolutional]
1732
+ batch_normalize=1
1733
+ filters=128
1734
+ size=1
1735
+ stride=1
1736
+ pad=1
1737
+ activation=mish
1738
+
1739
+ [route]
1740
+ layers = -2
1741
+
1742
+ # Plain Block
1743
+
1744
+ [convolutional]
1745
+ batch_normalize=1
1746
+ filters=128
1747
+ size=1
1748
+ stride=1
1749
+ pad=1
1750
+ activation=mish
1751
+
1752
+ [convolutional]
1753
+ batch_normalize=1
1754
+ size=3
1755
+ stride=1
1756
+ pad=1
1757
+ filters=128
1758
+ activation=mish
1759
+
1760
+ [convolutional]
1761
+ batch_normalize=1
1762
+ filters=128
1763
+ size=1
1764
+ stride=1
1765
+ pad=1
1766
+ activation=mish
1767
+
1768
+ [convolutional]
1769
+ batch_normalize=1
1770
+ size=3
1771
+ stride=1
1772
+ pad=1
1773
+ filters=128
1774
+ activation=mish
1775
+
1776
+ [convolutional]
1777
+ batch_normalize=1
1778
+ filters=128
1779
+ size=1
1780
+ stride=1
1781
+ pad=1
1782
+ activation=mish
1783
+
1784
+ [convolutional]
1785
+ batch_normalize=1
1786
+ size=3
1787
+ stride=1
1788
+ pad=1
1789
+ filters=128
1790
+ activation=mish
1791
+
1792
+ # Merge [-1, -(2k+2)]
1793
+
1794
+ [route]
1795
+ layers = -1, -8
1796
+
1797
+ # Transition last
1798
+
1799
+ # 249 (previous+6+4+2k)
1800
+ [convolutional]
1801
+ batch_normalize=1
1802
+ filters=128
1803
+ size=1
1804
+ stride=1
1805
+ pad=1
1806
+ activation=mish
1807
+
1808
+
1809
+ # PAN-4
1810
+
1811
+ [convolutional]
1812
+ batch_normalize=1
1813
+ size=3
1814
+ stride=2
1815
+ pad=1
1816
+ filters=256
1817
+ activation=mish
1818
+
1819
+ [route]
1820
+ layers = -1, 233
1821
+
1822
+ [convolutional]
1823
+ batch_normalize=1
1824
+ filters=256
1825
+ size=1
1826
+ stride=1
1827
+ pad=1
1828
+ activation=mish
1829
+
1830
+ # Split
1831
+
1832
+ [convolutional]
1833
+ batch_normalize=1
1834
+ filters=256
1835
+ size=1
1836
+ stride=1
1837
+ pad=1
1838
+ activation=mish
1839
+
1840
+ [route]
1841
+ layers = -2
1842
+
1843
+ # Plain Block
1844
+
1845
+ [convolutional]
1846
+ batch_normalize=1
1847
+ filters=256
1848
+ size=1
1849
+ stride=1
1850
+ pad=1
1851
+ activation=mish
1852
+
1853
+ [convolutional]
1854
+ batch_normalize=1
1855
+ size=3
1856
+ stride=1
1857
+ pad=1
1858
+ filters=256
1859
+ activation=mish
1860
+
1861
+ [convolutional]
1862
+ batch_normalize=1
1863
+ filters=256
1864
+ size=1
1865
+ stride=1
1866
+ pad=1
1867
+ activation=mish
1868
+
1869
+ [convolutional]
1870
+ batch_normalize=1
1871
+ size=3
1872
+ stride=1
1873
+ pad=1
1874
+ filters=256
1875
+ activation=mish
1876
+
1877
+ [convolutional]
1878
+ batch_normalize=1
1879
+ filters=256
1880
+ size=1
1881
+ stride=1
1882
+ pad=1
1883
+ activation=mish
1884
+
1885
+ [convolutional]
1886
+ batch_normalize=1
1887
+ size=3
1888
+ stride=1
1889
+ pad=1
1890
+ filters=256
1891
+ activation=mish
1892
+
1893
+ [route]
1894
+ layers = -1,-8
1895
+
1896
+ # Transition last
1897
+
1898
+ # 262 (previous+3+4+2k)
1899
+ [convolutional]
1900
+ batch_normalize=1
1901
+ filters=256
1902
+ size=1
1903
+ stride=1
1904
+ pad=1
1905
+ activation=mish
1906
+
1907
+
1908
+ # PAN-5
1909
+
1910
+ [convolutional]
1911
+ batch_normalize=1
1912
+ size=3
1913
+ stride=2
1914
+ pad=1
1915
+ filters=512
1916
+ activation=mish
1917
+
1918
+ [route]
1919
+ layers = -1, 217
1920
+
1921
+ [convolutional]
1922
+ batch_normalize=1
1923
+ filters=512
1924
+ size=1
1925
+ stride=1
1926
+ pad=1
1927
+ activation=mish
1928
+
1929
+ # Split
1930
+
1931
+ [convolutional]
1932
+ batch_normalize=1
1933
+ filters=512
1934
+ size=1
1935
+ stride=1
1936
+ pad=1
1937
+ activation=mish
1938
+
1939
+ [route]
1940
+ layers = -2
1941
+
1942
+ # Plain Block
1943
+
1944
+ [convolutional]
1945
+ batch_normalize=1
1946
+ filters=512
1947
+ size=1
1948
+ stride=1
1949
+ pad=1
1950
+ activation=mish
1951
+
1952
+ [convolutional]
1953
+ batch_normalize=1
1954
+ size=3
1955
+ stride=1
1956
+ pad=1
1957
+ filters=512
1958
+ activation=mish
1959
+
1960
+ [convolutional]
1961
+ batch_normalize=1
1962
+ filters=512
1963
+ size=1
1964
+ stride=1
1965
+ pad=1
1966
+ activation=mish
1967
+
1968
+ [convolutional]
1969
+ batch_normalize=1
1970
+ size=3
1971
+ stride=1
1972
+ pad=1
1973
+ filters=512
1974
+ activation=mish
1975
+
1976
+ [convolutional]
1977
+ batch_normalize=1
1978
+ filters=512
1979
+ size=1
1980
+ stride=1
1981
+ pad=1
1982
+ activation=mish
1983
+
1984
+ [convolutional]
1985
+ batch_normalize=1
1986
+ size=3
1987
+ stride=1
1988
+ pad=1
1989
+ filters=512
1990
+ activation=mish
1991
+
1992
+ [route]
1993
+ layers = -1,-8
1994
+
1995
+ # Transition last
1996
+
1997
+ # 275 (previous+3+4+2k)
1998
+ [convolutional]
1999
+ batch_normalize=1
2000
+ filters=512
2001
+ size=1
2002
+ stride=1
2003
+ pad=1
2004
+ activation=mish
2005
+
2006
+
2007
+ # PAN-6
2008
+
2009
+ [convolutional]
2010
+ batch_normalize=1
2011
+ size=3
2012
+ stride=2
2013
+ pad=1
2014
+ filters=512
2015
+ activation=mish
2016
+
2017
+ [route]
2018
+ layers = -1, 201
2019
+
2020
+ [convolutional]
2021
+ batch_normalize=1
2022
+ filters=512
2023
+ size=1
2024
+ stride=1
2025
+ pad=1
2026
+ activation=mish
2027
+
2028
+ # Split
2029
+
2030
+ [convolutional]
2031
+ batch_normalize=1
2032
+ filters=512
2033
+ size=1
2034
+ stride=1
2035
+ pad=1
2036
+ activation=mish
2037
+
2038
+ [route]
2039
+ layers = -2
2040
+
2041
+ # Plain Block
2042
+
2043
+ [convolutional]
2044
+ batch_normalize=1
2045
+ filters=512
2046
+ size=1
2047
+ stride=1
2048
+ pad=1
2049
+ activation=mish
2050
+
2051
+ [convolutional]
2052
+ batch_normalize=1
2053
+ size=3
2054
+ stride=1
2055
+ pad=1
2056
+ filters=512
2057
+ activation=mish
2058
+
2059
+ [convolutional]
2060
+ batch_normalize=1
2061
+ filters=512
2062
+ size=1
2063
+ stride=1
2064
+ pad=1
2065
+ activation=mish
2066
+
2067
+ [convolutional]
2068
+ batch_normalize=1
2069
+ size=3
2070
+ stride=1
2071
+ pad=1
2072
+ filters=512
2073
+ activation=mish
2074
+
2075
+ [convolutional]
2076
+ batch_normalize=1
2077
+ filters=512
2078
+ size=1
2079
+ stride=1
2080
+ pad=1
2081
+ activation=mish
2082
+
2083
+ [convolutional]
2084
+ batch_normalize=1
2085
+ size=3
2086
+ stride=1
2087
+ pad=1
2088
+ filters=512
2089
+ activation=mish
2090
+
2091
+ [route]
2092
+ layers = -1,-8
2093
+
2094
+ # Transition last
2095
+
2096
+ # 288 (previous+3+4+2k)
2097
+ [convolutional]
2098
+ batch_normalize=1
2099
+ filters=512
2100
+ size=1
2101
+ stride=1
2102
+ pad=1
2103
+ activation=mish
2104
+
2105
+ # ============ End of Neck ============ #
2106
+
2107
+ # ============ Head ============ #
2108
+
2109
+ # YOLO-3
2110
+
2111
+ [route]
2112
+ layers = 249
2113
+
2114
+ [convolutional]
2115
+ batch_normalize=1
2116
+ size=3
2117
+ stride=1
2118
+ pad=1
2119
+ filters=256
2120
+ activation=mish
2121
+
2122
+ [convolutional]
2123
+ size=1
2124
+ stride=1
2125
+ pad=1
2126
+ filters=340
2127
+ activation=linear
2128
+
2129
+ [yolo]
2130
+ mask = 0,1,2,3
2131
+ anchors = 13,17, 31,25, 24,51, 61,45, 61,45, 48,102, 119,96, 97,189, 97,189, 217,184, 171,384, 324,451, 324,451, 545,357, 616,618, 1024,1024
2132
+ classes=80
2133
+ num=16
2134
+ jitter=.3
2135
+ ignore_thresh = .7
2136
+ truth_thresh = 1
2137
+ random=1
2138
+ scale_x_y = 1.05
2139
+ iou_thresh=0.213
2140
+ cls_normalizer=1.0
2141
+ iou_normalizer=0.07
2142
+ iou_loss=ciou
2143
+ nms_kind=greedynms
2144
+ beta_nms=0.6
2145
+
2146
+
2147
+ # YOLO-4
2148
+
2149
+ [route]
2150
+ layers = 262
2151
+
2152
+ [convolutional]
2153
+ batch_normalize=1
2154
+ size=3
2155
+ stride=1
2156
+ pad=1
2157
+ filters=512
2158
+ activation=mish
2159
+
2160
+ [convolutional]
2161
+ size=1
2162
+ stride=1
2163
+ pad=1
2164
+ filters=340
2165
+ activation=linear
2166
+
2167
+ [yolo]
2168
+ mask = 4,5,6,7
2169
+ anchors = 13,17, 31,25, 24,51, 61,45, 61,45, 48,102, 119,96, 97,189, 97,189, 217,184, 171,384, 324,451, 324,451, 545,357, 616,618, 1024,1024
2170
+ classes=80
2171
+ num=16
2172
+ jitter=.3
2173
+ ignore_thresh = .7
2174
+ truth_thresh = 1
2175
+ random=1
2176
+ scale_x_y = 1.05
2177
+ iou_thresh=0.213
2178
+ cls_normalizer=1.0
2179
+ iou_normalizer=0.07
2180
+ iou_loss=ciou
2181
+ nms_kind=greedynms
2182
+ beta_nms=0.6
2183
+
2184
+
2185
+ # YOLO-5
2186
+
2187
+ [route]
2188
+ layers = 275
2189
+
2190
+ [convolutional]
2191
+ batch_normalize=1
2192
+ size=3
2193
+ stride=1
2194
+ pad=1
2195
+ filters=1024
2196
+ activation=mish
2197
+
2198
+ [convolutional]
2199
+ size=1
2200
+ stride=1
2201
+ pad=1
2202
+ filters=340
2203
+ activation=linear
2204
+
2205
+ [yolo]
2206
+ mask = 8,9,10,11
2207
+ anchors = 13,17, 31,25, 24,51, 61,45, 61,45, 48,102, 119,96, 97,189, 97,189, 217,184, 171,384, 324,451, 324,451, 545,357, 616,618, 1024,1024
2208
+ classes=80
2209
+ num=16
2210
+ jitter=.3
2211
+ ignore_thresh = .7
2212
+ truth_thresh = 1
2213
+ random=1
2214
+ scale_x_y = 1.05
2215
+ iou_thresh=0.213
2216
+ cls_normalizer=1.0
2217
+ iou_normalizer=0.07
2218
+ iou_loss=ciou
2219
+ nms_kind=greedynms
2220
+ beta_nms=0.6
2221
+
2222
+
2223
+ # YOLO-6
2224
+
2225
+ [route]
2226
+ layers = 288
2227
+
2228
+ [convolutional]
2229
+ batch_normalize=1
2230
+ size=3
2231
+ stride=1
2232
+ pad=1
2233
+ filters=1024
2234
+ activation=mish
2235
+
2236
+ [convolutional]
2237
+ size=1
2238
+ stride=1
2239
+ pad=1
2240
+ filters=340
2241
+ activation=linear
2242
+
2243
+ [yolo]
2244
+ mask = 12,13,14,15
2245
+ anchors = 13,17, 31,25, 24,51, 61,45, 61,45, 48,102, 119,96, 97,189, 97,189, 217,184, 171,384, 324,451, 324,451, 545,357, 616,618, 1024,1024
2246
+ classes=80
2247
+ num=16
2248
+ jitter=.3
2249
+ ignore_thresh = .7
2250
+ truth_thresh = 1
2251
+ random=1
2252
+ scale_x_y = 1.05
2253
+ iou_thresh=0.213
2254
+ cls_normalizer=1.0
2255
+ iou_normalizer=0.07
2256
+ iou_loss=ciou
2257
+ nms_kind=greedynms
2258
+ beta_nms=0.6
2259
+
2260
+ # ============ End of Head ============ #
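
The four heads above share one 16-pair anchor list, with each `mask` selecting four of them. A quick grounding sketch: the pyramid-level names P3–P6 are inferred from the FPN/PAN comments (an assumption, not stated in the cfg), while the anchor pairs, including the intentionally repeated ones, are verbatim from the file:

```python
# How each [yolo] head in yolov4_p6.cfg selects its anchors. The 1x1
# conv feeding a head must emit (classes + 5) * len(mask) channels:
# (80 + 5) * 4 = 340, matching filters=340 in the file above.
ANCHORS = [(13,17), (31,25), (24,51), (61,45), (61,45), (48,102),
           (119,96), (97,189), (97,189), (217,184), (171,384),
           (324,451), (324,451), (545,357), (616,618), (1024,1024)]
MASKS = {'P3': (0,1,2,3), 'P4': (4,5,6,7),
         'P5': (8,9,10,11), 'P6': (12,13,14,15)}

for level, mask in MASKS.items():
    picked = [ANCHORS[i] for i in mask]
    print(level, picked, 'conv filters =', (80 + 5) * len(mask))
```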
cfg/yolov4_p7.cfg ADDED
@@ -0,0 +1,2714 @@
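
The yolov4_p7.cfg body below repeats the same CSP "Split" / "Merge" pattern built from `[route]` layers. A short sketch of the route semantics those comments rely on, with `outputs` as a hypothetical cache of prior feature maps:

```python
import torch

# Sketch of darknet [route] semantics: negative layer indices are
# relative to the current position, positive ones are absolute layer
# ids, and multi-layer routes concatenate along the channel dimension.
def route(outputs, layers):
    picked = [outputs[i] for i in layers]          # e.g. layers = (-1, -8)
    return picked[0] if len(picked) == 1 else torch.cat(picked, 1)

# "[route] layers = -2" duplicates the tensor from two layers back
# (the CSP split); "[route] layers = -1,-8" rejoins the two paths
# (the merge noted as "# Merge [-1, -(2k+2)]" in these files).
```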
1
+ [net]
2
+ batch=64
3
+ subdivisions=8
4
+ width=1536
5
+ height=1536
6
+ channels=3
7
+ momentum=0.949
8
+ decay=0.0005
9
+ angle=0
10
+ saturation = 1.5
11
+ exposure = 1.5
12
+ hue=.1
13
+
14
+ learning_rate=0.00261
15
+ burn_in=1000
16
+ max_batches = 500500
17
+ policy=steps
18
+ steps=400000,450000
19
+ scales=.1,.1
20
+
21
+ mosaic=1
22
+
23
+
24
+ # ============ Backbone ============ #
25
+
26
+ # Stem
27
+
28
+ # 0
29
+ [convolutional]
30
+ batch_normalize=1
31
+ filters=40
32
+ size=3
33
+ stride=1
34
+ pad=1
35
+ activation=mish
36
+
37
+
38
+ # P1
39
+
40
+ # Downsample
41
+
42
+ [convolutional]
43
+ batch_normalize=1
44
+ filters=80
45
+ size=3
46
+ stride=2
47
+ pad=1
48
+ activation=mish
49
+
50
+ # Split
51
+
52
+ [convolutional]
53
+ batch_normalize=1
54
+ filters=40
55
+ size=1
56
+ stride=1
57
+ pad=1
58
+ activation=mish
59
+
60
+ [route]
61
+ layers = -2
62
+
63
+ [convolutional]
64
+ batch_normalize=1
65
+ filters=40
66
+ size=1
67
+ stride=1
68
+ pad=1
69
+ activation=mish
70
+
71
+ # Residual Block
72
+
73
+ [convolutional]
74
+ batch_normalize=1
75
+ filters=40
76
+ size=1
77
+ stride=1
78
+ pad=1
79
+ activation=mish
80
+
81
+ [convolutional]
82
+ batch_normalize=1
83
+ filters=40
84
+ size=3
85
+ stride=1
86
+ pad=1
87
+ activation=mish
88
+
89
+ [shortcut]
90
+ from=-3
91
+ activation=linear
92
+
93
+ # Transition first
94
+
95
+ [convolutional]
96
+ batch_normalize=1
97
+ filters=40
98
+ size=1
99
+ stride=1
100
+ pad=1
101
+ activation=mish
102
+
103
+ # Merge [-1, -(3k+4)]
104
+
105
+ [route]
106
+ layers = -1,-7
107
+
108
+ # Transition last
109
+
110
+ # 10 (previous+7+3k)
111
+ [convolutional]
112
+ batch_normalize=1
113
+ filters=80
114
+ size=1
115
+ stride=1
116
+ pad=1
117
+ activation=mish
118
+
119
+
120
+ # P2
121
+
122
+ # Downsample
123
+
124
+ [convolutional]
125
+ batch_normalize=1
126
+ filters=160
127
+ size=3
128
+ stride=2
129
+ pad=1
130
+ activation=mish
131
+
132
+ # Split
133
+
134
+ [convolutional]
135
+ batch_normalize=1
136
+ filters=80
137
+ size=1
138
+ stride=1
139
+ pad=1
140
+ activation=mish
141
+
142
+ [route]
143
+ layers = -2
144
+
145
+ [convolutional]
146
+ batch_normalize=1
147
+ filters=80
148
+ size=1
149
+ stride=1
150
+ pad=1
151
+ activation=mish
152
+
153
+ # Residual Block
154
+
155
+ [convolutional]
156
+ batch_normalize=1
157
+ filters=80
158
+ size=1
159
+ stride=1
160
+ pad=1
161
+ activation=mish
162
+
163
+ [convolutional]
164
+ batch_normalize=1
165
+ filters=80
166
+ size=3
167
+ stride=1
168
+ pad=1
169
+ activation=mish
170
+
171
+ [shortcut]
172
+ from=-3
173
+ activation=linear
174
+
175
+ [convolutional]
176
+ batch_normalize=1
177
+ filters=80
178
+ size=1
179
+ stride=1
180
+ pad=1
181
+ activation=mish
182
+
183
+ [convolutional]
184
+ batch_normalize=1
185
+ filters=80
186
+ size=3
187
+ stride=1
188
+ pad=1
189
+ activation=mish
190
+
191
+ [shortcut]
192
+ from=-3
193
+ activation=linear
194
+
195
+ [convolutional]
196
+ batch_normalize=1
197
+ filters=80
198
+ size=1
199
+ stride=1
200
+ pad=1
201
+ activation=mish
202
+
203
+ [convolutional]
204
+ batch_normalize=1
205
+ filters=80
206
+ size=3
207
+ stride=1
208
+ pad=1
209
+ activation=mish
210
+
211
+ [shortcut]
212
+ from=-3
213
+ activation=linear
214
+
215
+ # Transition first
216
+
217
+ [convolutional]
218
+ batch_normalize=1
219
+ filters=80
220
+ size=1
221
+ stride=1
222
+ pad=1
223
+ activation=mish
224
+
225
+ # Merge [-1, -(3k+4)]
226
+
227
+ [route]
228
+ layers = -1,-13
229
+
230
+ # Transition last
231
+
232
+ # 26 (previous+7+3k)
233
+ [convolutional]
234
+ batch_normalize=1
235
+ filters=160
236
+ size=1
237
+ stride=1
238
+ pad=1
239
+ activation=mish
240
+
241
+
242
+ # P3
243
+
244
+ # Downsample
245
+
246
+ [convolutional]
247
+ batch_normalize=1
248
+ filters=320
249
+ size=3
250
+ stride=2
251
+ pad=1
252
+ activation=mish
253
+
254
+ # Split
255
+
256
+ [convolutional]
257
+ batch_normalize=1
258
+ filters=160
259
+ size=1
260
+ stride=1
261
+ pad=1
262
+ activation=mish
263
+
264
+ [route]
265
+ layers = -2
266
+
267
+ [convolutional]
268
+ batch_normalize=1
269
+ filters=160
270
+ size=1
271
+ stride=1
272
+ pad=1
273
+ activation=mish
274
+
275
+ # Residual Block
276
+
277
+ [convolutional]
278
+ batch_normalize=1
279
+ filters=160
280
+ size=1
281
+ stride=1
282
+ pad=1
283
+ activation=mish
284
+
285
+ [convolutional]
286
+ batch_normalize=1
287
+ filters=160
288
+ size=3
289
+ stride=1
290
+ pad=1
291
+ activation=mish
292
+
293
+ [shortcut]
294
+ from=-3
295
+ activation=linear
296
+
297
+ [convolutional]
298
+ batch_normalize=1
299
+ filters=160
300
+ size=1
301
+ stride=1
302
+ pad=1
303
+ activation=mish
304
+
305
+ [convolutional]
306
+ batch_normalize=1
307
+ filters=160
308
+ size=3
309
+ stride=1
310
+ pad=1
311
+ activation=mish
312
+
313
+ [shortcut]
314
+ from=-3
315
+ activation=linear
316
+
317
+ [convolutional]
318
+ batch_normalize=1
319
+ filters=160
320
+ size=1
321
+ stride=1
322
+ pad=1
323
+ activation=mish
324
+
325
+ [convolutional]
326
+ batch_normalize=1
327
+ filters=160
328
+ size=3
329
+ stride=1
330
+ pad=1
331
+ activation=mish
332
+
333
+ [shortcut]
334
+ from=-3
335
+ activation=linear
336
+
337
+ [convolutional]
338
+ batch_normalize=1
339
+ filters=160
340
+ size=1
341
+ stride=1
342
+ pad=1
343
+ activation=mish
344
+
345
+ [convolutional]
346
+ batch_normalize=1
347
+ filters=160
348
+ size=3
349
+ stride=1
350
+ pad=1
351
+ activation=mish
352
+
353
+ [shortcut]
354
+ from=-3
355
+ activation=linear
356
+
357
+ [convolutional]
358
+ batch_normalize=1
359
+ filters=160
360
+ size=1
361
+ stride=1
362
+ pad=1
363
+ activation=mish
364
+
365
+ [convolutional]
366
+ batch_normalize=1
367
+ filters=160
368
+ size=3
369
+ stride=1
370
+ pad=1
371
+ activation=mish
372
+
373
+ [shortcut]
374
+ from=-3
375
+ activation=linear
376
+
377
+ [convolutional]
378
+ batch_normalize=1
379
+ filters=160
380
+ size=1
381
+ stride=1
382
+ pad=1
383
+ activation=mish
384
+
385
+ [convolutional]
386
+ batch_normalize=1
387
+ filters=160
388
+ size=3
389
+ stride=1
390
+ pad=1
391
+ activation=mish
392
+
393
+ [shortcut]
394
+ from=-3
395
+ activation=linear
396
+
397
+ [convolutional]
398
+ batch_normalize=1
399
+ filters=160
400
+ size=1
401
+ stride=1
402
+ pad=1
403
+ activation=mish
404
+
405
+ [convolutional]
406
+ batch_normalize=1
407
+ filters=160
408
+ size=3
409
+ stride=1
410
+ pad=1
411
+ activation=mish
412
+
413
+ [shortcut]
414
+ from=-3
415
+ activation=linear
416
+
417
+ [convolutional]
418
+ batch_normalize=1
419
+ filters=160
420
+ size=1
421
+ stride=1
422
+ pad=1
423
+ activation=mish
424
+
425
+ [convolutional]
426
+ batch_normalize=1
427
+ filters=160
428
+ size=3
429
+ stride=1
430
+ pad=1
431
+ activation=mish
432
+
433
+ [shortcut]
434
+ from=-3
435
+ activation=linear
436
+
437
+ [convolutional]
438
+ batch_normalize=1
439
+ filters=160
440
+ size=1
441
+ stride=1
442
+ pad=1
443
+ activation=mish
444
+
445
+ [convolutional]
446
+ batch_normalize=1
447
+ filters=160
448
+ size=3
449
+ stride=1
450
+ pad=1
451
+ activation=mish
452
+
453
+ [shortcut]
454
+ from=-3
455
+ activation=linear
456
+
457
+ [convolutional]
458
+ batch_normalize=1
459
+ filters=160
460
+ size=1
461
+ stride=1
462
+ pad=1
463
+ activation=mish
464
+
465
+ [convolutional]
466
+ batch_normalize=1
467
+ filters=160
468
+ size=3
469
+ stride=1
470
+ pad=1
471
+ activation=mish
472
+
473
+ [shortcut]
474
+ from=-3
475
+ activation=linear
476
+
477
+ [convolutional]
478
+ batch_normalize=1
479
+ filters=160
480
+ size=1
481
+ stride=1
482
+ pad=1
483
+ activation=mish
484
+
485
+ [convolutional]
486
+ batch_normalize=1
487
+ filters=160
488
+ size=3
489
+ stride=1
490
+ pad=1
491
+ activation=mish
492
+
493
+ [shortcut]
494
+ from=-3
495
+ activation=linear
496
+
497
+ [convolutional]
498
+ batch_normalize=1
499
+ filters=160
500
+ size=1
501
+ stride=1
502
+ pad=1
503
+ activation=mish
504
+
505
+ [convolutional]
506
+ batch_normalize=1
507
+ filters=160
508
+ size=3
509
+ stride=1
510
+ pad=1
511
+ activation=mish
512
+
513
+ [shortcut]
514
+ from=-3
515
+ activation=linear
516
+
517
+ [convolutional]
518
+ batch_normalize=1
519
+ filters=160
520
+ size=1
521
+ stride=1
522
+ pad=1
523
+ activation=mish
524
+
525
+ [convolutional]
526
+ batch_normalize=1
527
+ filters=160
528
+ size=3
529
+ stride=1
530
+ pad=1
531
+ activation=mish
532
+
533
+ [shortcut]
534
+ from=-3
535
+ activation=linear
536
+
537
+ [convolutional]
538
+ batch_normalize=1
539
+ filters=160
540
+ size=1
541
+ stride=1
542
+ pad=1
543
+ activation=mish
544
+
545
+ [convolutional]
546
+ batch_normalize=1
547
+ filters=160
548
+ size=3
549
+ stride=1
550
+ pad=1
551
+ activation=mish
552
+
553
+ [shortcut]
554
+ from=-3
555
+ activation=linear
556
+
557
+ [convolutional]
558
+ batch_normalize=1
559
+ filters=160
560
+ size=1
561
+ stride=1
562
+ pad=1
563
+ activation=mish
564
+
565
+ [convolutional]
566
+ batch_normalize=1
567
+ filters=160
568
+ size=3
569
+ stride=1
570
+ pad=1
571
+ activation=mish
572
+
573
+ [shortcut]
574
+ from=-3
575
+ activation=linear
576
+
577
+ # Transition first
578
+
579
+ [convolutional]
580
+ batch_normalize=1
581
+ filters=160
582
+ size=1
583
+ stride=1
584
+ pad=1
585
+ activation=mish
586
+
587
+ # Merge [-1, -(3k+4)]
588
+
589
+ [route]
590
+ layers = -1,-49
591
+
592
+ # Transition last
593
+
594
+ # 78 (previous+7+3k)
595
+ [convolutional]
596
+ batch_normalize=1
597
+ filters=320
598
+ size=1
599
+ stride=1
600
+ pad=1
601
+ activation=mish
602
+
603
+
604
+ # P4
605
+
606
+ # Downsample
607
+
608
+ [convolutional]
609
+ batch_normalize=1
610
+ filters=640
611
+ size=3
612
+ stride=2
613
+ pad=1
614
+ activation=mish
615
+
616
+ # Split
617
+
618
+ [convolutional]
619
+ batch_normalize=1
620
+ filters=320
621
+ size=1
622
+ stride=1
623
+ pad=1
624
+ activation=mish
625
+
626
+ [route]
627
+ layers = -2
628
+
629
+ [convolutional]
630
+ batch_normalize=1
631
+ filters=320
632
+ size=1
633
+ stride=1
634
+ pad=1
635
+ activation=mish
636
+
637
+ # Residual Block
638
+
639
+ [convolutional]
640
+ batch_normalize=1
641
+ filters=320
642
+ size=1
643
+ stride=1
644
+ pad=1
645
+ activation=mish
646
+
647
+ [convolutional]
648
+ batch_normalize=1
649
+ filters=320
650
+ size=3
651
+ stride=1
652
+ pad=1
653
+ activation=mish
654
+
655
+ [shortcut]
656
+ from=-3
657
+ activation=linear
658
+
659
+ [convolutional]
660
+ batch_normalize=1
661
+ filters=320
662
+ size=1
663
+ stride=1
664
+ pad=1
665
+ activation=mish
666
+
667
+ [convolutional]
668
+ batch_normalize=1
669
+ filters=320
670
+ size=3
671
+ stride=1
672
+ pad=1
673
+ activation=mish
674
+
675
+ [shortcut]
676
+ from=-3
677
+ activation=linear
678
+
679
+ [convolutional]
680
+ batch_normalize=1
681
+ filters=320
682
+ size=1
683
+ stride=1
684
+ pad=1
685
+ activation=mish
686
+
687
+ [convolutional]
688
+ batch_normalize=1
689
+ filters=320
690
+ size=3
691
+ stride=1
692
+ pad=1
693
+ activation=mish
694
+
695
+ [shortcut]
696
+ from=-3
697
+ activation=linear
698
+
699
+ [convolutional]
700
+ batch_normalize=1
701
+ filters=320
702
+ size=1
703
+ stride=1
704
+ pad=1
705
+ activation=mish
706
+
707
+ [convolutional]
708
+ batch_normalize=1
709
+ filters=320
710
+ size=3
711
+ stride=1
712
+ pad=1
713
+ activation=mish
714
+
715
+ [shortcut]
716
+ from=-3
717
+ activation=linear
718
+
719
+ [convolutional]
720
+ batch_normalize=1
721
+ filters=320
722
+ size=1
723
+ stride=1
724
+ pad=1
725
+ activation=mish
726
+
727
+ [convolutional]
728
+ batch_normalize=1
729
+ filters=320
730
+ size=3
731
+ stride=1
732
+ pad=1
733
+ activation=mish
734
+
735
+ [shortcut]
736
+ from=-3
737
+ activation=linear
738
+
739
+ [convolutional]
740
+ batch_normalize=1
741
+ filters=320
742
+ size=1
743
+ stride=1
744
+ pad=1
745
+ activation=mish
746
+
747
+ [convolutional]
748
+ batch_normalize=1
749
+ filters=320
750
+ size=3
751
+ stride=1
752
+ pad=1
753
+ activation=mish
754
+
755
+ [shortcut]
756
+ from=-3
757
+ activation=linear
758
+
759
+ [convolutional]
760
+ batch_normalize=1
761
+ filters=320
762
+ size=1
763
+ stride=1
764
+ pad=1
765
+ activation=mish
766
+
767
+ [convolutional]
768
+ batch_normalize=1
769
+ filters=320
770
+ size=3
771
+ stride=1
772
+ pad=1
773
+ activation=mish
774
+
775
+ [shortcut]
776
+ from=-3
777
+ activation=linear
778
+
779
+ [convolutional]
780
+ batch_normalize=1
781
+ filters=320
782
+ size=1
783
+ stride=1
784
+ pad=1
785
+ activation=mish
786
+
787
+ [convolutional]
788
+ batch_normalize=1
789
+ filters=320
790
+ size=3
791
+ stride=1
792
+ pad=1
793
+ activation=mish
794
+
795
+ [shortcut]
796
+ from=-3
797
+ activation=linear
798
+
799
+ [convolutional]
800
+ batch_normalize=1
801
+ filters=320
802
+ size=1
803
+ stride=1
804
+ pad=1
805
+ activation=mish
806
+
807
+ [convolutional]
808
+ batch_normalize=1
809
+ filters=320
810
+ size=3
811
+ stride=1
812
+ pad=1
813
+ activation=mish
814
+
815
+ [shortcut]
816
+ from=-3
817
+ activation=linear
818
+
819
+ [convolutional]
820
+ batch_normalize=1
821
+ filters=320
822
+ size=1
823
+ stride=1
824
+ pad=1
825
+ activation=mish
826
+
827
+ [convolutional]
828
+ batch_normalize=1
829
+ filters=320
830
+ size=3
831
+ stride=1
832
+ pad=1
833
+ activation=mish
834
+
835
+ [shortcut]
836
+ from=-3
837
+ activation=linear
838
+
839
+ [convolutional]
840
+ batch_normalize=1
841
+ filters=320
842
+ size=1
843
+ stride=1
844
+ pad=1
845
+ activation=mish
846
+
847
+ [convolutional]
848
+ batch_normalize=1
849
+ filters=320
850
+ size=3
851
+ stride=1
852
+ pad=1
853
+ activation=mish
854
+
855
+ [shortcut]
856
+ from=-3
857
+ activation=linear
858
+
859
+ [convolutional]
860
+ batch_normalize=1
861
+ filters=320
862
+ size=1
863
+ stride=1
864
+ pad=1
865
+ activation=mish
866
+
867
+ [convolutional]
868
+ batch_normalize=1
869
+ filters=320
870
+ size=3
871
+ stride=1
872
+ pad=1
873
+ activation=mish
874
+
875
+ [shortcut]
876
+ from=-3
877
+ activation=linear
878
+
879
+ [convolutional]
880
+ batch_normalize=1
881
+ filters=320
882
+ size=1
883
+ stride=1
884
+ pad=1
885
+ activation=mish
886
+
887
+ [convolutional]
888
+ batch_normalize=1
889
+ filters=320
890
+ size=3
891
+ stride=1
892
+ pad=1
893
+ activation=mish
894
+
895
+ [shortcut]
896
+ from=-3
897
+ activation=linear
898
+
899
+ [convolutional]
900
+ batch_normalize=1
901
+ filters=320
902
+ size=1
903
+ stride=1
904
+ pad=1
905
+ activation=mish
906
+
907
+ [convolutional]
908
+ batch_normalize=1
909
+ filters=320
910
+ size=3
911
+ stride=1
912
+ pad=1
913
+ activation=mish
914
+
915
+ [shortcut]
916
+ from=-3
917
+ activation=linear
918
+
919
+ [convolutional]
920
+ batch_normalize=1
921
+ filters=320
922
+ size=1
923
+ stride=1
924
+ pad=1
925
+ activation=mish
926
+
927
+ [convolutional]
928
+ batch_normalize=1
929
+ filters=320
930
+ size=3
931
+ stride=1
932
+ pad=1
933
+ activation=mish
934
+
935
+ [shortcut]
936
+ from=-3
937
+ activation=linear
938
+
939
+ # Transition first
940
+
941
+ [convolutional]
942
+ batch_normalize=1
943
+ filters=320
944
+ size=1
945
+ stride=1
946
+ pad=1
947
+ activation=mish
948
+
949
+ # Merge [-1, -(3k+4)]
950
+
951
+ [route]
952
+ layers = -1,-49
953
+
954
+ # Transition last
955
+
956
+ # 130 (previous+7+3k)
957
+ [convolutional]
958
+ batch_normalize=1
959
+ filters=640
960
+ size=1
961
+ stride=1
962
+ pad=1
963
+ activation=mish
964
+
965
+
966
+ # P5
967
+
968
+ # Downsample
969
+
970
+ [convolutional]
971
+ batch_normalize=1
972
+ filters=1280
973
+ size=3
974
+ stride=2
975
+ pad=1
976
+ activation=mish
977
+
978
+ # Split
979
+
980
+ [convolutional]
981
+ batch_normalize=1
982
+ filters=640
983
+ size=1
984
+ stride=1
985
+ pad=1
986
+ activation=mish
987
+
988
+ [route]
989
+ layers = -2
990
+
991
+ [convolutional]
992
+ batch_normalize=1
993
+ filters=640
994
+ size=1
995
+ stride=1
996
+ pad=1
997
+ activation=mish
998
+
999
+ # Residual Block
1000
+
1001
+ [convolutional]
1002
+ batch_normalize=1
1003
+ filters=640
1004
+ size=1
1005
+ stride=1
1006
+ pad=1
1007
+ activation=mish
1008
+
1009
+ [convolutional]
1010
+ batch_normalize=1
1011
+ filters=640
1012
+ size=3
1013
+ stride=1
1014
+ pad=1
1015
+ activation=mish
1016
+
1017
+ [shortcut]
1018
+ from=-3
1019
+ activation=linear
1020
+
1021
+ [convolutional]
1022
+ batch_normalize=1
1023
+ filters=640
1024
+ size=1
1025
+ stride=1
1026
+ pad=1
1027
+ activation=mish
1028
+
1029
+ [convolutional]
1030
+ batch_normalize=1
1031
+ filters=640
1032
+ size=3
1033
+ stride=1
1034
+ pad=1
1035
+ activation=mish
1036
+
1037
+ [shortcut]
1038
+ from=-3
1039
+ activation=linear
1040
+
1041
+ [convolutional]
1042
+ batch_normalize=1
1043
+ filters=640
1044
+ size=1
1045
+ stride=1
1046
+ pad=1
1047
+ activation=mish
1048
+
1049
+ [convolutional]
1050
+ batch_normalize=1
1051
+ filters=640
1052
+ size=3
1053
+ stride=1
1054
+ pad=1
1055
+ activation=mish
1056
+
1057
+ [shortcut]
1058
+ from=-3
1059
+ activation=linear
1060
+
1061
+ [convolutional]
1062
+ batch_normalize=1
1063
+ filters=640
1064
+ size=1
1065
+ stride=1
1066
+ pad=1
1067
+ activation=mish
1068
+
1069
+ [convolutional]
1070
+ batch_normalize=1
1071
+ filters=640
1072
+ size=3
1073
+ stride=1
1074
+ pad=1
1075
+ activation=mish
1076
+
1077
+ [shortcut]
1078
+ from=-3
1079
+ activation=linear
1080
+
1081
+ [convolutional]
1082
+ batch_normalize=1
1083
+ filters=640
1084
+ size=1
1085
+ stride=1
1086
+ pad=1
1087
+ activation=mish
1088
+
1089
+ [convolutional]
1090
+ batch_normalize=1
1091
+ filters=640
1092
+ size=3
1093
+ stride=1
1094
+ pad=1
1095
+ activation=mish
1096
+
1097
+ [shortcut]
1098
+ from=-3
1099
+ activation=linear
1100
+
1101
+ [convolutional]
1102
+ batch_normalize=1
1103
+ filters=640
1104
+ size=1
1105
+ stride=1
1106
+ pad=1
1107
+ activation=mish
1108
+
1109
+ [convolutional]
1110
+ batch_normalize=1
1111
+ filters=640
1112
+ size=3
1113
+ stride=1
1114
+ pad=1
1115
+ activation=mish
1116
+
1117
+ [shortcut]
1118
+ from=-3
1119
+ activation=linear
1120
+
1121
+ [convolutional]
1122
+ batch_normalize=1
1123
+ filters=640
1124
+ size=1
1125
+ stride=1
1126
+ pad=1
1127
+ activation=mish
1128
+
1129
+ [convolutional]
1130
+ batch_normalize=1
1131
+ filters=640
1132
+ size=3
1133
+ stride=1
1134
+ pad=1
1135
+ activation=mish
1136
+
1137
+ [shortcut]
1138
+ from=-3
1139
+ activation=linear
1140
+
1141
+ # Transition first
1142
+
1143
+ [convolutional]
1144
+ batch_normalize=1
1145
+ filters=640
1146
+ size=1
1147
+ stride=1
1148
+ pad=1
1149
+ activation=mish
1150
+
1151
+ # Merge [-1, -(3k+4)]
1152
+
1153
+ [route]
1154
+ layers = -1,-25
1155
+
1156
+ # Transition last
1157
+
1158
+ # 158 (previous+7+3k)
1159
+ [convolutional]
1160
+ batch_normalize=1
1161
+ filters=1280
1162
+ size=1
1163
+ stride=1
1164
+ pad=1
1165
+ activation=mish
1166
+
1167
+
1168
+ # P6
1169
+
1170
+ # Downsample
1171
+
1172
+ [convolutional]
1173
+ batch_normalize=1
1174
+ filters=1280
1175
+ size=3
1176
+ stride=2
1177
+ pad=1
1178
+ activation=mish
1179
+
1180
+ # Split
1181
+
1182
+ [convolutional]
1183
+ batch_normalize=1
1184
+ filters=640
1185
+ size=1
1186
+ stride=1
1187
+ pad=1
1188
+ activation=mish
1189
+
1190
+ [route]
1191
+ layers = -2
1192
+
1193
+ [convolutional]
1194
+ batch_normalize=1
1195
+ filters=640
1196
+ size=1
1197
+ stride=1
1198
+ pad=1
1199
+ activation=mish
1200
+
1201
+ # Residual Block
1202
+
1203
+ [convolutional]
1204
+ batch_normalize=1
1205
+ filters=640
1206
+ size=1
1207
+ stride=1
1208
+ pad=1
1209
+ activation=mish
1210
+
1211
+ [convolutional]
1212
+ batch_normalize=1
1213
+ filters=640
1214
+ size=3
1215
+ stride=1
1216
+ pad=1
1217
+ activation=mish
1218
+
1219
+ [shortcut]
1220
+ from=-3
1221
+ activation=linear
1222
+
1223
+ [convolutional]
1224
+ batch_normalize=1
1225
+ filters=640
1226
+ size=1
1227
+ stride=1
1228
+ pad=1
1229
+ activation=mish
1230
+
1231
+ [convolutional]
1232
+ batch_normalize=1
1233
+ filters=640
1234
+ size=3
1235
+ stride=1
1236
+ pad=1
1237
+ activation=mish
1238
+
1239
+ [shortcut]
1240
+ from=-3
1241
+ activation=linear
1242
+
1243
+ [convolutional]
1244
+ batch_normalize=1
1245
+ filters=640
1246
+ size=1
1247
+ stride=1
1248
+ pad=1
1249
+ activation=mish
1250
+
1251
+ [convolutional]
1252
+ batch_normalize=1
1253
+ filters=640
1254
+ size=3
1255
+ stride=1
1256
+ pad=1
1257
+ activation=mish
1258
+
1259
+ [shortcut]
1260
+ from=-3
1261
+ activation=linear
1262
+
1263
+ [convolutional]
1264
+ batch_normalize=1
1265
+ filters=640
1266
+ size=1
1267
+ stride=1
1268
+ pad=1
1269
+ activation=mish
1270
+
1271
+ [convolutional]
1272
+ batch_normalize=1
1273
+ filters=640
1274
+ size=3
1275
+ stride=1
1276
+ pad=1
1277
+ activation=mish
1278
+
1279
+ [shortcut]
1280
+ from=-3
1281
+ activation=linear
1282
+
1283
+ [convolutional]
1284
+ batch_normalize=1
1285
+ filters=640
1286
+ size=1
1287
+ stride=1
1288
+ pad=1
1289
+ activation=mish
1290
+
1291
+ [convolutional]
1292
+ batch_normalize=1
1293
+ filters=640
1294
+ size=3
1295
+ stride=1
1296
+ pad=1
1297
+ activation=mish
1298
+
1299
+ [shortcut]
1300
+ from=-3
1301
+ activation=linear
1302
+
1303
+ [convolutional]
1304
+ batch_normalize=1
1305
+ filters=640
1306
+ size=1
1307
+ stride=1
1308
+ pad=1
1309
+ activation=mish
1310
+
1311
+ [convolutional]
1312
+ batch_normalize=1
1313
+ filters=640
1314
+ size=3
1315
+ stride=1
1316
+ pad=1
1317
+ activation=mish
1318
+
1319
+ [shortcut]
1320
+ from=-3
1321
+ activation=linear
1322
+
1323
+ [convolutional]
1324
+ batch_normalize=1
1325
+ filters=640
1326
+ size=1
1327
+ stride=1
1328
+ pad=1
1329
+ activation=mish
1330
+
1331
+ [convolutional]
1332
+ batch_normalize=1
1333
+ filters=640
1334
+ size=3
1335
+ stride=1
1336
+ pad=1
1337
+ activation=mish
1338
+
1339
+ [shortcut]
1340
+ from=-3
1341
+ activation=linear
1342
+
1343
+ # Transition first
1344
+
1345
+ [convolutional]
1346
+ batch_normalize=1
1347
+ filters=640
1348
+ size=1
1349
+ stride=1
1350
+ pad=1
1351
+ activation=mish
1352
+
1353
+ # Merge [-1, -(3k+4)]
1354
+
1355
+ [route]
1356
+ layers = -1,-25
1357
+
1358
+ # Transition last
1359
+
1360
+ # 186 (previous+7+3k)
1361
+ [convolutional]
1362
+ batch_normalize=1
1363
+ filters=1280
1364
+ size=1
1365
+ stride=1
1366
+ pad=1
1367
+ activation=mish
1368
+
1369
+
1370
+ # P7
1371
+
1372
+ # Downsample
1373
+
1374
+ [convolutional]
1375
+ batch_normalize=1
1376
+ filters=1280
1377
+ size=3
1378
+ stride=2
1379
+ pad=1
1380
+ activation=mish
1381
+
1382
+ # Split
1383
+
1384
+ [convolutional]
1385
+ batch_normalize=1
1386
+ filters=640
1387
+ size=1
1388
+ stride=1
1389
+ pad=1
1390
+ activation=mish
1391
+
1392
+ [route]
1393
+ layers = -2
1394
+
1395
+ [convolutional]
1396
+ batch_normalize=1
1397
+ filters=640
1398
+ size=1
1399
+ stride=1
1400
+ pad=1
1401
+ activation=mish
1402
+
1403
+ # Residual Block
1404
+
1405
+ [convolutional]
1406
+ batch_normalize=1
1407
+ filters=640
1408
+ size=1
1409
+ stride=1
1410
+ pad=1
1411
+ activation=mish
1412
+
1413
+ [convolutional]
1414
+ batch_normalize=1
1415
+ filters=640
1416
+ size=3
1417
+ stride=1
1418
+ pad=1
1419
+ activation=mish
1420
+
1421
+ [shortcut]
1422
+ from=-3
1423
+ activation=linear
1424
+
1425
+ [convolutional]
1426
+ batch_normalize=1
1427
+ filters=640
1428
+ size=1
1429
+ stride=1
1430
+ pad=1
1431
+ activation=mish
1432
+
1433
+ [convolutional]
1434
+ batch_normalize=1
1435
+ filters=640
1436
+ size=3
1437
+ stride=1
1438
+ pad=1
1439
+ activation=mish
1440
+
1441
+ [shortcut]
1442
+ from=-3
1443
+ activation=linear
1444
+
1445
+ [convolutional]
1446
+ batch_normalize=1
1447
+ filters=640
1448
+ size=1
1449
+ stride=1
1450
+ pad=1
1451
+ activation=mish
1452
+
1453
+ [convolutional]
1454
+ batch_normalize=1
1455
+ filters=640
1456
+ size=3
1457
+ stride=1
1458
+ pad=1
1459
+ activation=mish
1460
+
1461
+ [shortcut]
1462
+ from=-3
1463
+ activation=linear
1464
+
1465
+ [convolutional]
1466
+ batch_normalize=1
1467
+ filters=640
1468
+ size=1
1469
+ stride=1
1470
+ pad=1
1471
+ activation=mish
1472
+
1473
+ [convolutional]
1474
+ batch_normalize=1
1475
+ filters=640
1476
+ size=3
1477
+ stride=1
1478
+ pad=1
1479
+ activation=mish
1480
+
1481
+ [shortcut]
1482
+ from=-3
1483
+ activation=linear
1484
+
1485
+ [convolutional]
1486
+ batch_normalize=1
1487
+ filters=640
1488
+ size=1
1489
+ stride=1
1490
+ pad=1
1491
+ activation=mish
1492
+
1493
+ [convolutional]
1494
+ batch_normalize=1
1495
+ filters=640
1496
+ size=3
1497
+ stride=1
1498
+ pad=1
1499
+ activation=mish
1500
+
1501
+ [shortcut]
1502
+ from=-3
1503
+ activation=linear
1504
+
1505
+ [convolutional]
1506
+ batch_normalize=1
1507
+ filters=640
1508
+ size=1
1509
+ stride=1
1510
+ pad=1
1511
+ activation=mish
1512
+
1513
+ [convolutional]
1514
+ batch_normalize=1
1515
+ filters=640
1516
+ size=3
1517
+ stride=1
1518
+ pad=1
1519
+ activation=mish
1520
+
1521
+ [shortcut]
1522
+ from=-3
1523
+ activation=linear
1524
+
1525
+ [convolutional]
1526
+ batch_normalize=1
1527
+ filters=640
1528
+ size=1
1529
+ stride=1
1530
+ pad=1
1531
+ activation=mish
1532
+
1533
+ [convolutional]
1534
+ batch_normalize=1
1535
+ filters=640
1536
+ size=3
1537
+ stride=1
1538
+ pad=1
1539
+ activation=mish
1540
+
1541
+ [shortcut]
1542
+ from=-3
1543
+ activation=linear
1544
+
1545
+ # Transition first
1546
+
1547
+ [convolutional]
1548
+ batch_normalize=1
1549
+ filters=640
1550
+ size=1
1551
+ stride=1
1552
+ pad=1
1553
+ activation=mish
1554
+
1555
+ # Merge [-1, -(3k+4)]
1556
+
1557
+ [route]
1558
+ layers = -1,-25
1559
+
1560
+ # Transition last
1561
+
1562
+ # 214 (previous+7+3k)
1563
+ [convolutional]
1564
+ batch_normalize=1
1565
+ filters=1280
1566
+ size=1
1567
+ stride=1
1568
+ pad=1
1569
+ activation=mish
1570
+
1571
+ # ============ End of Backbone ============ #
1572
+
1573
+ # ============ Neck ============ #
1574
+
1575
+ # CSPSPP
1576
+
1577
+ [convolutional]
1578
+ batch_normalize=1
1579
+ filters=640
1580
+ size=1
1581
+ stride=1
1582
+ pad=1
1583
+ activation=mish
1584
+
1585
+ [route]
1586
+ layers = -2
1587
+
1588
+ [convolutional]
1589
+ batch_normalize=1
1590
+ filters=640
1591
+ size=1
1592
+ stride=1
1593
+ pad=1
1594
+ activation=mish
1595
+
1596
+ [convolutional]
1597
+ batch_normalize=1
1598
+ size=3
1599
+ stride=1
1600
+ pad=1
1601
+ filters=640
1602
+ activation=mish
1603
+
1604
+ [convolutional]
1605
+ batch_normalize=1
1606
+ filters=640
1607
+ size=1
1608
+ stride=1
1609
+ pad=1
1610
+ activation=mish
1611
+
1612
+ ### SPP ###
1613
+ [maxpool]
1614
+ stride=1
1615
+ size=5
1616
+
1617
+ [route]
1618
+ layers=-2
1619
+
1620
+ [maxpool]
1621
+ stride=1
1622
+ size=9
1623
+
1624
+ [route]
1625
+ layers=-4
1626
+
1627
+ [maxpool]
1628
+ stride=1
1629
+ size=13
1630
+
1631
+ [route]
1632
+ layers=-1,-3,-5,-6
1633
+ ### End SPP ###
1634
+
1635
+ [convolutional]
1636
+ batch_normalize=1
1637
+ filters=640
1638
+ size=1
1639
+ stride=1
1640
+ pad=1
1641
+ activation=mish
1642
+
1643
+ [convolutional]
1644
+ batch_normalize=1
1645
+ size=3
1646
+ stride=1
1647
+ pad=1
1648
+ filters=640
1649
+ activation=mish
1650
+
1651
+ [route]
1652
+ layers = -1, -13
1653
+
1654
+ # 229 (previous+6+5+2k)
1655
+ [convolutional]
1656
+ batch_normalize=1
1657
+ filters=640
1658
+ size=1
1659
+ stride=1
1660
+ pad=1
1661
+ activation=mish
1662
+
1663
+ # End of CSPSPP
1664
+
1665
+
1666
+ # FPN-6
1667
+
1668
+ [convolutional]
1669
+ batch_normalize=1
1670
+ filters=640
1671
+ size=1
1672
+ stride=1
1673
+ pad=1
1674
+ activation=mish
1675
+
1676
+ [upsample]
1677
+ stride=2
1678
+
1679
+ [route]
1680
+ layers = 186
1681
+
1682
+ [convolutional]
1683
+ batch_normalize=1
1684
+ filters=640
1685
+ size=1
1686
+ stride=1
1687
+ pad=1
1688
+ activation=mish
1689
+
1690
+ [route]
1691
+ layers = -1, -3
1692
+
1693
+ [convolutional]
1694
+ batch_normalize=1
1695
+ filters=640
1696
+ size=1
1697
+ stride=1
1698
+ pad=1
1699
+ activation=mish
1700
+
1701
+ # Split
1702
+
1703
+ [convolutional]
1704
+ batch_normalize=1
1705
+ filters=640
1706
+ size=1
1707
+ stride=1
1708
+ pad=1
1709
+ activation=mish
1710
+
1711
+ [route]
1712
+ layers = -2
1713
+
1714
+ # Plain Block
1715
+
1716
+ [convolutional]
1717
+ batch_normalize=1
1718
+ filters=640
1719
+ size=1
1720
+ stride=1
1721
+ pad=1
1722
+ activation=mish
1723
+
1724
+ [convolutional]
1725
+ batch_normalize=1
1726
+ size=3
1727
+ stride=1
1728
+ pad=1
1729
+ filters=640
1730
+ activation=mish
1731
+
1732
+ [convolutional]
1733
+ batch_normalize=1
1734
+ filters=640
1735
+ size=1
1736
+ stride=1
1737
+ pad=1
1738
+ activation=mish
1739
+
1740
+ [convolutional]
1741
+ batch_normalize=1
1742
+ size=3
1743
+ stride=1
1744
+ pad=1
1745
+ filters=640
1746
+ activation=mish
1747
+
1748
+ [convolutional]
1749
+ batch_normalize=1
1750
+ filters=640
1751
+ size=1
1752
+ stride=1
1753
+ pad=1
1754
+ activation=mish
1755
+
1756
+ [convolutional]
1757
+ batch_normalize=1
1758
+ size=3
1759
+ stride=1
1760
+ pad=1
1761
+ filters=640
1762
+ activation=mish
1763
+
1764
+ # Merge [-1, -(2k+2)]
1765
+
1766
+ [route]
1767
+ layers = -1, -8
1768
+
1769
+ # Transition last
1770
+
1771
+ # 245 (previous+6+4+2k)
1772
+ [convolutional]
1773
+ batch_normalize=1
1774
+ filters=640
1775
+ size=1
1776
+ stride=1
1777
+ pad=1
1778
+ activation=mish
1779
+
1780
+
1781
+ # FPN-5
1782
+
1783
+ [convolutional]
1784
+ batch_normalize=1
1785
+ filters=640
1786
+ size=1
1787
+ stride=1
1788
+ pad=1
1789
+ activation=mish
1790
+
1791
+ [upsample]
1792
+ stride=2
1793
+
1794
+ [route]
1795
+ layers = 158
1796
+
1797
+ [convolutional]
1798
+ batch_normalize=1
1799
+ filters=640
1800
+ size=1
1801
+ stride=1
1802
+ pad=1
1803
+ activation=mish
1804
+
1805
+ [route]
1806
+ layers = -1, -3
1807
+
1808
+ [convolutional]
1809
+ batch_normalize=1
1810
+ filters=640
1811
+ size=1
1812
+ stride=1
1813
+ pad=1
1814
+ activation=mish
1815
+
1816
+ # Split
1817
+
1818
+ [convolutional]
1819
+ batch_normalize=1
1820
+ filters=640
1821
+ size=1
1822
+ stride=1
1823
+ pad=1
1824
+ activation=mish
1825
+
1826
+ [route]
1827
+ layers = -2
1828
+
1829
+ # Plain Block
1830
+
1831
+ [convolutional]
1832
+ batch_normalize=1
1833
+ filters=640
1834
+ size=1
1835
+ stride=1
1836
+ pad=1
1837
+ activation=mish
1838
+
1839
+ [convolutional]
1840
+ batch_normalize=1
1841
+ size=3
1842
+ stride=1
1843
+ pad=1
1844
+ filters=640
1845
+ activation=mish
1846
+
1847
+ [convolutional]
1848
+ batch_normalize=1
1849
+ filters=640
1850
+ size=1
1851
+ stride=1
1852
+ pad=1
1853
+ activation=mish
1854
+
1855
+ [convolutional]
1856
+ batch_normalize=1
1857
+ size=3
1858
+ stride=1
1859
+ pad=1
1860
+ filters=640
1861
+ activation=mish
1862
+
1863
+ [convolutional]
1864
+ batch_normalize=1
1865
+ filters=640
1866
+ size=1
1867
+ stride=1
1868
+ pad=1
1869
+ activation=mish
1870
+
1871
+ [convolutional]
1872
+ batch_normalize=1
1873
+ size=3
1874
+ stride=1
1875
+ pad=1
1876
+ filters=640
1877
+ activation=mish
1878
+
1879
+ # Merge [-1, -(2k+2)]
1880
+
1881
+ [route]
1882
+ layers = -1, -8
1883
+
1884
+ # Transition last
1885
+
1886
+ # 261 (previous+6+4+2k)
1887
+ [convolutional]
1888
+ batch_normalize=1
1889
+ filters=640
1890
+ size=1
1891
+ stride=1
1892
+ pad=1
1893
+ activation=mish
1894
+
1895
+
1896
+ # FPN-4
1897
+
1898
+ [convolutional]
1899
+ batch_normalize=1
1900
+ filters=320
1901
+ size=1
1902
+ stride=1
1903
+ pad=1
1904
+ activation=mish
1905
+
1906
+ [upsample]
1907
+ stride=2
1908
+
1909
+ [route]
1910
+ layers = 130
1911
+
1912
+ [convolutional]
1913
+ batch_normalize=1
1914
+ filters=320
1915
+ size=1
1916
+ stride=1
1917
+ pad=1
1918
+ activation=mish
1919
+
1920
+ [route]
1921
+ layers = -1, -3
1922
+
1923
+ [convolutional]
1924
+ batch_normalize=1
1925
+ filters=320
1926
+ size=1
1927
+ stride=1
1928
+ pad=1
1929
+ activation=mish
1930
+
1931
+ # Split
1932
+
1933
+ [convolutional]
1934
+ batch_normalize=1
1935
+ filters=320
1936
+ size=1
1937
+ stride=1
1938
+ pad=1
1939
+ activation=mish
1940
+
1941
+ [route]
1942
+ layers = -2
1943
+
1944
+ # Plain Block
1945
+
1946
+ [convolutional]
1947
+ batch_normalize=1
1948
+ filters=320
1949
+ size=1
1950
+ stride=1
1951
+ pad=1
1952
+ activation=mish
1953
+
1954
+ [convolutional]
1955
+ batch_normalize=1
1956
+ size=3
1957
+ stride=1
1958
+ pad=1
1959
+ filters=320
1960
+ activation=mish
1961
+
1962
+ [convolutional]
1963
+ batch_normalize=1
1964
+ filters=320
1965
+ size=1
1966
+ stride=1
1967
+ pad=1
1968
+ activation=mish
1969
+
1970
+ [convolutional]
1971
+ batch_normalize=1
1972
+ size=3
1973
+ stride=1
1974
+ pad=1
1975
+ filters=320
1976
+ activation=mish
1977
+
1978
+ [convolutional]
1979
+ batch_normalize=1
1980
+ filters=320
1981
+ size=1
1982
+ stride=1
1983
+ pad=1
1984
+ activation=mish
1985
+
1986
+ [convolutional]
1987
+ batch_normalize=1
1988
+ size=3
1989
+ stride=1
1990
+ pad=1
1991
+ filters=320
1992
+ activation=mish
1993
+
1994
+ # Merge [-1, -(2k+2)]
1995
+
1996
+ [route]
1997
+ layers = -1, -8
1998
+
1999
+ # Transition last
2000
+
2001
+ # 277 (previous+6+4+2k)
2002
+ [convolutional]
2003
+ batch_normalize=1
2004
+ filters=320
2005
+ size=1
2006
+ stride=1
2007
+ pad=1
2008
+ activation=mish
2009
+
2010
+
2011
+ # FPN-3
2012
+
2013
+ [convolutional]
2014
+ batch_normalize=1
2015
+ filters=160
2016
+ size=1
2017
+ stride=1
2018
+ pad=1
2019
+ activation=mish
2020
+
2021
+ [upsample]
2022
+ stride=2
2023
+
2024
+ [route]
2025
+ layers = 78
2026
+
2027
+ [convolutional]
2028
+ batch_normalize=1
2029
+ filters=160
2030
+ size=1
2031
+ stride=1
2032
+ pad=1
2033
+ activation=mish
2034
+
2035
+ [route]
2036
+ layers = -1, -3
2037
+
2038
+ [convolutional]
2039
+ batch_normalize=1
2040
+ filters=160
2041
+ size=1
2042
+ stride=1
2043
+ pad=1
2044
+ activation=mish
2045
+
2046
+ # Split
2047
+
2048
+ [convolutional]
2049
+ batch_normalize=1
2050
+ filters=160
2051
+ size=1
2052
+ stride=1
2053
+ pad=1
2054
+ activation=mish
2055
+
2056
+ [route]
2057
+ layers = -2
2058
+
2059
+ # Plain Block
2060
+
2061
+ [convolutional]
2062
+ batch_normalize=1
2063
+ filters=160
2064
+ size=1
2065
+ stride=1
2066
+ pad=1
2067
+ activation=mish
2068
+
2069
+ [convolutional]
2070
+ batch_normalize=1
2071
+ size=3
2072
+ stride=1
2073
+ pad=1
2074
+ filters=160
2075
+ activation=mish
2076
+
2077
+ [convolutional]
2078
+ batch_normalize=1
2079
+ filters=160
2080
+ size=1
2081
+ stride=1
2082
+ pad=1
2083
+ activation=mish
2084
+
2085
+ [convolutional]
2086
+ batch_normalize=1
2087
+ size=3
2088
+ stride=1
2089
+ pad=1
2090
+ filters=160
2091
+ activation=mish
2092
+
2093
+ [convolutional]
2094
+ batch_normalize=1
2095
+ filters=160
2096
+ size=1
2097
+ stride=1
2098
+ pad=1
2099
+ activation=mish
2100
+
2101
+ [convolutional]
2102
+ batch_normalize=1
2103
+ size=3
2104
+ stride=1
2105
+ pad=1
2106
+ filters=160
2107
+ activation=mish
2108
+
2109
+ # Merge [-1, -(2k+2)]
2110
+
2111
+ [route]
2112
+ layers = -1, -8
2113
+
2114
+ # Transition last
2115
+
2116
+ # 293 (previous+6+4+2k)
2117
+ [convolutional]
2118
+ batch_normalize=1
2119
+ filters=160
2120
+ size=1
2121
+ stride=1
2122
+ pad=1
2123
+ activation=mish
2124
+
2125
+
2126
+ # PAN-4
2127
+
2128
+ [convolutional]
2129
+ batch_normalize=1
2130
+ size=3
2131
+ stride=2
2132
+ pad=1
2133
+ filters=320
2134
+ activation=mish
2135
+
2136
+ [route]
2137
+ layers = -1, 277
2138
+
2139
+ [convolutional]
2140
+ batch_normalize=1
2141
+ filters=320
2142
+ size=1
2143
+ stride=1
2144
+ pad=1
2145
+ activation=mish
2146
+
2147
+ # Split
2148
+
2149
+ [convolutional]
2150
+ batch_normalize=1
2151
+ filters=320
2152
+ size=1
2153
+ stride=1
2154
+ pad=1
2155
+ activation=mish
2156
+
2157
+ [route]
2158
+ layers = -2
2159
+
2160
+ # Plain Block
2161
+
2162
+ [convolutional]
2163
+ batch_normalize=1
2164
+ filters=320
2165
+ size=1
2166
+ stride=1
2167
+ pad=1
2168
+ activation=mish
2169
+
2170
+ [convolutional]
2171
+ batch_normalize=1
2172
+ size=3
2173
+ stride=1
2174
+ pad=1
2175
+ filters=320
2176
+ activation=mish
2177
+
2178
+ [convolutional]
2179
+ batch_normalize=1
2180
+ filters=320
2181
+ size=1
2182
+ stride=1
2183
+ pad=1
2184
+ activation=mish
2185
+
2186
+ [convolutional]
2187
+ batch_normalize=1
2188
+ size=3
2189
+ stride=1
2190
+ pad=1
2191
+ filters=320
2192
+ activation=mish
2193
+
2194
+ [convolutional]
2195
+ batch_normalize=1
2196
+ filters=320
2197
+ size=1
2198
+ stride=1
2199
+ pad=1
2200
+ activation=mish
2201
+
2202
+ [convolutional]
2203
+ batch_normalize=1
2204
+ size=3
2205
+ stride=1
2206
+ pad=1
2207
+ filters=320
2208
+ activation=mish
2209
+
2210
+ [route]
2211
+ layers = -1,-8
2212
+
2213
+ # Transition last
2214
+
2215
+ # 306 (previous+3+4+2k)
2216
+ [convolutional]
2217
+ batch_normalize=1
2218
+ filters=320
2219
+ size=1
2220
+ stride=1
2221
+ pad=1
2222
+ activation=mish
2223
+
2224
+
2225
+ # PAN-5
2226
+
2227
+ [convolutional]
2228
+ batch_normalize=1
2229
+ size=3
2230
+ stride=2
2231
+ pad=1
2232
+ filters=640
2233
+ activation=mish
2234
+
2235
+ [route]
2236
+ layers = -1, 261
2237
+
2238
+ [convolutional]
2239
+ batch_normalize=1
2240
+ filters=640
2241
+ size=1
2242
+ stride=1
2243
+ pad=1
2244
+ activation=mish
2245
+
2246
+ # Split
2247
+
2248
+ [convolutional]
2249
+ batch_normalize=1
2250
+ filters=640
2251
+ size=1
2252
+ stride=1
2253
+ pad=1
2254
+ activation=mish
2255
+
2256
+ [route]
2257
+ layers = -2
2258
+
2259
+ # Plain Block
2260
+
2261
+ [convolutional]
2262
+ batch_normalize=1
2263
+ filters=640
2264
+ size=1
2265
+ stride=1
2266
+ pad=1
2267
+ activation=mish
2268
+
2269
+ [convolutional]
2270
+ batch_normalize=1
2271
+ size=3
2272
+ stride=1
2273
+ pad=1
2274
+ filters=640
2275
+ activation=mish
2276
+
2277
+ [convolutional]
2278
+ batch_normalize=1
2279
+ filters=640
2280
+ size=1
2281
+ stride=1
2282
+ pad=1
2283
+ activation=mish
2284
+
2285
+ [convolutional]
2286
+ batch_normalize=1
2287
+ size=3
2288
+ stride=1
2289
+ pad=1
2290
+ filters=640
2291
+ activation=mish
2292
+
2293
+ [convolutional]
2294
+ batch_normalize=1
2295
+ filters=640
2296
+ size=1
2297
+ stride=1
2298
+ pad=1
2299
+ activation=mish
2300
+
2301
+ [convolutional]
2302
+ batch_normalize=1
2303
+ size=3
2304
+ stride=1
2305
+ pad=1
2306
+ filters=640
2307
+ activation=mish
2308
+
2309
+ [route]
2310
+ layers = -1,-8
2311
+
2312
+ # Transition last
2313
+
2314
+ # 319 (previous+3+4+2k)
2315
+ [convolutional]
2316
+ batch_normalize=1
2317
+ filters=640
2318
+ size=1
2319
+ stride=1
2320
+ pad=1
2321
+ activation=mish
2322
+
2323
+
2324
+ # PAN-6
2325
+
2326
+ [convolutional]
2327
+ batch_normalize=1
2328
+ size=3
2329
+ stride=2
2330
+ pad=1
2331
+ filters=640
2332
+ activation=mish
2333
+
2334
+ [route]
2335
+ layers = -1, 245
2336
+
2337
+ [convolutional]
2338
+ batch_normalize=1
2339
+ filters=640
2340
+ size=1
2341
+ stride=1
2342
+ pad=1
2343
+ activation=mish
2344
+
2345
+ # Split
2346
+
2347
+ [convolutional]
2348
+ batch_normalize=1
2349
+ filters=640
2350
+ size=1
2351
+ stride=1
2352
+ pad=1
2353
+ activation=mish
2354
+
2355
+ [route]
2356
+ layers = -2
2357
+
2358
+ # Plain Block
2359
+
2360
+ [convolutional]
2361
+ batch_normalize=1
2362
+ filters=640
2363
+ size=1
2364
+ stride=1
2365
+ pad=1
2366
+ activation=mish
2367
+
2368
+ [convolutional]
2369
+ batch_normalize=1
2370
+ size=3
2371
+ stride=1
2372
+ pad=1
2373
+ filters=640
2374
+ activation=mish
2375
+
2376
+ [convolutional]
2377
+ batch_normalize=1
2378
+ filters=640
2379
+ size=1
2380
+ stride=1
2381
+ pad=1
2382
+ activation=mish
2383
+
2384
+ [convolutional]
2385
+ batch_normalize=1
2386
+ size=3
2387
+ stride=1
2388
+ pad=1
2389
+ filters=640
2390
+ activation=mish
2391
+
2392
+ [convolutional]
2393
+ batch_normalize=1
2394
+ filters=640
2395
+ size=1
2396
+ stride=1
2397
+ pad=1
2398
+ activation=mish
2399
+
2400
+ [convolutional]
2401
+ batch_normalize=1
2402
+ size=3
2403
+ stride=1
2404
+ pad=1
2405
+ filters=640
2406
+ activation=mish
2407
+
2408
+ [route]
2409
+ layers = -1,-8
2410
+
2411
+ # Transition last
2412
+
2413
+ # 332 (previous+3+4+2k)
2414
+ [convolutional]
2415
+ batch_normalize=1
2416
+ filters=640
2417
+ size=1
2418
+ stride=1
2419
+ pad=1
2420
+ activation=mish
2421
+
2422
+
2423
+ # PAN-7
2424
+
2425
+ [convolutional]
2426
+ batch_normalize=1
2427
+ size=3
2428
+ stride=2
2429
+ pad=1
2430
+ filters=640
2431
+ activation=mish
2432
+
2433
+ [route]
2434
+ layers = -1, 229
2435
+
2436
+ [convolutional]
2437
+ batch_normalize=1
2438
+ filters=640
2439
+ size=1
2440
+ stride=1
2441
+ pad=1
2442
+ activation=mish
2443
+
2444
+ # Split
2445
+
2446
+ [convolutional]
2447
+ batch_normalize=1
2448
+ filters=640
2449
+ size=1
2450
+ stride=1
2451
+ pad=1
2452
+ activation=mish
2453
+
2454
+ [route]
2455
+ layers = -2
2456
+
2457
+ # Plain Block
2458
+
2459
+ [convolutional]
2460
+ batch_normalize=1
2461
+ filters=640
2462
+ size=1
2463
+ stride=1
2464
+ pad=1
2465
+ activation=mish
2466
+
2467
+ [convolutional]
2468
+ batch_normalize=1
2469
+ size=3
2470
+ stride=1
2471
+ pad=1
2472
+ filters=640
2473
+ activation=mish
2474
+
2475
+ [convolutional]
2476
+ batch_normalize=1
2477
+ filters=640
2478
+ size=1
2479
+ stride=1
2480
+ pad=1
2481
+ activation=mish
2482
+
2483
+ [convolutional]
2484
+ batch_normalize=1
2485
+ size=3
2486
+ stride=1
2487
+ pad=1
2488
+ filters=640
2489
+ activation=mish
2490
+
2491
+ [convolutional]
2492
+ batch_normalize=1
2493
+ filters=640
2494
+ size=1
2495
+ stride=1
2496
+ pad=1
2497
+ activation=mish
2498
+
2499
+ [convolutional]
2500
+ batch_normalize=1
2501
+ size=3
2502
+ stride=1
2503
+ pad=1
2504
+ filters=640
2505
+ activation=mish
2506
+
2507
+ [route]
2508
+ layers = -1,-8
2509
+
2510
+ # Transition last
2511
+
2512
+ # 345 (previous+3+4+2k)
2513
+ [convolutional]
2514
+ batch_normalize=1
2515
+ filters=640
2516
+ size=1
2517
+ stride=1
2518
+ pad=1
2519
+ activation=mish
2520
+
2521
+ # ============ End of Neck ============ #
2522
+
2523
+ # ============ Head ============ #
2524
+
2525
+ # YOLO-3
2526
+
2527
+ [route]
2528
+ layers = 293
2529
+
2530
+ [convolutional]
2531
+ batch_normalize=1
2532
+ size=3
2533
+ stride=1
2534
+ pad=1
2535
+ filters=320
2536
+ activation=mish
2537
+
2538
+ [convolutional]
2539
+ size=1
2540
+ stride=1
2541
+ pad=1
2542
+ filters=340
2543
+ activation=linear
2544
+
2545
+ [yolo]
2546
+ mask = 0,1,2,3
2547
+ anchors = 13,17, 22,25, 27,66, 55,41, 57,88, 112,69, 69,177, 136,138, 136,138, 287,114, 134,275, 268,248, 268,248, 232,504, 445,416, 640,640, 812,393, 477,808, 1070,908, 1408,1408
2548
+ classes=80
2549
+ num=20
2550
+ jitter=.3
2551
+ ignore_thresh = .7
2552
+ truth_thresh = 1
2553
+ random=1
2554
+ scale_x_y = 1.05
2555
+ iou_thresh=0.213
2556
+ cls_normalizer=1.0
2557
+ iou_normalizer=0.07
2558
+ iou_loss=ciou
2559
+ nms_kind=greedynms
2560
+ beta_nms=0.6
2561
+
2562
+
2563
+ # YOLO-4
2564
+
2565
+ [route]
2566
+ layers = 306
2567
+
2568
+ [convolutional]
2569
+ batch_normalize=1
2570
+ size=3
2571
+ stride=1
2572
+ pad=1
2573
+ filters=640
2574
+ activation=mish
2575
+
2576
+ [convolutional]
2577
+ size=1
2578
+ stride=1
2579
+ pad=1
2580
+ filters=340
2581
+ activation=linear
2582
+
2583
+ [yolo]
2584
+ mask = 4,5,6,7
2585
+ anchors = 13,17, 22,25, 27,66, 55,41, 57,88, 112,69, 69,177, 136,138, 136,138, 287,114, 134,275, 268,248, 268,248, 232,504, 445,416, 640,640, 812,393, 477,808, 1070,908, 1408,1408
2586
+ classes=80
2587
+ num=20
2588
+ jitter=.3
2589
+ ignore_thresh = .7
2590
+ truth_thresh = 1
2591
+ random=1
2592
+ scale_x_y = 1.05
2593
+ iou_thresh=0.213
2594
+ cls_normalizer=1.0
2595
+ iou_normalizer=0.07
2596
+ iou_loss=ciou
2597
+ nms_kind=greedynms
2598
+ beta_nms=0.6
2599
+
2600
+
2601
+ # YOLO-5
2602
+
2603
+ [route]
2604
+ layers = 319
2605
+
2606
+ [convolutional]
2607
+ batch_normalize=1
2608
+ size=3
2609
+ stride=1
2610
+ pad=1
2611
+ filters=1280
2612
+ activation=mish
2613
+
2614
+ [convolutional]
2615
+ size=1
2616
+ stride=1
2617
+ pad=1
2618
+ filters=340
2619
+ activation=linear
2620
+
2621
+ [yolo]
2622
+ mask = 8,9,10,11
2623
+ anchors = 13,17, 22,25, 27,66, 55,41, 57,88, 112,69, 69,177, 136,138, 136,138, 287,114, 134,275, 268,248, 268,248, 232,504, 445,416, 640,640, 812,393, 477,808, 1070,908, 1408,1408
2624
+ classes=80
2625
+ num=20
2626
+ jitter=.3
2627
+ ignore_thresh = .7
2628
+ truth_thresh = 1
2629
+ random=1
2630
+ scale_x_y = 1.05
2631
+ iou_thresh=0.213
2632
+ cls_normalizer=1.0
2633
+ iou_normalizer=0.07
2634
+ iou_loss=ciou
2635
+ nms_kind=greedynms
2636
+ beta_nms=0.6
2637
+
2638
+
2639
+ # YOLO-6
2640
+
2641
+ [route]
2642
+ layers = 332
2643
+
2644
+ [convolutional]
2645
+ batch_normalize=1
2646
+ size=3
2647
+ stride=1
2648
+ pad=1
2649
+ filters=1280
2650
+ activation=mish
2651
+
2652
+ [convolutional]
2653
+ size=1
2654
+ stride=1
2655
+ pad=1
2656
+ filters=340
2657
+ activation=linear
2658
+
2659
+ [yolo]
2660
+ mask = 12,13,14,15
2661
+ anchors = 13,17, 22,25, 27,66, 55,41, 57,88, 112,69, 69,177, 136,138, 136,138, 287,114, 134,275, 268,248, 268,248, 232,504, 445,416, 640,640, 812,393, 477,808, 1070,908, 1408,1408
2662
+ classes=80
2663
+ num=20
2664
+ jitter=.3
2665
+ ignore_thresh = .7
2666
+ truth_thresh = 1
2667
+ random=1
2668
+ scale_x_y = 1.05
2669
+ iou_thresh=0.213
2670
+ cls_normalizer=1.0
2671
+ iou_normalizer=0.07
2672
+ iou_loss=ciou
2673
+ nms_kind=greedynms
2674
+ beta_nms=0.6
2675
+
2676
+
2677
+ # YOLO-7
2678
+
2679
+ [route]
2680
+ layers = 345
2681
+
2682
+ [convolutional]
2683
+ batch_normalize=1
2684
+ size=3
2685
+ stride=1
2686
+ pad=1
2687
+ filters=1280
2688
+ activation=mish
2689
+
2690
+ [convolutional]
2691
+ size=1
2692
+ stride=1
2693
+ pad=1
2694
+ filters=340
2695
+ activation=linear
2696
+
2697
+ [yolo]
2698
+ mask = 16,17,18,19
2699
+ anchors = 13,17, 22,25, 27,66, 55,41, 57,88, 112,69, 69,177, 136,138, 136,138, 287,114, 134,275, 268,248, 268,248, 232,504, 445,416, 640,640, 812,393, 477,808, 1070,908, 1408,1408
2700
+ classes=80
2701
+ num=20
2702
+ jitter=.3
2703
+ ignore_thresh = .7
2704
+ truth_thresh = 1
2705
+ random=1
2706
+ scale_x_y = 1.05
2707
+ iou_thresh=0.213
2708
+ cls_normalizer=1.0
2709
+ iou_normalizer=0.07
2710
+ iou_loss=ciou
2711
+ nms_kind=greedynms
2712
+ beta_nms=0.6
2713
+
2714
+ # ============ End of Head ============ #
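Each of the five `[yolo]` heads in the config above takes four of the twenty shared anchors via its `mask` (0-3, 4-7, up to 16-19), so the 1x1 convolution feeding every head needs `(classes + 5) * 4` output filters: (80 + 5) * 4 = 340, which is exactly the `filters=340` on each pre-head layer. A minimal sketch of that arithmetic in Python (the helper name is illustrative, not part of the repo):

```
# Why every 1x1 conv before a [yolo] layer in yolov4_p7.cfg has filters=340:
# each anchor predicts 4 box coordinates + 1 objectness score + one score
# per class, and each head owns 4 of the num=20 anchors.
def head_filters(classes: int, anchors_per_head: int) -> int:
    return (classes + 5) * anchors_per_head

masks = [list(range(i, i + 4)) for i in range(0, 20, 4)]  # [0..3] ... [16..19]
assert len(masks) == 5 and all(len(m) == 4 for m in masks)
print(head_filters(classes=80, anchors_per_head=4))       # -> 340
```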
darknet/README.md ADDED
@@ -0,0 +1,63 @@
1
+ ## Model Zoo
2
+
3
+ | Model | Test Size | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup> | AP<sub>75</sub><sup>val</sup> | AP<sub>S</sub><sup>val</sup> | AP<sub>M</sub><sup>val</sup> | AP<sub>L</sub><sup>val</sup> | batch1 throughput |
4
+ | :-- | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
5
+ | **YOLOv4-CSP** | 640 | **49.1%** | **67.7%** | **53.8%** | **32.1%** | **54.4%** | **63.2%** | 76 *fps* |
6
+ | **YOLOR-CSP** | 640 | **49.2%** | **67.6%** | **53.7%** | **32.9%** | **54.4%** | **63.0%** | - |
7
+ |  |  |  |  |  |  |  |  |  |
8
+ | **YOLOv4-CSP-X** | 640 | **50.9%** | **69.3%** | **55.4%** | **35.3%** | **55.8%** | **64.8%** | 53 *fps* |
9
+ | **YOLOR-CSP-X** | 640 | **51.1%** | **69.6%** | **55.7%** | **35.7%** | **56.0%** | **65.2%** | - |
10
+ | | | | | | | |
11
+
12
+ ## Installation
13
+
14
+ https://github.com/AlexeyAB/darknet
15
+
16
+ Docker environment (recommended)
17
+ <details><summary> <b>Expand</b> </summary>
18
+
19
+ ```
20
+ # get code
21
+ git clone https://github.com/AlexeyAB/darknet
22
+
23
+ # create the docker container; you can change the shared memory size if you have more RAM
24
+ nvidia-docker run --name yolor -it -v your_coco_path/:/coco/ -v your_code_path/:/yolor --shm-size=64g nvcr.io/nvidia/pytorch:21.02-py3
25
+
26
+ # apt install required packages
27
+ apt update
28
+ apt install -y libopencv-dev
29
+
30
+ # edit Makefile
31
+ #GPU=1
32
+ #CUDNN=1
33
+ #CUDNN_HALF=1
34
+ #OPENCV=1
35
+ #AVX=1
36
+ #OPENMP=1
37
+ #LIBSO=1
38
+ #ZED_CAMERA=0
39
+ #ZED_CAMERA_v2_8=0
40
+ #
41
+ #USE_CPP=0
42
+ #DEBUG=0
43
+ #
44
+ #ARCH= -gencode arch=compute_52,code=[sm_70,compute_70] \
45
+ # -gencode arch=compute_61,code=[sm_75,compute_75] \
46
+ # -gencode arch=compute_61,code=[sm_80,compute_80] \
47
+ # -gencode arch=compute_61,code=[sm_86,compute_86]
48
+ #
49
+ #...
50
+
51
+ # build
52
+ make -j8
53
+ ```
54
+
55
+ </details>
56
+
57
+ ## Testing
58
+
59
+ To reproduce the reported inference speed, run:
60
+
61
+ ```
62
+ CUDA_VISIBLE_DEVICES=0 ./darknet detector demo cfg/coco.data cfg/yolov4-csp.cfg weights/yolov4-csp.weights source/test.mp4 -dont_show -benchmark
63
+ ```
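Both darknet configs in this commit rely on relative `[route]` references (e.g. `layers = -1,-13`), and inline comments such as `# 229 (previous+6+5+2k)` track the resulting absolute layer indices by hand. Below is a minimal, illustrative parser for sanity-checking those indices, assuming only the standard darknet cfg syntax (this helper is a sketch, not code shipped in the repo):

```
# Parse a darknet .cfg into blocks and resolve negative [route] references,
# which darknet interprets relative to the route layer's own index.
def parse_cfg(path):
    blocks = []
    with open(path) as f:
        for raw in f:
            line = raw.split('#')[0].strip()      # drop comments and blanks
            if not line:
                continue
            if line.startswith('['):              # section header, e.g. [route]
                blocks.append({'type': line.strip('[]')})
            else:
                key, value = (s.strip() for s in line.split('=', 1))
                blocks[-1][key] = value
    return blocks

def resolve_routes(blocks):
    layers = blocks[1:]                           # blocks[0] is [net]
    for i, layer in enumerate(layers):
        if layer['type'] == 'route':
            refs = [int(v) for v in layer['layers'].split(',')]
            layer['layers'] = [i + r if r < 0 else r for r in refs]
    return layers

# usage: resolve_routes(parse_cfg('cfg/yolov4_p7.cfg'))
```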
darknet/cfg/yolov4-csp-x.cfg ADDED
@@ -0,0 +1,1555 @@
1
+
2
+ [net]
3
+ # Testing
4
+ #batch=1
5
+ #subdivisions=1
6
+ # Training
7
+ batch=64
8
+ subdivisions=8
9
+ width=640
10
+ height=640
11
+ channels=3
12
+ momentum=0.949
13
+ decay=0.0005
14
+ angle=0
15
+ saturation = 1.5
16
+ exposure = 1.5
17
+ hue=.1
18
+
19
+ learning_rate=0.001
20
+ burn_in=1000
21
+ max_batches = 500500
22
+ policy=steps
23
+ steps=400000,450000
24
+ scales=.1,.1
25
+
26
+ mosaic=1
27
+
28
+ letter_box=1
29
+
30
+ ema_alpha=0.9998
31
+
32
+ #optimized_memory=1
33
+
34
+
35
+ # ============ Backbone ============ #
36
+
37
+ # Stem
38
+
39
+ # 0
40
+ [convolutional]
41
+ batch_normalize=1
42
+ filters=32
43
+ size=3
44
+ stride=1
45
+ pad=1
46
+ activation=swish
47
+
48
+ # P1
49
+
50
+ # Downsample
51
+
52
+ [convolutional]
53
+ batch_normalize=1
54
+ filters=80
55
+ size=3
56
+ stride=2
57
+ pad=1
58
+ activation=swish
59
+
60
+ # Residual Block
61
+
62
+ [convolutional]
63
+ batch_normalize=1
64
+ filters=40
65
+ size=1
66
+ stride=1
67
+ pad=1
68
+ activation=swish
69
+
70
+ [convolutional]
71
+ batch_normalize=1
72
+ filters=80
73
+ size=3
74
+ stride=1
75
+ pad=1
76
+ activation=swish
77
+
78
+ # 4 (previous+1+3k)
79
+ [shortcut]
80
+ from=-3
81
+ activation=linear
82
+
83
+ # P2
84
+
85
+ # Downsample
86
+
87
+ [convolutional]
88
+ batch_normalize=1
89
+ filters=160
90
+ size=3
91
+ stride=2
92
+ pad=1
93
+ activation=swish
94
+
95
+ # Split
96
+
97
+ [convolutional]
98
+ batch_normalize=1
99
+ filters=80
100
+ size=1
101
+ stride=1
102
+ pad=1
103
+ activation=swish
104
+
105
+ [route]
106
+ layers = -2
107
+
108
+ [convolutional]
109
+ batch_normalize=1
110
+ filters=80
111
+ size=1
112
+ stride=1
113
+ pad=1
114
+ activation=swish
115
+
116
+ # Residual Block
117
+
118
+ [convolutional]
119
+ batch_normalize=1
120
+ filters=80
121
+ size=1
122
+ stride=1
123
+ pad=1
124
+ activation=swish
125
+
126
+ [convolutional]
127
+ batch_normalize=1
128
+ filters=80
129
+ size=3
130
+ stride=1
131
+ pad=1
132
+ activation=swish
133
+
134
+ [shortcut]
135
+ from=-3
136
+ activation=linear
137
+
138
+ [convolutional]
139
+ batch_normalize=1
140
+ filters=80
141
+ size=1
142
+ stride=1
143
+ pad=1
144
+ activation=swish
145
+
146
+ [convolutional]
147
+ batch_normalize=1
148
+ filters=80
149
+ size=3
150
+ stride=1
151
+ pad=1
152
+ activation=swish
153
+
154
+ [shortcut]
155
+ from=-3
156
+ activation=linear
157
+
158
+ [convolutional]
159
+ batch_normalize=1
160
+ filters=80
161
+ size=1
162
+ stride=1
163
+ pad=1
164
+ activation=swish
165
+
166
+ [convolutional]
167
+ batch_normalize=1
168
+ filters=80
169
+ size=3
170
+ stride=1
171
+ pad=1
172
+ activation=swish
173
+
174
+ [shortcut]
175
+ from=-3
176
+ activation=linear
177
+
178
+ # Transition first
179
+
180
+ [convolutional]
181
+ batch_normalize=1
182
+ filters=80
183
+ size=1
184
+ stride=1
185
+ pad=1
186
+ activation=swish
187
+
188
+ # Merge [-1, -(3k+4)]
189
+
190
+ [route]
191
+ layers = -1,-13
192
+
193
+ # Transition last
194
+
195
+ # 20 (previous+7+3k)
196
+ [convolutional]
197
+ batch_normalize=1
198
+ filters=160
199
+ size=1
200
+ stride=1
201
+ pad=1
202
+ activation=swish
203
+
204
+ # P3
205
+
206
+ # Downsample
207
+
208
+ [convolutional]
209
+ batch_normalize=1
210
+ filters=320
211
+ size=3
212
+ stride=2
213
+ pad=1
214
+ activation=swish
215
+
216
+ # Split
217
+
218
+ [convolutional]
219
+ batch_normalize=1
220
+ filters=160
221
+ size=1
222
+ stride=1
223
+ pad=1
224
+ activation=swish
225
+
226
+ [route]
227
+ layers = -2
228
+
229
+ [convolutional]
230
+ batch_normalize=1
231
+ filters=160
232
+ size=1
233
+ stride=1
234
+ pad=1
235
+ activation=swish
236
+
237
+ # Residual Block
238
+
239
+ [convolutional]
240
+ batch_normalize=1
241
+ filters=160
242
+ size=1
243
+ stride=1
244
+ pad=1
245
+ activation=swish
246
+
247
+ [convolutional]
248
+ batch_normalize=1
249
+ filters=160
250
+ size=3
251
+ stride=1
252
+ pad=1
253
+ activation=swish
254
+
255
+ [shortcut]
256
+ from=-3
257
+ activation=linear
258
+
259
+ [convolutional]
260
+ batch_normalize=1
261
+ filters=160
262
+ size=1
263
+ stride=1
264
+ pad=1
265
+ activation=swish
266
+
267
+ [convolutional]
268
+ batch_normalize=1
269
+ filters=160
270
+ size=3
271
+ stride=1
272
+ pad=1
273
+ activation=swish
274
+
275
+ [shortcut]
276
+ from=-3
277
+ activation=linear
278
+
279
+ [convolutional]
280
+ batch_normalize=1
281
+ filters=160
282
+ size=1
283
+ stride=1
284
+ pad=1
285
+ activation=swish
286
+
287
+ [convolutional]
288
+ batch_normalize=1
289
+ filters=160
290
+ size=3
291
+ stride=1
292
+ pad=1
293
+ activation=swish
294
+
295
+ [shortcut]
296
+ from=-3
297
+ activation=linear
298
+
299
+ [convolutional]
300
+ batch_normalize=1
301
+ filters=160
302
+ size=1
303
+ stride=1
304
+ pad=1
305
+ activation=swish
306
+
307
+ [convolutional]
308
+ batch_normalize=1
309
+ filters=160
310
+ size=3
311
+ stride=1
312
+ pad=1
313
+ activation=swish
314
+
315
+ [shortcut]
316
+ from=-3
317
+ activation=linear
318
+
319
+ [convolutional]
320
+ batch_normalize=1
321
+ filters=160
322
+ size=1
323
+ stride=1
324
+ pad=1
325
+ activation=swish
326
+
327
+ [convolutional]
328
+ batch_normalize=1
329
+ filters=160
330
+ size=3
331
+ stride=1
332
+ pad=1
333
+ activation=swish
334
+
335
+ [shortcut]
336
+ from=-3
337
+ activation=linear
338
+
339
+ [convolutional]
340
+ batch_normalize=1
341
+ filters=160
342
+ size=1
343
+ stride=1
344
+ pad=1
345
+ activation=swish
346
+
347
+ [convolutional]
348
+ batch_normalize=1
349
+ filters=160
350
+ size=3
351
+ stride=1
352
+ pad=1
353
+ activation=swish
354
+
355
+ [shortcut]
356
+ from=-3
357
+ activation=linear
358
+
359
+ [convolutional]
360
+ batch_normalize=1
361
+ filters=160
362
+ size=1
363
+ stride=1
364
+ pad=1
365
+ activation=swish
366
+
367
+ [convolutional]
368
+ batch_normalize=1
369
+ filters=160
370
+ size=3
371
+ stride=1
372
+ pad=1
373
+ activation=swish
374
+
375
+ [shortcut]
376
+ from=-3
377
+ activation=linear
378
+
379
+ [convolutional]
380
+ batch_normalize=1
381
+ filters=160
382
+ size=1
383
+ stride=1
384
+ pad=1
385
+ activation=swish
386
+
387
+ [convolutional]
388
+ batch_normalize=1
389
+ filters=160
390
+ size=3
391
+ stride=1
392
+ pad=1
393
+ activation=swish
394
+
395
+ [shortcut]
396
+ from=-3
397
+ activation=linear
398
+
399
+ [convolutional]
400
+ batch_normalize=1
401
+ filters=160
402
+ size=1
403
+ stride=1
404
+ pad=1
405
+ activation=swish
406
+
407
+ [convolutional]
408
+ batch_normalize=1
409
+ filters=160
410
+ size=3
411
+ stride=1
412
+ pad=1
413
+ activation=swish
414
+
415
+ [shortcut]
416
+ from=-3
417
+ activation=linear
418
+
419
+ [convolutional]
420
+ batch_normalize=1
421
+ filters=160
422
+ size=1
423
+ stride=1
424
+ pad=1
425
+ activation=swish
426
+
427
+ [convolutional]
428
+ batch_normalize=1
429
+ filters=160
430
+ size=3
431
+ stride=1
432
+ pad=1
433
+ activation=swish
434
+
435
+ [shortcut]
436
+ from=-3
437
+ activation=linear
438
+
439
+ # Transition first
440
+
441
+ [convolutional]
442
+ batch_normalize=1
443
+ filters=160
444
+ size=1
445
+ stride=1
446
+ pad=1
447
+ activation=swish
448
+
449
+ # Merge [-1 -(4+3k)]
450
+
451
+ [route]
452
+ layers = -1,-34
453
+
454
+ # Transition last
455
+
456
+ # 57 (previous+7+3k)
457
+ [convolutional]
458
+ batch_normalize=1
459
+ filters=320
460
+ size=1
461
+ stride=1
462
+ pad=1
463
+ activation=swish
464
+
465
+ # P4
466
+
467
+ # Downsample
468
+
469
+ [convolutional]
470
+ batch_normalize=1
471
+ filters=640
472
+ size=3
473
+ stride=2
474
+ pad=1
475
+ activation=swish
476
+
477
+ # Split
478
+
479
+ [convolutional]
480
+ batch_normalize=1
481
+ filters=320
482
+ size=1
483
+ stride=1
484
+ pad=1
485
+ activation=swish
486
+
487
+ [route]
488
+ layers = -2
489
+
490
+ [convolutional]
491
+ batch_normalize=1
492
+ filters=320
493
+ size=1
494
+ stride=1
495
+ pad=1
496
+ activation=swish
497
+
498
+ # Residual Block
499
+
500
+ [convolutional]
501
+ batch_normalize=1
502
+ filters=320
503
+ size=1
504
+ stride=1
505
+ pad=1
506
+ activation=swish
507
+
508
+ [convolutional]
509
+ batch_normalize=1
510
+ filters=320
511
+ size=3
512
+ stride=1
513
+ pad=1
514
+ activation=swish
515
+
516
+ [shortcut]
517
+ from=-3
518
+ activation=linear
519
+
520
+ [convolutional]
521
+ batch_normalize=1
522
+ filters=320
523
+ size=1
524
+ stride=1
525
+ pad=1
526
+ activation=swish
527
+
528
+ [convolutional]
529
+ batch_normalize=1
530
+ filters=320
531
+ size=3
532
+ stride=1
533
+ pad=1
534
+ activation=swish
535
+
536
+ [shortcut]
537
+ from=-3
538
+ activation=linear
539
+
540
+ [convolutional]
541
+ batch_normalize=1
542
+ filters=320
543
+ size=1
544
+ stride=1
545
+ pad=1
546
+ activation=swish
547
+
548
+ [convolutional]
549
+ batch_normalize=1
550
+ filters=320
551
+ size=3
552
+ stride=1
553
+ pad=1
554
+ activation=swish
555
+
556
+ [shortcut]
557
+ from=-3
558
+ activation=linear
559
+
560
+ [convolutional]
561
+ batch_normalize=1
562
+ filters=320
563
+ size=1
564
+ stride=1
565
+ pad=1
566
+ activation=swish
567
+
568
+ [convolutional]
569
+ batch_normalize=1
570
+ filters=320
571
+ size=3
572
+ stride=1
573
+ pad=1
574
+ activation=swish
575
+
576
+ [shortcut]
577
+ from=-3
578
+ activation=linear
579
+
580
+ [convolutional]
581
+ batch_normalize=1
582
+ filters=320
583
+ size=1
584
+ stride=1
585
+ pad=1
586
+ activation=swish
587
+
588
+ [convolutional]
589
+ batch_normalize=1
590
+ filters=320
591
+ size=3
592
+ stride=1
593
+ pad=1
594
+ activation=swish
595
+
596
+ [shortcut]
597
+ from=-3
598
+ activation=linear
599
+
600
+ [convolutional]
601
+ batch_normalize=1
602
+ filters=320
603
+ size=1
604
+ stride=1
605
+ pad=1
606
+ activation=swish
607
+
608
+ [convolutional]
609
+ batch_normalize=1
610
+ filters=320
611
+ size=3
612
+ stride=1
613
+ pad=1
614
+ activation=swish
615
+
616
+ [shortcut]
617
+ from=-3
618
+ activation=linear
619
+
620
+ [convolutional]
621
+ batch_normalize=1
622
+ filters=320
623
+ size=1
624
+ stride=1
625
+ pad=1
626
+ activation=swish
627
+
628
+ [convolutional]
629
+ batch_normalize=1
630
+ filters=320
631
+ size=3
632
+ stride=1
633
+ pad=1
634
+ activation=swish
635
+
636
+ [shortcut]
637
+ from=-3
638
+ activation=linear
639
+
640
+ [convolutional]
641
+ batch_normalize=1
642
+ filters=320
643
+ size=1
644
+ stride=1
645
+ pad=1
646
+ activation=swish
647
+
648
+ [convolutional]
649
+ batch_normalize=1
650
+ filters=320
651
+ size=3
652
+ stride=1
653
+ pad=1
654
+ activation=swish
655
+
656
+ [shortcut]
657
+ from=-3
658
+ activation=linear
659
+
660
+ [convolutional]
661
+ batch_normalize=1
662
+ filters=320
663
+ size=1
664
+ stride=1
665
+ pad=1
666
+ activation=swish
667
+
668
+ [convolutional]
669
+ batch_normalize=1
670
+ filters=320
671
+ size=3
672
+ stride=1
673
+ pad=1
674
+ activation=swish
675
+
676
+ [shortcut]
677
+ from=-3
678
+ activation=linear
679
+
680
+ [convolutional]
681
+ batch_normalize=1
682
+ filters=320
683
+ size=1
684
+ stride=1
685
+ pad=1
686
+ activation=swish
687
+
688
+ [convolutional]
689
+ batch_normalize=1
690
+ filters=320
691
+ size=3
692
+ stride=1
693
+ pad=1
694
+ activation=swish
695
+
696
+ [shortcut]
697
+ from=-3
698
+ activation=linear
699
+
700
+ # Transition first
701
+
702
+ [convolutional]
703
+ batch_normalize=1
704
+ filters=320
705
+ size=1
706
+ stride=1
707
+ pad=1
708
+ activation=swish
709
+
710
+ # Merge [-1 -(3k+4)]
711
+
712
+ [route]
713
+ layers = -1,-34
714
+
715
+ # Transition last
716
+
717
+ # 94 (previous+7+3k)
718
+ [convolutional]
719
+ batch_normalize=1
720
+ filters=640
721
+ size=1
722
+ stride=1
723
+ pad=1
724
+ activation=swish
725
+
726
+ # P5
727
+
728
+ # Downsample
729
+
730
+ [convolutional]
731
+ batch_normalize=1
732
+ filters=1280
733
+ size=3
734
+ stride=2
735
+ pad=1
736
+ activation=swish
737
+
738
+ # Split
739
+
740
+ [convolutional]
741
+ batch_normalize=1
742
+ filters=640
743
+ size=1
744
+ stride=1
745
+ pad=1
746
+ activation=swish
747
+
748
+ [route]
749
+ layers = -2
750
+
751
+ [convolutional]
752
+ batch_normalize=1
753
+ filters=640
754
+ size=1
755
+ stride=1
756
+ pad=1
757
+ activation=swish
758
+
759
+ # Residual Block
760
+
761
+ [convolutional]
762
+ batch_normalize=1
763
+ filters=640
764
+ size=1
765
+ stride=1
766
+ pad=1
767
+ activation=swish
768
+
769
+ [convolutional]
770
+ batch_normalize=1
771
+ filters=640
772
+ size=3
773
+ stride=1
774
+ pad=1
775
+ activation=swish
776
+
777
+ [shortcut]
778
+ from=-3
779
+ activation=linear
780
+
781
+ [convolutional]
782
+ batch_normalize=1
783
+ filters=640
784
+ size=1
785
+ stride=1
786
+ pad=1
787
+ activation=swish
788
+
789
+ [convolutional]
790
+ batch_normalize=1
791
+ filters=640
792
+ size=3
793
+ stride=1
794
+ pad=1
795
+ activation=swish
796
+
797
+ [shortcut]
798
+ from=-3
799
+ activation=linear
800
+
801
+ [convolutional]
802
+ batch_normalize=1
803
+ filters=640
804
+ size=1
805
+ stride=1
806
+ pad=1
807
+ activation=swish
808
+
809
+ [convolutional]
810
+ batch_normalize=1
811
+ filters=640
812
+ size=3
813
+ stride=1
814
+ pad=1
815
+ activation=swish
816
+
817
+ [shortcut]
818
+ from=-3
819
+ activation=linear
820
+
821
+ [convolutional]
822
+ batch_normalize=1
823
+ filters=640
824
+ size=1
825
+ stride=1
826
+ pad=1
827
+ activation=swish
828
+
829
+ [convolutional]
830
+ batch_normalize=1
831
+ filters=640
832
+ size=3
833
+ stride=1
834
+ pad=1
835
+ activation=swish
836
+
837
+ [shortcut]
838
+ from=-3
839
+ activation=linear
840
+
841
+ [convolutional]
842
+ batch_normalize=1
843
+ filters=640
844
+ size=1
845
+ stride=1
846
+ pad=1
847
+ activation=swish
848
+
849
+ [convolutional]
850
+ batch_normalize=1
851
+ filters=640
852
+ size=3
853
+ stride=1
854
+ pad=1
855
+ activation=swish
856
+
857
+ [shortcut]
858
+ from=-3
859
+ activation=linear
860
+
861
+ # Transition first
862
+
863
+ [convolutional]
864
+ batch_normalize=1
865
+ filters=640
866
+ size=1
867
+ stride=1
868
+ pad=1
869
+ activation=swish
870
+
871
+ # Merge [-1 -(3k+4)]
872
+
873
+ [route]
874
+ layers = -1,-19
875
+
876
+ # Transition last
877
+
878
+ # 116 (previous+7+3k)
879
+ [convolutional]
880
+ batch_normalize=1
881
+ filters=1280
882
+ size=1
883
+ stride=1
884
+ pad=1
885
+ activation=swish
886
+
887
+ # ============ End of Backbone ============ #
888
+
889
+ # ============ Neck ============ #
890
+
891
+ # CSPSPP
892
+
893
+ [convolutional]
894
+ batch_normalize=1
895
+ filters=640
896
+ size=1
897
+ stride=1
898
+ pad=1
899
+ activation=swish
900
+
901
+ [route]
902
+ layers = -2
903
+
904
+ [convolutional]
905
+ batch_normalize=1
906
+ filters=640
907
+ size=1
908
+ stride=1
909
+ pad=1
910
+ activation=swish
911
+
912
+ [convolutional]
913
+ batch_normalize=1
914
+ size=3
915
+ stride=1
916
+ pad=1
917
+ filters=640
918
+ activation=swish
919
+
920
+ [convolutional]
921
+ batch_normalize=1
922
+ filters=640
923
+ size=1
924
+ stride=1
925
+ pad=1
926
+ activation=swish
927
+
928
+ ### SPP ###
929
+ [maxpool]
930
+ stride=1
931
+ size=5
932
+
933
+ [route]
934
+ layers=-2
935
+
936
+ [maxpool]
937
+ stride=1
938
+ size=9
939
+
940
+ [route]
941
+ layers=-4
942
+
943
+ [maxpool]
944
+ stride=1
945
+ size=13
946
+
947
+ [route]
948
+ layers=-1,-3,-5,-6
949
+ ### End SPP ###
950
+
951
+ [convolutional]
952
+ batch_normalize=1
953
+ filters=640
954
+ size=1
955
+ stride=1
956
+ pad=1
957
+ activation=swish
958
+
959
+ [convolutional]
960
+ batch_normalize=1
961
+ size=3
962
+ stride=1
963
+ pad=1
964
+ filters=640
965
+ activation=swish
966
+
967
+ [convolutional]
968
+ batch_normalize=1
969
+ filters=640
970
+ size=1
971
+ stride=1
972
+ pad=1
973
+ activation=swish
974
+
975
+ [convolutional]
976
+ batch_normalize=1
977
+ size=3
978
+ stride=1
979
+ pad=1
980
+ filters=640
981
+ activation=swish
982
+
983
+ [route]
984
+ layers = -1, -15
985
+
986
+ # 133 (previous+6+5+2k)
987
+ [convolutional]
988
+ batch_normalize=1
989
+ filters=640
990
+ size=1
991
+ stride=1
992
+ pad=1
993
+ activation=swish
994
+
995
+ # End of CSPSPP
996
+
997
+
998
+ # FPN-4
999
+
1000
+ [convolutional]
1001
+ batch_normalize=1
1002
+ filters=320
1003
+ size=1
1004
+ stride=1
1005
+ pad=1
1006
+ activation=swish
1007
+
1008
+ [upsample]
1009
+ stride=2
1010
+
1011
+ [route]
1012
+ layers = 94
1013
+
1014
+ [convolutional]
1015
+ batch_normalize=1
1016
+ filters=320
1017
+ size=1
1018
+ stride=1
1019
+ pad=1
1020
+ activation=swish
1021
+
1022
+ [route]
1023
+ layers = -1, -3
1024
+
1025
+ [convolutional]
1026
+ batch_normalize=1
1027
+ filters=320
1028
+ size=1
1029
+ stride=1
1030
+ pad=1
1031
+ activation=swish
1032
+
1033
+ # Split
1034
+
1035
+ [convolutional]
1036
+ batch_normalize=1
1037
+ filters=320
1038
+ size=1
1039
+ stride=1
1040
+ pad=1
1041
+ activation=swish
1042
+
1043
+ [route]
1044
+ layers = -2
1045
+
1046
+ # Plain Block
1047
+
1048
+ [convolutional]
1049
+ batch_normalize=1
1050
+ filters=320
1051
+ size=1
1052
+ stride=1
1053
+ pad=1
1054
+ activation=swish
1055
+
1056
+ [convolutional]
1057
+ batch_normalize=1
1058
+ size=3
1059
+ stride=1
1060
+ pad=1
1061
+ filters=320
1062
+ activation=swish
1063
+
1064
+ [convolutional]
1065
+ batch_normalize=1
1066
+ filters=320
1067
+ size=1
1068
+ stride=1
1069
+ pad=1
1070
+ activation=swish
1071
+
1072
+ [convolutional]
1073
+ batch_normalize=1
1074
+ size=3
1075
+ stride=1
1076
+ pad=1
1077
+ filters=320
1078
+ activation=swish
1079
+
1080
+ [convolutional]
1081
+ batch_normalize=1
1082
+ filters=320
1083
+ size=1
1084
+ stride=1
1085
+ pad=1
1086
+ activation=swish
1087
+
1088
+ [convolutional]
1089
+ batch_normalize=1
1090
+ size=3
1091
+ stride=1
1092
+ pad=1
1093
+ filters=320
1094
+ activation=swish
1095
+
1096
+ # Merge [-1, -(2k+2)]
1097
+
1098
+ [route]
1099
+ layers = -1, -8
1100
+
1101
+ # Transition last
1102
+
1103
+ # 149 (previous+6+4+2k)
1104
+ [convolutional]
1105
+ batch_normalize=1
1106
+ filters=320
1107
+ size=1
1108
+ stride=1
1109
+ pad=1
1110
+ activation=swish
1111
+
1112
+
1113
+ # FPN-3
1114
+
1115
+ [convolutional]
1116
+ batch_normalize=1
1117
+ filters=160
1118
+ size=1
1119
+ stride=1
1120
+ pad=1
1121
+ activation=swish
1122
+
1123
+ [upsample]
1124
+ stride=2
1125
+
1126
+ [route]
1127
+ layers = 57
1128
+
1129
+ [convolutional]
1130
+ batch_normalize=1
1131
+ filters=160
1132
+ size=1
1133
+ stride=1
1134
+ pad=1
1135
+ activation=swish
1136
+
1137
+ [route]
1138
+ layers = -1, -3
1139
+
1140
+ [convolutional]
1141
+ batch_normalize=1
1142
+ filters=160
1143
+ size=1
1144
+ stride=1
1145
+ pad=1
1146
+ activation=swish
1147
+
1148
+ # Split
1149
+
1150
+ [convolutional]
1151
+ batch_normalize=1
1152
+ filters=160
1153
+ size=1
1154
+ stride=1
1155
+ pad=1
1156
+ activation=swish
1157
+
1158
+ [route]
1159
+ layers = -2
1160
+
1161
+ # Plain Block
1162
+
1163
+ [convolutional]
1164
+ batch_normalize=1
1165
+ filters=160
1166
+ size=1
1167
+ stride=1
1168
+ pad=1
1169
+ activation=swish
1170
+
1171
+ [convolutional]
1172
+ batch_normalize=1
1173
+ size=3
1174
+ stride=1
1175
+ pad=1
1176
+ filters=160
1177
+ activation=swish
1178
+
1179
+ [convolutional]
1180
+ batch_normalize=1
1181
+ filters=160
1182
+ size=1
1183
+ stride=1
1184
+ pad=1
1185
+ activation=swish
1186
+
1187
+ [convolutional]
1188
+ batch_normalize=1
1189
+ size=3
1190
+ stride=1
1191
+ pad=1
1192
+ filters=160
1193
+ activation=swish
1194
+
1195
+ [convolutional]
1196
+ batch_normalize=1
1197
+ filters=160
1198
+ size=1
1199
+ stride=1
1200
+ pad=1
1201
+ activation=swish
1202
+
1203
+ [convolutional]
1204
+ batch_normalize=1
1205
+ size=3
1206
+ stride=1
1207
+ pad=1
1208
+ filters=160
1209
+ activation=swish
1210
+
1211
+ # Merge [-1, -(2k+2)]
1212
+
1213
+ [route]
1214
+ layers = -1, -8
1215
+
1216
+ # Transition last
1217
+
1218
+ # 165 (previous+6+4+2k)
1219
+ [convolutional]
1220
+ batch_normalize=1
1221
+ filters=160
1222
+ size=1
1223
+ stride=1
1224
+ pad=1
1225
+ activation=swish
1226
+
1227
+
1228
+ # PAN-4
1229
+
1230
+ [convolutional]
1231
+ batch_normalize=1
1232
+ size=3
1233
+ stride=2
1234
+ pad=1
1235
+ filters=320
1236
+ activation=swish
1237
+
1238
+ [route]
1239
+ layers = -1, 149
1240
+
1241
+ [convolutional]
1242
+ batch_normalize=1
1243
+ filters=320
1244
+ size=1
1245
+ stride=1
1246
+ pad=1
1247
+ activation=swish
1248
+
1249
+ # Split
1250
+
1251
+ [convolutional]
1252
+ batch_normalize=1
1253
+ filters=320
1254
+ size=1
1255
+ stride=1
1256
+ pad=1
1257
+ activation=swish
1258
+
1259
+ [route]
1260
+ layers = -2
1261
+
1262
+ # Plain Block
1263
+
1264
+ [convolutional]
1265
+ batch_normalize=1
1266
+ filters=320
1267
+ size=1
1268
+ stride=1
1269
+ pad=1
1270
+ activation=swish
1271
+
1272
+ [convolutional]
1273
+ batch_normalize=1
1274
+ size=3
1275
+ stride=1
1276
+ pad=1
1277
+ filters=320
1278
+ activation=swish
1279
+
1280
+ [convolutional]
1281
+ batch_normalize=1
1282
+ filters=320
1283
+ size=1
1284
+ stride=1
1285
+ pad=1
1286
+ activation=swish
1287
+
1288
+ [convolutional]
1289
+ batch_normalize=1
1290
+ size=3
1291
+ stride=1
1292
+ pad=1
1293
+ filters=320
1294
+ activation=swish
1295
+
1296
+ [convolutional]
1297
+ batch_normalize=1
1298
+ filters=320
1299
+ size=1
1300
+ stride=1
1301
+ pad=1
1302
+ activation=swish
1303
+
1304
+ [convolutional]
1305
+ batch_normalize=1
1306
+ size=3
1307
+ stride=1
1308
+ pad=1
1309
+ filters=320
1310
+ activation=swish
1311
+
1312
+ [route]
1313
+ layers = -1,-8
1314
+
1315
+ # Transition last
1316
+
1317
+ # 178 (previous+3+4+2k)
1318
+ [convolutional]
1319
+ batch_normalize=1
1320
+ filters=320
1321
+ size=1
1322
+ stride=1
1323
+ pad=1
1324
+ activation=swish
1325
+
1326
+
1327
+ # PAN-5
1328
+
1329
+ [convolutional]
1330
+ batch_normalize=1
1331
+ size=3
1332
+ stride=2
1333
+ pad=1
1334
+ filters=640
1335
+ activation=swish
1336
+
1337
+ [route]
1338
+ layers = -1, 133
1339
+
1340
+ [convolutional]
1341
+ batch_normalize=1
1342
+ filters=640
1343
+ size=1
1344
+ stride=1
1345
+ pad=1
1346
+ activation=swish
1347
+
1348
+ # Split
1349
+
1350
+ [convolutional]
1351
+ batch_normalize=1
1352
+ filters=640
1353
+ size=1
1354
+ stride=1
1355
+ pad=1
1356
+ activation=swish
1357
+
1358
+ [route]
1359
+ layers = -2
1360
+
1361
+ # Plain Block
1362
+
1363
+ [convolutional]
1364
+ batch_normalize=1
1365
+ filters=640
1366
+ size=1
1367
+ stride=1
1368
+ pad=1
1369
+ activation=swish
1370
+
1371
+ [convolutional]
1372
+ batch_normalize=1
1373
+ size=3
1374
+ stride=1
1375
+ pad=1
1376
+ filters=640
1377
+ activation=swish
1378
+
1379
+ [convolutional]
1380
+ batch_normalize=1
1381
+ filters=640
1382
+ size=1
1383
+ stride=1
1384
+ pad=1
1385
+ activation=swish
1386
+
1387
+ [convolutional]
1388
+ batch_normalize=1
1389
+ size=3
1390
+ stride=1
1391
+ pad=1
1392
+ filters=640
1393
+ activation=swish
1394
+
1395
+ [convolutional]
1396
+ batch_normalize=1
1397
+ filters=640
1398
+ size=1
1399
+ stride=1
1400
+ pad=1
1401
+ activation=swish
1402
+
1403
+ [convolutional]
1404
+ batch_normalize=1
1405
+ size=3
1406
+ stride=1
1407
+ pad=1
1408
+ filters=640
1409
+ activation=swish
1410
+
1411
+ [route]
1412
+ layers = -1,-8
1413
+
1414
+ # Transition last
1415
+
1416
+ # 191 (previous+3+4+2k)
1417
+ [convolutional]
1418
+ batch_normalize=1
1419
+ filters=640
1420
+ size=1
1421
+ stride=1
1422
+ pad=1
1423
+ activation=swish
1424
+
1425
+ # ============ End of Neck ============ #
1426
+
1427
+ # ============ Head ============ #
1428
+
1429
+ # YOLO-3
1430
+
1431
+ [route]
1432
+ layers = 165
1433
+
1434
+ [convolutional]
1435
+ batch_normalize=1
1436
+ size=3
1437
+ stride=1
1438
+ pad=1
1439
+ filters=320
1440
+ activation=swish
1441
+
1442
+ [convolutional]
1443
+ size=1
1444
+ stride=1
1445
+ pad=1
1446
+ filters=255
1447
+ activation=logistic
1448
+
1449
+ [yolo]
1450
+ mask = 0,1,2
1451
+ anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
1452
+ classes=80
1453
+ num=9
1454
+ jitter=.1
1455
+ scale_x_y = 2.0
1456
+ objectness_smooth=1
1457
+ ignore_thresh = .7
1458
+ truth_thresh = 1
1459
+ #random=1
1460
+ resize=1.5
1461
+ iou_thresh=0.2
1462
+ iou_normalizer=0.05
1463
+ cls_normalizer=0.5
1464
+ obj_normalizer=0.4
1465
+ iou_loss=ciou
1466
+ nms_kind=diounms
1467
+ beta_nms=0.6
1468
+ new_coords=1
1469
+ max_delta=2
1470
+
1471
+
1472
+ # YOLO-4
1473
+
1474
+ [route]
1475
+ layers = 178
1476
+
1477
+ [convolutional]
1478
+ batch_normalize=1
1479
+ size=3
1480
+ stride=1
1481
+ pad=1
1482
+ filters=640
1483
+ activation=swish
1484
+
1485
+ [convolutional]
1486
+ size=1
1487
+ stride=1
1488
+ pad=1
1489
+ filters=255
1490
+ activation=logistic
1491
+
1492
+ [yolo]
1493
+ mask = 3,4,5
1494
+ anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
1495
+ classes=80
1496
+ num=9
1497
+ jitter=.1
1498
+ scale_x_y = 2.0
1499
+ objectness_smooth=1
1500
+ ignore_thresh = .7
1501
+ truth_thresh = 1
1502
+ #random=1
1503
+ resize=1.5
1504
+ iou_thresh=0.2
1505
+ iou_normalizer=0.05
1506
+ cls_normalizer=0.5
1507
+ obj_normalizer=0.4
1508
+ iou_loss=ciou
1509
+ nms_kind=diounms
1510
+ beta_nms=0.6
1511
+ new_coords=1
1512
+ max_delta=2
1513
+
1514
+
1515
+ # YOLO-5
1516
+
1517
+ [route]
1518
+ layers = 191
1519
+
1520
+ [convolutional]
1521
+ batch_normalize=1
1522
+ size=3
1523
+ stride=1
1524
+ pad=1
1525
+ filters=1280
1526
+ activation=swish
1527
+
1528
+ [convolutional]
1529
+ size=1
1530
+ stride=1
1531
+ pad=1
1532
+ filters=255
1533
+ activation=logistic
1534
+
1535
+ [yolo]
1536
+ mask = 6,7,8
1537
+ anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
1538
+ classes=80
1539
+ num=9
1540
+ jitter=.1
1541
+ scale_x_y = 2.0
1542
+ objectness_smooth=1
1543
+ ignore_thresh = .7
1544
+ truth_thresh = 1
1545
+ #random=1
1546
+ resize=1.5
1547
+ iou_thresh=0.2
1548
+ iou_normalizer=0.05
1549
+ cls_normalizer=0.5
1550
+ obj_normalizer=0.4
1551
+ iou_loss=ciou
1552
+ nms_kind=diounms
1553
+ beta_nms=0.6
1554
+ new_coords=1
1555
+ max_delta=2
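
The `### SPP ###` block in the neck above is three stride-1 max-pools (sizes 5, 9, 13) over the same 640-channel map, re-joined with the un-pooled input by `[route] layers=-1,-3,-5,-6`. A minimal PyTorch sketch of that sub-block, assuming stride-1 pooling with `k // 2` padding so the spatial size is preserved (`SPP` is a hypothetical name for illustration, not a module taken from this repo):

``` python
import torch
import torch.nn as nn


class SPP(nn.Module):
    # Three stride-1 max-pools over one feature map, concatenated with the
    # raw input; the cfg routes them as 13-, 9-, 5-pool, then the input, but
    # branch order only changes channel ordering, not the channel count.
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernel_sizes])

    def forward(self, x):
        return torch.cat([pool(x) for pool in self.pools] + [x], dim=1)


# A 640-channel map stays 640 per branch, so 4 branches give 2560 channels.
y = SPP()(torch.zeros(1, 640, 20, 20))
assert y.shape == (1, 2560, 20, 20)
```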
darknet/cfg/yolov4-csp.cfg ADDED
@@ -0,0 +1,1354 @@
1
+ [net]
2
+ # Testing
3
+ #batch=1
4
+ #subdivisions=1
5
+ # Training
6
+ batch=64
7
+ subdivisions=8
8
+ width=640
9
+ height=640
10
+ channels=3
11
+ momentum=0.949
12
+ decay=0.0005
13
+ angle=0
14
+ saturation = 1.5
15
+ exposure = 1.5
16
+ hue=.1
17
+
18
+ learning_rate=0.001
19
+ burn_in=1000
20
+ max_batches = 500500
21
+ policy=steps
22
+ steps=400000,450000
23
+ scales=.1,.1
24
+
25
+ mosaic=1
26
+
27
+ letter_box=1
28
+
29
+ ema_alpha=0.9998
30
+
31
+ #optimized_memory=1
32
+
33
+
34
+ # ============ Backbone ============ #
35
+
36
+ # Stem
37
+
38
+ # 0
39
+ [convolutional]
40
+ batch_normalize=1
41
+ filters=32
42
+ size=3
43
+ stride=1
44
+ pad=1
45
+ activation=swish
46
+
47
+ # P1
48
+
49
+ # Downsample
50
+
51
+ [convolutional]
52
+ batch_normalize=1
53
+ filters=64
54
+ size=3
55
+ stride=2
56
+ pad=1
57
+ activation=swish
58
+
59
+ # Residual Block
60
+
61
+ [convolutional]
62
+ batch_normalize=1
63
+ filters=32
64
+ size=1
65
+ stride=1
66
+ pad=1
67
+ activation=swish
68
+
69
+ [convolutional]
70
+ batch_normalize=1
71
+ filters=64
72
+ size=3
73
+ stride=1
74
+ pad=1
75
+ activation=swish
76
+
77
+ # 4 (previous+1+3k)
78
+ [shortcut]
79
+ from=-3
80
+ activation=linear
81
+
82
+ # P2
83
+
84
+ # Downsample
85
+
86
+ [convolutional]
87
+ batch_normalize=1
88
+ filters=128
89
+ size=3
90
+ stride=2
91
+ pad=1
92
+ activation=swish
93
+
94
+ # Split
95
+
96
+ [convolutional]
97
+ batch_normalize=1
98
+ filters=64
99
+ size=1
100
+ stride=1
101
+ pad=1
102
+ activation=swish
103
+
104
+ [route]
105
+ layers = -2
106
+
107
+ [convolutional]
108
+ batch_normalize=1
109
+ filters=64
110
+ size=1
111
+ stride=1
112
+ pad=1
113
+ activation=swish
114
+
115
+ # Residual Block
116
+
117
+ [convolutional]
118
+ batch_normalize=1
119
+ filters=64
120
+ size=1
121
+ stride=1
122
+ pad=1
123
+ activation=swish
124
+
125
+ [convolutional]
126
+ batch_normalize=1
127
+ filters=64
128
+ size=3
129
+ stride=1
130
+ pad=1
131
+ activation=swish
132
+
133
+ [shortcut]
134
+ from=-3
135
+ activation=linear
136
+
137
+ [convolutional]
138
+ batch_normalize=1
139
+ filters=64
140
+ size=1
141
+ stride=1
142
+ pad=1
143
+ activation=swish
144
+
145
+ [convolutional]
146
+ batch_normalize=1
147
+ filters=64
148
+ size=3
149
+ stride=1
150
+ pad=1
151
+ activation=swish
152
+
153
+ [shortcut]
154
+ from=-3
155
+ activation=linear
156
+
157
+ # Transition first
158
+
159
+ [convolutional]
160
+ batch_normalize=1
161
+ filters=64
162
+ size=1
163
+ stride=1
164
+ pad=1
165
+ activation=swish
166
+
167
+ # Merge [-1, -(3k+4)]
168
+
169
+ [route]
170
+ layers = -1,-10
171
+
172
+ # Transition last
173
+
174
+ # 17 (previous+7+3k)
175
+ [convolutional]
176
+ batch_normalize=1
177
+ filters=128
178
+ size=1
179
+ stride=1
180
+ pad=1
181
+ activation=swish
182
+
183
+ # P3
184
+
185
+ # Downsample
186
+
187
+ [convolutional]
188
+ batch_normalize=1
189
+ filters=256
190
+ size=3
191
+ stride=2
192
+ pad=1
193
+ activation=swish
194
+
195
+ # Split
196
+
197
+ [convolutional]
198
+ batch_normalize=1
199
+ filters=128
200
+ size=1
201
+ stride=1
202
+ pad=1
203
+ activation=swish
204
+
205
+ [route]
206
+ layers = -2
207
+
208
+ [convolutional]
209
+ batch_normalize=1
210
+ filters=128
211
+ size=1
212
+ stride=1
213
+ pad=1
214
+ activation=swish
215
+
216
+ # Residual Block
217
+
218
+ [convolutional]
219
+ batch_normalize=1
220
+ filters=128
221
+ size=1
222
+ stride=1
223
+ pad=1
224
+ activation=swish
225
+
226
+ [convolutional]
227
+ batch_normalize=1
228
+ filters=128
229
+ size=3
230
+ stride=1
231
+ pad=1
232
+ activation=swish
233
+
234
+ [shortcut]
235
+ from=-3
236
+ activation=linear
237
+
238
+ [convolutional]
239
+ batch_normalize=1
240
+ filters=128
241
+ size=1
242
+ stride=1
243
+ pad=1
244
+ activation=swish
245
+
246
+ [convolutional]
247
+ batch_normalize=1
248
+ filters=128
249
+ size=3
250
+ stride=1
251
+ pad=1
252
+ activation=swish
253
+
254
+ [shortcut]
255
+ from=-3
256
+ activation=linear
257
+
258
+ [convolutional]
259
+ batch_normalize=1
260
+ filters=128
261
+ size=1
262
+ stride=1
263
+ pad=1
264
+ activation=swish
265
+
266
+ [convolutional]
267
+ batch_normalize=1
268
+ filters=128
269
+ size=3
270
+ stride=1
271
+ pad=1
272
+ activation=swish
273
+
274
+ [shortcut]
275
+ from=-3
276
+ activation=linear
277
+
278
+ [convolutional]
279
+ batch_normalize=1
280
+ filters=128
281
+ size=1
282
+ stride=1
283
+ pad=1
284
+ activation=swish
285
+
286
+ [convolutional]
287
+ batch_normalize=1
288
+ filters=128
289
+ size=3
290
+ stride=1
291
+ pad=1
292
+ activation=swish
293
+
294
+ [shortcut]
295
+ from=-3
296
+ activation=linear
297
+
298
+ [convolutional]
299
+ batch_normalize=1
300
+ filters=128
301
+ size=1
302
+ stride=1
303
+ pad=1
304
+ activation=swish
305
+
306
+ [convolutional]
307
+ batch_normalize=1
308
+ filters=128
309
+ size=3
310
+ stride=1
311
+ pad=1
312
+ activation=swish
313
+
314
+ [shortcut]
315
+ from=-3
316
+ activation=linear
317
+
318
+ [convolutional]
319
+ batch_normalize=1
320
+ filters=128
321
+ size=1
322
+ stride=1
323
+ pad=1
324
+ activation=swish
325
+
326
+ [convolutional]
327
+ batch_normalize=1
328
+ filters=128
329
+ size=3
330
+ stride=1
331
+ pad=1
332
+ activation=swish
333
+
334
+ [shortcut]
335
+ from=-3
336
+ activation=linear
337
+
338
+ [convolutional]
339
+ batch_normalize=1
340
+ filters=128
341
+ size=1
342
+ stride=1
343
+ pad=1
344
+ activation=swish
345
+
346
+ [convolutional]
347
+ batch_normalize=1
348
+ filters=128
349
+ size=3
350
+ stride=1
351
+ pad=1
352
+ activation=swish
353
+
354
+ [shortcut]
355
+ from=-3
356
+ activation=linear
357
+
358
+ [convolutional]
359
+ batch_normalize=1
360
+ filters=128
361
+ size=1
362
+ stride=1
363
+ pad=1
364
+ activation=swish
365
+
366
+ [convolutional]
367
+ batch_normalize=1
368
+ filters=128
369
+ size=3
370
+ stride=1
371
+ pad=1
372
+ activation=swish
373
+
374
+ [shortcut]
375
+ from=-3
376
+ activation=linear
377
+
378
+ # Transition first
379
+
380
+ [convolutional]
381
+ batch_normalize=1
382
+ filters=128
383
+ size=1
384
+ stride=1
385
+ pad=1
386
+ activation=swish
387
+
388
+ # Merge [-1 -(4+3k)]
389
+
390
+ [route]
391
+ layers = -1,-28
392
+
393
+ # Transition last
394
+
395
+ # 48 (previous+7+3k)
396
+ [convolutional]
397
+ batch_normalize=1
398
+ filters=256
399
+ size=1
400
+ stride=1
401
+ pad=1
402
+ activation=swish
403
+
404
+ # P4
405
+
406
+ # Downsample
407
+
408
+ [convolutional]
409
+ batch_normalize=1
410
+ filters=512
411
+ size=3
412
+ stride=2
413
+ pad=1
414
+ activation=swish
415
+
416
+ # Split
417
+
418
+ [convolutional]
419
+ batch_normalize=1
420
+ filters=256
421
+ size=1
422
+ stride=1
423
+ pad=1
424
+ activation=swish
425
+
426
+ [route]
427
+ layers = -2
428
+
429
+ [convolutional]
430
+ batch_normalize=1
431
+ filters=256
432
+ size=1
433
+ stride=1
434
+ pad=1
435
+ activation=swish
436
+
437
+ # Residual Block
438
+
439
+ [convolutional]
440
+ batch_normalize=1
441
+ filters=256
442
+ size=1
443
+ stride=1
444
+ pad=1
445
+ activation=swish
446
+
447
+ [convolutional]
448
+ batch_normalize=1
449
+ filters=256
450
+ size=3
451
+ stride=1
452
+ pad=1
453
+ activation=swish
454
+
455
+ [shortcut]
456
+ from=-3
457
+ activation=linear
458
+
459
+ [convolutional]
460
+ batch_normalize=1
461
+ filters=256
462
+ size=1
463
+ stride=1
464
+ pad=1
465
+ activation=swish
466
+
467
+ [convolutional]
468
+ batch_normalize=1
469
+ filters=256
470
+ size=3
471
+ stride=1
472
+ pad=1
473
+ activation=swish
474
+
475
+ [shortcut]
476
+ from=-3
477
+ activation=linear
478
+
479
+ [convolutional]
480
+ batch_normalize=1
481
+ filters=256
482
+ size=1
483
+ stride=1
484
+ pad=1
485
+ activation=swish
486
+
487
+ [convolutional]
488
+ batch_normalize=1
489
+ filters=256
490
+ size=3
491
+ stride=1
492
+ pad=1
493
+ activation=swish
494
+
495
+ [shortcut]
496
+ from=-3
497
+ activation=linear
498
+
499
+ [convolutional]
500
+ batch_normalize=1
501
+ filters=256
502
+ size=1
503
+ stride=1
504
+ pad=1
505
+ activation=swish
506
+
507
+ [convolutional]
508
+ batch_normalize=1
509
+ filters=256
510
+ size=3
511
+ stride=1
512
+ pad=1
513
+ activation=swish
514
+
515
+ [shortcut]
516
+ from=-3
517
+ activation=linear
518
+
519
+ [convolutional]
520
+ batch_normalize=1
521
+ filters=256
522
+ size=1
523
+ stride=1
524
+ pad=1
525
+ activation=swish
526
+
527
+ [convolutional]
528
+ batch_normalize=1
529
+ filters=256
530
+ size=3
531
+ stride=1
532
+ pad=1
533
+ activation=swish
534
+
535
+ [shortcut]
536
+ from=-3
537
+ activation=linear
538
+
539
+ [convolutional]
540
+ batch_normalize=1
541
+ filters=256
542
+ size=1
543
+ stride=1
544
+ pad=1
545
+ activation=swish
546
+
547
+ [convolutional]
548
+ batch_normalize=1
549
+ filters=256
550
+ size=3
551
+ stride=1
552
+ pad=1
553
+ activation=swish
554
+
555
+ [shortcut]
556
+ from=-3
557
+ activation=linear
558
+
559
+ [convolutional]
560
+ batch_normalize=1
561
+ filters=256
562
+ size=1
563
+ stride=1
564
+ pad=1
565
+ activation=swish
566
+
567
+ [convolutional]
568
+ batch_normalize=1
569
+ filters=256
570
+ size=3
571
+ stride=1
572
+ pad=1
573
+ activation=swish
574
+
575
+ [shortcut]
576
+ from=-3
577
+ activation=linear
578
+
579
+ [convolutional]
580
+ batch_normalize=1
581
+ filters=256
582
+ size=1
583
+ stride=1
584
+ pad=1
585
+ activation=swish
586
+
587
+ [convolutional]
588
+ batch_normalize=1
589
+ filters=256
590
+ size=3
591
+ stride=1
592
+ pad=1
593
+ activation=swish
594
+
595
+ [shortcut]
596
+ from=-3
597
+ activation=linear
598
+
599
+ # Transition first
600
+
601
+ [convolutional]
602
+ batch_normalize=1
603
+ filters=256
604
+ size=1
605
+ stride=1
606
+ pad=1
607
+ activation=swish
608
+
609
+ # Merge [-1 -(3k+4)]
610
+
611
+ [route]
612
+ layers = -1,-28
613
+
614
+ # Transition last
615
+
616
+ # 79 (previous+7+3k)
617
+ [convolutional]
618
+ batch_normalize=1
619
+ filters=512
620
+ size=1
621
+ stride=1
622
+ pad=1
623
+ activation=swish
624
+
625
+ # P5
626
+
627
+ # Downsample
628
+
629
+ [convolutional]
630
+ batch_normalize=1
631
+ filters=1024
632
+ size=3
633
+ stride=2
634
+ pad=1
635
+ activation=swish
636
+
637
+ # Split
638
+
639
+ [convolutional]
640
+ batch_normalize=1
641
+ filters=512
642
+ size=1
643
+ stride=1
644
+ pad=1
645
+ activation=swish
646
+
647
+ [route]
648
+ layers = -2
649
+
650
+ [convolutional]
651
+ batch_normalize=1
652
+ filters=512
653
+ size=1
654
+ stride=1
655
+ pad=1
656
+ activation=swish
657
+
658
+ # Residual Block
659
+
660
+ [convolutional]
661
+ batch_normalize=1
662
+ filters=512
663
+ size=1
664
+ stride=1
665
+ pad=1
666
+ activation=swish
667
+
668
+ [convolutional]
669
+ batch_normalize=1
670
+ filters=512
671
+ size=3
672
+ stride=1
673
+ pad=1
674
+ activation=swish
675
+
676
+ [shortcut]
677
+ from=-3
678
+ activation=linear
679
+
680
+ [convolutional]
681
+ batch_normalize=1
682
+ filters=512
683
+ size=1
684
+ stride=1
685
+ pad=1
686
+ activation=swish
687
+
688
+ [convolutional]
689
+ batch_normalize=1
690
+ filters=512
691
+ size=3
692
+ stride=1
693
+ pad=1
694
+ activation=swish
695
+
696
+ [shortcut]
697
+ from=-3
698
+ activation=linear
699
+
700
+ [convolutional]
701
+ batch_normalize=1
702
+ filters=512
703
+ size=1
704
+ stride=1
705
+ pad=1
706
+ activation=swish
707
+
708
+ [convolutional]
709
+ batch_normalize=1
710
+ filters=512
711
+ size=3
712
+ stride=1
713
+ pad=1
714
+ activation=swish
715
+
716
+ [shortcut]
717
+ from=-3
718
+ activation=linear
719
+
720
+ [convolutional]
721
+ batch_normalize=1
722
+ filters=512
723
+ size=1
724
+ stride=1
725
+ pad=1
726
+ activation=swish
727
+
728
+ [convolutional]
729
+ batch_normalize=1
730
+ filters=512
731
+ size=3
732
+ stride=1
733
+ pad=1
734
+ activation=swish
735
+
736
+ [shortcut]
737
+ from=-3
738
+ activation=linear
739
+
740
+ # Transition first
741
+
742
+ [convolutional]
743
+ batch_normalize=1
744
+ filters=512
745
+ size=1
746
+ stride=1
747
+ pad=1
748
+ activation=swish
749
+
750
+ # Merge [-1 -(3k+4)]
751
+
752
+ [route]
753
+ layers = -1,-16
754
+
755
+ # Transition last
756
+
757
+ # 98 (previous+7+3k)
758
+ [convolutional]
759
+ batch_normalize=1
760
+ filters=1024
761
+ size=1
762
+ stride=1
763
+ pad=1
764
+ activation=swish
765
+
766
+ # ============ End of Backbone ============ #
767
+
768
+ # ============ Neck ============ #
769
+
770
+ # CSPSPP
771
+
772
+ [convolutional]
773
+ batch_normalize=1
774
+ filters=512
775
+ size=1
776
+ stride=1
777
+ pad=1
778
+ activation=swish
779
+
780
+ [route]
781
+ layers = -2
782
+
783
+ [convolutional]
784
+ batch_normalize=1
785
+ filters=512
786
+ size=1
787
+ stride=1
788
+ pad=1
789
+ activation=swish
790
+
791
+ [convolutional]
792
+ batch_normalize=1
793
+ size=3
794
+ stride=1
795
+ pad=1
796
+ filters=512
797
+ activation=swish
798
+
799
+ [convolutional]
800
+ batch_normalize=1
801
+ filters=512
802
+ size=1
803
+ stride=1
804
+ pad=1
805
+ activation=swish
806
+
807
+ ### SPP ###
808
+ [maxpool]
809
+ stride=1
810
+ size=5
811
+
812
+ [route]
813
+ layers=-2
814
+
815
+ [maxpool]
816
+ stride=1
817
+ size=9
818
+
819
+ [route]
820
+ layers=-4
821
+
822
+ [maxpool]
823
+ stride=1
824
+ size=13
825
+
826
+ [route]
827
+ layers=-1,-3,-5,-6
828
+ ### End SPP ###
829
+
830
+ [convolutional]
831
+ batch_normalize=1
832
+ filters=512
833
+ size=1
834
+ stride=1
835
+ pad=1
836
+ activation=swish
837
+
838
+ [convolutional]
839
+ batch_normalize=1
840
+ size=3
841
+ stride=1
842
+ pad=1
843
+ filters=512
844
+ activation=swish
845
+
846
+ [route]
847
+ layers = -1, -13
848
+
849
+ # 113 (previous+6+5+2k)
850
+ [convolutional]
851
+ batch_normalize=1
852
+ filters=512
853
+ size=1
854
+ stride=1
855
+ pad=1
856
+ activation=swish
857
+
858
+ # End of CSPSPP
859
+
860
+
861
+ # FPN-4
862
+
863
+ [convolutional]
864
+ batch_normalize=1
865
+ filters=256
866
+ size=1
867
+ stride=1
868
+ pad=1
869
+ activation=swish
870
+
871
+ [upsample]
872
+ stride=2
873
+
874
+ [route]
875
+ layers = 79
876
+
877
+ [convolutional]
878
+ batch_normalize=1
879
+ filters=256
880
+ size=1
881
+ stride=1
882
+ pad=1
883
+ activation=swish
884
+
885
+ [route]
886
+ layers = -1, -3
887
+
888
+ [convolutional]
889
+ batch_normalize=1
890
+ filters=256
891
+ size=1
892
+ stride=1
893
+ pad=1
894
+ activation=swish
895
+
896
+ # Split
897
+
898
+ [convolutional]
899
+ batch_normalize=1
900
+ filters=256
901
+ size=1
902
+ stride=1
903
+ pad=1
904
+ activation=swish
905
+
906
+ [route]
907
+ layers = -2
908
+
909
+ # Plain Block
910
+
911
+ [convolutional]
912
+ batch_normalize=1
913
+ filters=256
914
+ size=1
915
+ stride=1
916
+ pad=1
917
+ activation=swish
918
+
919
+ [convolutional]
920
+ batch_normalize=1
921
+ size=3
922
+ stride=1
923
+ pad=1
924
+ filters=256
925
+ activation=swish
926
+
927
+ [convolutional]
928
+ batch_normalize=1
929
+ filters=256
930
+ size=1
931
+ stride=1
932
+ pad=1
933
+ activation=swish
934
+
935
+ [convolutional]
936
+ batch_normalize=1
937
+ size=3
938
+ stride=1
939
+ pad=1
940
+ filters=256
941
+ activation=swish
942
+
943
+ # Merge [-1, -(2k+2)]
944
+
945
+ [route]
946
+ layers = -1, -6
947
+
948
+ # Transition last
949
+
950
+ # 127 (previous+6+4+2k)
951
+ [convolutional]
952
+ batch_normalize=1
953
+ filters=256
954
+ size=1
955
+ stride=1
956
+ pad=1
957
+ activation=swish
958
+
959
+
960
+ # FPN-3
961
+
962
+ [convolutional]
963
+ batch_normalize=1
964
+ filters=128
965
+ size=1
966
+ stride=1
967
+ pad=1
968
+ activation=swish
969
+
970
+ [upsample]
971
+ stride=2
972
+
973
+ [route]
974
+ layers = 48
975
+
976
+ [convolutional]
977
+ batch_normalize=1
978
+ filters=128
979
+ size=1
980
+ stride=1
981
+ pad=1
982
+ activation=swish
983
+
984
+ [route]
985
+ layers = -1, -3
986
+
987
+ [convolutional]
988
+ batch_normalize=1
989
+ filters=128
990
+ size=1
991
+ stride=1
992
+ pad=1
993
+ activation=swish
994
+
995
+ # Split
996
+
997
+ [convolutional]
998
+ batch_normalize=1
999
+ filters=128
1000
+ size=1
1001
+ stride=1
1002
+ pad=1
1003
+ activation=swish
1004
+
1005
+ [route]
1006
+ layers = -2
1007
+
1008
+ # Plain Block
1009
+
1010
+ [convolutional]
1011
+ batch_normalize=1
1012
+ filters=128
1013
+ size=1
1014
+ stride=1
1015
+ pad=1
1016
+ activation=swish
1017
+
1018
+ [convolutional]
1019
+ batch_normalize=1
1020
+ size=3
1021
+ stride=1
1022
+ pad=1
1023
+ filters=128
1024
+ activation=swish
1025
+
1026
+ [convolutional]
1027
+ batch_normalize=1
1028
+ filters=128
1029
+ size=1
1030
+ stride=1
1031
+ pad=1
1032
+ activation=swish
1033
+
1034
+ [convolutional]
1035
+ batch_normalize=1
1036
+ size=3
1037
+ stride=1
1038
+ pad=1
1039
+ filters=128
1040
+ activation=swish
1041
+
1042
+ # Merge [-1, -(2k+2)]
1043
+
1044
+ [route]
1045
+ layers = -1, -6
1046
+
1047
+ # Transition last
1048
+
1049
+ # 141 (previous+6+4+2k)
1050
+ [convolutional]
1051
+ batch_normalize=1
1052
+ filters=128
1053
+ size=1
1054
+ stride=1
1055
+ pad=1
1056
+ activation=swish
1057
+
1058
+
1059
+ # PAN-4
1060
+
1061
+ [convolutional]
1062
+ batch_normalize=1
1063
+ size=3
1064
+ stride=2
1065
+ pad=1
1066
+ filters=256
1067
+ activation=swish
1068
+
1069
+ [route]
1070
+ layers = -1, 127
1071
+
1072
+ [convolutional]
1073
+ batch_normalize=1
1074
+ filters=256
1075
+ size=1
1076
+ stride=1
1077
+ pad=1
1078
+ activation=swish
1079
+
1080
+ # Split
1081
+
1082
+ [convolutional]
1083
+ batch_normalize=1
1084
+ filters=256
1085
+ size=1
1086
+ stride=1
1087
+ pad=1
1088
+ activation=swish
1089
+
1090
+ [route]
1091
+ layers = -2
1092
+
1093
+ # Plain Block
1094
+
1095
+ [convolutional]
1096
+ batch_normalize=1
1097
+ filters=256
1098
+ size=1
1099
+ stride=1
1100
+ pad=1
1101
+ activation=swish
1102
+
1103
+ [convolutional]
1104
+ batch_normalize=1
1105
+ size=3
1106
+ stride=1
1107
+ pad=1
1108
+ filters=256
1109
+ activation=swish
1110
+
1111
+ [convolutional]
1112
+ batch_normalize=1
1113
+ filters=256
1114
+ size=1
1115
+ stride=1
1116
+ pad=1
1117
+ activation=swish
1118
+
1119
+ [convolutional]
1120
+ batch_normalize=1
1121
+ size=3
1122
+ stride=1
1123
+ pad=1
1124
+ filters=256
1125
+ activation=swish
1126
+
1127
+ [route]
1128
+ layers = -1,-6
1129
+
1130
+ # Transition last
1131
+
1132
+ # 152 (previous+3+4+2k)
1133
+ [convolutional]
1134
+ batch_normalize=1
1135
+ filters=256
1136
+ size=1
1137
+ stride=1
1138
+ pad=1
1139
+ activation=swish
1140
+
1141
+
1142
+ # PAN-5
1143
+
1144
+ [convolutional]
1145
+ batch_normalize=1
1146
+ size=3
1147
+ stride=2
1148
+ pad=1
1149
+ filters=512
1150
+ activation=swish
1151
+
1152
+ [route]
1153
+ layers = -1, 113
1154
+
1155
+ [convolutional]
1156
+ batch_normalize=1
1157
+ filters=512
1158
+ size=1
1159
+ stride=1
1160
+ pad=1
1161
+ activation=swish
1162
+
1163
+ # Split
1164
+
1165
+ [convolutional]
1166
+ batch_normalize=1
1167
+ filters=512
1168
+ size=1
1169
+ stride=1
1170
+ pad=1
1171
+ activation=swish
1172
+
1173
+ [route]
1174
+ layers = -2
1175
+
1176
+ # Plain Block
1177
+
1178
+ [convolutional]
1179
+ batch_normalize=1
1180
+ filters=512
1181
+ size=1
1182
+ stride=1
1183
+ pad=1
1184
+ activation=swish
1185
+
1186
+ [convolutional]
1187
+ batch_normalize=1
1188
+ size=3
1189
+ stride=1
1190
+ pad=1
1191
+ filters=512
1192
+ activation=swish
1193
+
1194
+ [convolutional]
1195
+ batch_normalize=1
1196
+ filters=512
1197
+ size=1
1198
+ stride=1
1199
+ pad=1
1200
+ activation=swish
1201
+
1202
+ [convolutional]
1203
+ batch_normalize=1
1204
+ size=3
1205
+ stride=1
1206
+ pad=1
1207
+ filters=512
1208
+ activation=swish
1209
+
1210
+ [route]
1211
+ layers = -1,-6
1212
+
1213
+ # Transition last
1214
+
1215
+ # 163 (previous+3+4+2k)
1216
+ [convolutional]
1217
+ batch_normalize=1
1218
+ filters=512
1219
+ size=1
1220
+ stride=1
1221
+ pad=1
1222
+ activation=swish
1223
+
1224
+ # ============ End of Neck ============ #
1225
+
1226
+ # ============ Head ============ #
1227
+
1228
+ # YOLO-3
1229
+
1230
+ [route]
1231
+ layers = 141
1232
+
1233
+ [convolutional]
1234
+ batch_normalize=1
1235
+ size=3
1236
+ stride=1
1237
+ pad=1
1238
+ filters=256
1239
+ activation=swish
1240
+
1241
+ [convolutional]
1242
+ size=1
1243
+ stride=1
1244
+ pad=1
1245
+ filters=255
1246
+ activation=logistic
1247
+
1248
+ [yolo]
1249
+ mask = 0,1,2
1250
+ anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
1251
+ classes=80
1252
+ num=9
1253
+ jitter=.1
1254
+ scale_x_y = 2.0
1255
+ objectness_smooth=1
1256
+ ignore_thresh = .7
1257
+ truth_thresh = 1
1258
+ #random=1
1259
+ resize=1.5
1260
+ iou_thresh=0.2
1261
+ iou_normalizer=0.05
1262
+ cls_normalizer=0.5
1263
+ obj_normalizer=0.4
1264
+ iou_loss=ciou
1265
+ nms_kind=diounms
1266
+ beta_nms=0.6
1267
+ new_coords=1
1268
+ max_delta=2
1269
+
1270
+
1271
+ # YOLO-4
1272
+
1273
+ [route]
1274
+ layers = 152
1275
+
1276
+ [convolutional]
1277
+ batch_normalize=1
1278
+ size=3
1279
+ stride=1
1280
+ pad=1
1281
+ filters=512
1282
+ activation=swish
1283
+
1284
+ [convolutional]
1285
+ size=1
1286
+ stride=1
1287
+ pad=1
1288
+ filters=255
1289
+ activation=logistic
1290
+
1291
+ [yolo]
1292
+ mask = 3,4,5
1293
+ anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
1294
+ classes=80
1295
+ num=9
1296
+ jitter=.1
1297
+ scale_x_y = 2.0
1298
+ objectness_smooth=1
1299
+ ignore_thresh = .7
1300
+ truth_thresh = 1
1301
+ #random=1
1302
+ resize=1.5
1303
+ iou_thresh=0.2
1304
+ iou_normalizer=0.05
1305
+ cls_normalizer=0.5
1306
+ obj_normalizer=0.4
1307
+ iou_loss=ciou
1308
+ nms_kind=diounms
1309
+ beta_nms=0.6
1310
+ new_coords=1
1311
+ max_delta=2
1312
+
1313
+
1314
+ # YOLO-5
1315
+
1316
+ [route]
1317
+ layers = 163
1318
+
1319
+ [convolutional]
1320
+ batch_normalize=1
1321
+ size=3
1322
+ stride=1
1323
+ pad=1
1324
+ filters=1024
1325
+ activation=swish
1326
+
1327
+ [convolutional]
1328
+ size=1
1329
+ stride=1
1330
+ pad=1
1331
+ filters=255
1332
+ activation=logistic
1333
+
1334
+ [yolo]
1335
+ mask = 6,7,8
1336
+ anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
1337
+ classes=80
1338
+ num=9
1339
+ jitter=.1
1340
+ scale_x_y = 2.0
1341
+ objectness_smooth=1
1342
+ ignore_thresh = .7
1343
+ truth_thresh = 1
1344
+ #random=1
1345
+ resize=1.5
1346
+ iou_thresh=0.2
1347
+ iou_normalizer=0.05
1348
+ cls_normalizer=0.5
1349
+ obj_normalizer=0.4
1350
+ iou_loss=ciou
1351
+ nms_kind=diounms
1352
+ beta_nms=0.6
1353
+ new_coords=1
1354
+ max_delta=2
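
Each `[yolo]` head above reads a 255-channel map: 3 anchors per cell (its `mask` picks 3 of the 9 `anchors`) times 4 box terms, 1 objectness and 80 class scores. Because the preceding 1x1 conv uses `activation=logistic` and the head sets `new_coords=1` with `scale_x_y=2.0`, boxes can be decoded without an exponential. A sketch of that decode under the usual scaled-YOLOv4 convention (an assumption; the exact code lives in this repo's models.py), using the P5 head as the example:

``` python
import torch


def decode(p, anchors, stride):
    # p: (bs, 3, ny, nx, 85) predictions already passed through the sigmoid
    # (activation=logistic); anchors: (3, 2) pixel w,h for this head's mask.
    bs, na, ny, nx, _ = p.shape
    # indexing kw needs torch >= 1.10; older versions default to 'ij'
    yv, xv = torch.meshgrid(torch.arange(ny), torch.arange(nx), indexing='ij')
    grid = torch.stack((xv, yv), 2).view(1, 1, ny, nx, 2).float()
    xy = (p[..., 0:2] * 2.0 - 0.5 + grid) * stride                 # scale_x_y = 2.0
    wh = (p[..., 2:4] * 2.0) ** 2 * anchors.view(1, na, 1, 1, 2)   # new_coords power form
    return torch.cat((xy, wh, p[..., 4:]), -1)


# P5 head: mask 6,7,8 -> anchors (142,110), (192,243), (459,401) at stride 32.
anchors = torch.tensor([[142., 110.], [192., 243.], [459., 401.]])
out = decode(torch.rand(1, 3, 20, 20, 85), anchors, stride=32)
```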
darknet/new_layers.md ADDED
@@ -0,0 +1,329 @@
1
+ ![Implicit Modeling](https://github.com/WongKinYiu/yolor/blob/main/figure/implicit_modeling.png)
2
+
3
+ ### 1. silence layer
4
+
5
+ Usage:
6
+
7
+ ```
8
+ [silence]
9
+ ```
10
+
11
+ PyTorch code:
12
+
13
+ ``` python
14
+ class Silence(nn.Module):
15
+     def __init__(self):
16
+         super(Silence, self).__init__()
17
+     def forward(self, x):
18
+         return x
19
+ ```
20
+
21
+
22
+ ### 2. implicit_add layer
23
+
24
+ Usage:
25
+
26
+ ```
27
+ [implicit_add]
28
+ filters=128
29
+ ```
30
+
31
+ PyTorch code:
32
+
33
+ ``` python
34
+ class ImplicitA(nn.Module):
35
+     def __init__(self, channel):
36
+         super(ImplicitA, self).__init__()
37
+         self.channel = channel
38
+         self.implicit = nn.Parameter(torch.zeros(1, channel, 1, 1))
39
+         nn.init.normal_(self.implicit, std=.02)
40
+
41
+     def forward(self):
42
+         return self.implicit
43
+ ```
44
+
45
+
46
+ ### 3. shift_channels layer
47
+
48
+ Usage:
49
+
50
+ ```
51
+ [shift_channels]
52
+ from=101
53
+ ```
54
+
55
+ PyTorch code:
56
+
57
+ ``` python
58
+ class ShiftChannel(nn.Module):
59
+     def __init__(self, layers):
60
+         super(ShiftChannel, self).__init__()
61
+         self.layers = layers  # layer indices
62
+
63
+     def forward(self, x, outputs):
64
+         a = outputs[self.layers[0]]
65
+         return a.expand_as(x) + x
66
+ ```
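
Together, `[implicit_add]` and `[shift_channels]` add a learned per-channel bias to a feature map: `ImplicitA.forward()` returns its `(1, C, 1, 1)` parameter, and `ShiftChannel` broadcasts it onto the routed tensor. A minimal wiring sketch using the two classes above (the `outputs` dict stands in for the graph's cached layer outputs; index 101 just mirrors the `from=101` example):

``` python
import torch

implicit = ImplicitA(channel=128)     # the [implicit_add] layer
shift = ShiftChannel(layers=[101])    # the [shift_channels] from=101 layer

outputs = {101: implicit()}           # pretend layer 101 emitted the implicit tensor
x = torch.randn(4, 128, 40, 40)
y = shift(x, outputs)                 # y = x + bias, broadcast over N, H, W
assert y.shape == x.shape
```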
67
+
68
+
69
+ ### 4. implicit_mul layer
70
+
71
+ Usage:
72
+
73
+ ```
74
+ [implicit_mul]
75
+ filters=128
76
+ ```
77
+
78
+ PyTorch code:
79
+
80
+ ``` python
81
+ class ImplicitM(nn.Module):
82
+     def __init__(self, channel):
83
+         super(ImplicitM, self).__init__()
84
+         self.channel = channel
85
+         self.implicit = nn.Parameter(torch.ones(1, channel, 1, 1))
86
+         nn.init.normal_(self.implicit, mean=1., std=.02)
87
+
88
+     def forward(self):
89
+         return self.implicit
90
+ ```
91
+
92
+
93
+ ### 5. control_channels layer
94
+
95
+ Usage:
96
+
97
+ ```
98
+ [control_channels]
99
+ from=101
100
+ ```
101
+
102
+ PyTorch code:
103
+
104
+ ``` python
105
+ class ControlChannel(nn.Module):
106
+     def __init__(self, layers):
107
+         super(ControlChannel, self).__init__()
108
+         self.layers = layers  # layer indices
109
+
110
+     def forward(self, x, outputs):
111
+         a = outputs[self.layers[0]]
112
+         return a.expand_as(x) * x
113
+ ```
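
The multiplicative pair works the same way: `ImplicitM` holds a `(1, C, 1, 1)` gain initialized around 1.0, and `ControlChannel` applies it as a per-channel scale. Sketch, same caching convention as above:

``` python
import torch

gain = ImplicitM(channel=128)
control = ControlChannel(layers=[101])

outputs = {101: gain()}
x = torch.randn(4, 128, 40, 40)
y = control(x, outputs)               # y = x * gain, per channel
```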
114
+
115
+
116
+ ### 6. implicit_cat layer
117
+
118
+ Usage:
119
+
120
+ ```
121
+ [implicit_cat]
122
+ filters=128
123
+ ```
124
+
125
+ PyTorch code: (same as ImplicitA)
126
+
127
+ ``` python
128
+ class ImplicitC(nn.Module):
129
+     def __init__(self, channel):
130
+         super(ImplicitC, self).__init__()
131
+         self.channel = channel
132
+         self.implicit = nn.Parameter(torch.zeros(1, channel, 1, 1))
133
+         nn.init.normal_(self.implicit, std=.02)
134
+
135
+     def forward(self):
136
+         return self.implicit
137
+ ```
138
+
139
+
140
+ ### 7. alternate_channels layer
141
+
142
+ Usage:
143
+
144
+ ```
145
+ [alternate_channels]
146
+ from=101
147
+ ```
148
+
149
+ PyTorch code:
150
+
151
+ ``` python
152
+ class AlternateChannel(nn.Module):
153
+     def __init__(self, layers):
154
+         super(AlternateChannel, self).__init__()
155
+         self.layers = layers  # layer indices
156
+
157
+     def forward(self, x, outputs):
158
+         a = outputs[self.layers[0]]
159
+         return torch.cat([a.expand_as(x), x], dim=1)
160
+ ```
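
`[implicit_cat]` plus `[alternate_channels]` concatenates the learned tensor onto the feature map instead of adding or scaling it, so the output has 2C channels; any `[convolutional]` that follows must expect the doubled width. Sketch:

``` python
import torch

extra = ImplicitC(channel=128)
concat = AlternateChannel(layers=[101])

outputs = {101: extra()}
x = torch.randn(4, 128, 40, 40)
y = concat(x, outputs)
assert y.shape == (4, 256, 40, 40)    # channels doubled by the concat
```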
161
+
162
+
163
+ ### 8. implicit_add_2d layer
164
+
165
+ Usage:
166
+
167
+ ```
168
+ [implicit_add_2d]
169
+ filters=128
170
+ atoms=128
171
+ ```
172
+
173
+ PyTorch code:
174
+
175
+ ``` python
176
+ class Implicit2DA(nn.Module):
177
+     def __init__(self, atom, channel):
178
+         super(Implicit2DA, self).__init__()
179
+         self.channel = channel
180
+         self.implicit = nn.Parameter(torch.zeros(1, atom, channel, 1))
181
+         nn.init.normal_(self.implicit, std=.02)
182
+
183
+     def forward(self):
184
+         return self.implicit
185
+ ```
186
+
187
+
188
+ ### 9. shift_channels_2d layer
189
+
190
+ Usage:
191
+
192
+ ```
193
+ [shift_channels_2d]
194
+ from=101
195
+ ```
196
+
197
+ PyTorch code:
198
+
199
+ ``` python
200
+ class ShiftChannel2D(nn.Module):
201
+     def __init__(self, layers):
202
+         super(ShiftChannel2D, self).__init__()
203
+         self.layers = layers  # layer indices
204
+
205
+     def forward(self, x, outputs):
206
+         a = outputs[self.layers[0]].view(1, -1, 1, 1)
207
+         return a.expand_as(x) + x
208
+ ```
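
The `_2d` variants store a `(1, atom, channel, 1)` matrix rather than a vector; the `*2D` consumers flatten it with `.view(1, -1, 1, 1)` so it broadcasts onto a map with `atom * channel` channels. Shape check:

``` python
import torch

mat = Implicit2DA(atom=4, channel=32)   # parameter shape (1, 4, 32, 1)
shift2d = ShiftChannel2D(layers=[101])

outputs = {101: mat()}
x = torch.randn(2, 128, 20, 20)         # 4 * 32 = 128 channels
y = shift2d(x, outputs)                 # flattened to (1, 128, 1, 1), then broadcast
assert y.shape == x.shape
```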
209
+
210
+
211
+ ### 10. implicit_mul_2d layer
212
+
213
+ Usage:
214
+
215
+ ```
216
+ [implicit_mul_2d]
217
+ filters=128
218
+ atoms=128
219
+ ```
220
+
221
+ PyTorch code:
222
+
223
+ ``` python
224
+ class Implicit2DM(nn.Module):
225
+     def __init__(self, atom, channel):
226
+         super(Implicit2DM, self).__init__()
227
+         self.channel = channel
228
+         self.implicit = nn.Parameter(torch.ones(1, atom, channel, 1))
229
+         nn.init.normal_(self.implicit, mean=1., std=.02)
230
+
231
+     def forward(self):
232
+         return self.implicit
233
+ ```
234
+
235
+
236
+ ### 11. control_channels_2d layer
237
+
238
+ Usage:
239
+
240
+ ```
241
+ [control_channels_2d]
242
+ from=101
243
+ ```
244
+
245
+ PyTorch code:
246
+
247
+ ``` python
248
+ class ControlChannel2D(nn.Module):
249
+     def __init__(self, layers):
250
+         super(ControlChannel2D, self).__init__()
251
+         self.layers = layers  # layer indices
252
+
253
+     def forward(self, x, outputs):
254
+         a = outputs[self.layers[0]].view(1, -1, 1, 1)
255
+         return a.expand_as(x) * x
256
+ ```
257
+
258
+
259
+ ### 12. implicit_cat_2d layer
260
+
261
+ Usage:
262
+
263
+ ```
264
+ [implicit_cat_2d]
265
+ filters=128
266
+ atoms=128
267
+ ```
268
+
269
+ PyTorch code: (same as Implicit2DA)
270
+
271
+ ``` python
272
+ class Implicit2DC(nn.Module):
273
+     def __init__(self, atom, channel):
274
+         super(Implicit2DC, self).__init__()
275
+         self.channel = channel
276
+         self.implicit = nn.Parameter(torch.zeros(1, atom, channel, 1))
277
+         nn.init.normal_(self.implicit, std=.02)
278
+
279
+     def forward(self):
280
+         return self.implicit
281
+ ```
282
+
283
+
284
+ ### 13. alternate_channels_2d layer
285
+
286
+ Usage:
287
+
288
+ ```
289
+ [alternate_channels_2d]
290
+ from=101
291
+ ```
292
+
293
+ PyTorch code:
294
+
295
+ ``` python
296
+ class AlternateChannel2D(nn.Module):
297
+     def __init__(self, layers):
298
+         super(AlternateChannel2D, self).__init__()
299
+         self.layers = layers  # layer indices
300
+
301
+     def forward(self, x, outputs):
302
+         a = outputs[self.layers[0]].view(1, -1, 1, 1)
303
+         return torch.cat([a.expand_as(x), x], dim=1)
304
+ ```
305
+
306
+
307
+ ### 14. dwt layer
308
+
309
+ Usage:
310
+
311
+ ```
312
+ [dwt]
313
+ ```
314
+
315
+ PyTorch code:
316
+
317
+ ``` python
318
+ # https://github.com/fbcotter/pytorch_wavelets
319
+ from pytorch_wavelets import DWTForward, DWTInverse
320
+ class DWT(nn.Module):
321
+     def __init__(self):
322
+         super(DWT, self).__init__()
323
+         self.xfm = DWTForward(J=1, wave='db1', mode='zero')
324
+
325
+     def forward(self, x):
326
+         b, c, w, h = x.shape
327
+         yl, yh = self.xfm(x)
328
+         return torch.cat([yl/2., yh[0].view(b, -1, w//2, h//2)/2. + .5], 1)
329
+ ```
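
`DWT` halves the spatial resolution and quadruples the channel count: `yl` is the low-pass band (c channels) and `yh[0]` holds the three detail bands (3c channels), so a `(b, c, w, h)` input becomes `(b, 4c, w/2, h/2)`. A quick shape check (requires `pytorch_wavelets` from the link above):

``` python
import torch

dwt = DWT()                              # class defined above
x = torch.randn(2, 3, 64, 64)
y = dwt(x)
assert y.shape == (2, 12, 32, 32)        # 4 * 3 channels, half the resolution
```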
data/coco.names ADDED
@@ -0,0 +1,80 @@
1
+ person
2
+ bicycle
3
+ car
4
+ motorcycle
5
+ airplane
6
+ bus
7
+ train
8
+ truck
9
+ boat
10
+ traffic light
11
+ fire hydrant
12
+ stop sign
13
+ parking meter
14
+ bench
15
+ bird
16
+ cat
17
+ dog
18
+ horse
19
+ sheep
20
+ cow
21
+ elephant
22
+ bear
23
+ zebra
24
+ giraffe
25
+ backpack
26
+ umbrella
27
+ handbag
28
+ tie
29
+ suitcase
30
+ frisbee
31
+ skis
32
+ snowboard
33
+ sports ball
34
+ kite
35
+ baseball bat
36
+ baseball glove
37
+ skateboard
38
+ surfboard
39
+ tennis racket
40
+ bottle
41
+ wine glass
42
+ cup
43
+ fork
44
+ knife
45
+ spoon
46
+ bowl
47
+ banana
48
+ apple
49
+ sandwich
50
+ orange
51
+ broccoli
52
+ carrot
53
+ hot dog
54
+ pizza
55
+ donut
56
+ cake
57
+ chair
58
+ couch
59
+ potted plant
60
+ bed
61
+ dining table
62
+ toilet
63
+ tv
64
+ laptop
65
+ mouse
66
+ remote
67
+ keyboard
68
+ cell phone
69
+ microwave
70
+ oven
71
+ toaster
72
+ sink
73
+ refrigerator
74
+ book
75
+ clock
76
+ vase
77
+ scissors
78
+ teddy bear
79
+ hair drier
80
+ toothbrush
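
The names file is consumed one class per line, and the line index is the class id the detector emits; 80 classes is exactly what makes the detection heads 255 filters wide, 3 anchors x (80 classes + 5 box/objectness terms). A minimal loader sketch (path relative to the repo root):

``` python
with open('data/coco.names') as f:
    names = [line.strip() for line in f if line.strip()]

assert len(names) == 80 and names[0] == 'person'
```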
data/coco.yaml ADDED
@@ -0,0 +1,18 @@
1
+ # train and val datasets (image directory or *.txt file with image paths)
2
+ train: ../coco/train2017.txt # 118k images
3
+ val: ../coco/val2017.txt # 5k images
4
+ test: ../coco/test-dev2017.txt # 20k images for submission to https://competitions.codalab.org/competitions/20794
5
+
6
+ # number of classes
7
+ nc: 80
8
+
9
+ # class names
10
+ names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
11
+ 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
12
+ 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
13
+ 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
14
+ 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
15
+ 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
16
+ 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
17
+ 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
18
+ 'hair drier', 'toothbrush']
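
The dataset yaml is plain YAML: `train`/`val`/`test` point at image lists and `nc`/`names` size the head. A loading sketch, assuming PyYAML is available:

``` python
import yaml

with open('data/coco.yaml') as f:
    data = yaml.safe_load(f)

assert data['nc'] == len(data['names']) == 80
train_list, val_list = data['train'], data['val']   # paths to *.txt image lists
```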
data/hyp.finetune.1280.yaml ADDED
@@ -0,0 +1,28 @@
1
+ lr0: 0.01 # initial learning rate (SGD=1E-2, Adam=1E-3)
2
+ lrf: 0.2 # final OneCycleLR learning rate (lr0 * lrf)
3
+ momentum: 0.937 # SGD momentum/Adam beta1
4
+ weight_decay: 0.0005 # optimizer weight decay 5e-4
5
+ warmup_epochs: 3.0 # warmup epochs (fractions ok)
6
+ warmup_momentum: 0.8 # warmup initial momentum
7
+ warmup_bias_lr: 0.1 # warmup initial bias lr
8
+ box: 0.05 # box loss gain
9
+ cls: 0.5 # cls loss gain
10
+ cls_pw: 1.0 # cls BCELoss positive_weight
11
+ obj: 1.0 # obj loss gain (scale with pixels)
12
+ obj_pw: 1.0 # obj BCELoss positive_weight
13
+ iou_t: 0.20 # IoU training threshold
14
+ anchor_t: 4.0 # anchor-multiple threshold
15
+ # anchors: 3 # anchors per output layer (0 to ignore)
16
+ fl_gamma: 0.0 # focal loss gamma (efficientDet default gamma=1.5)
17
+ hsv_h: 0.015 # image HSV-Hue augmentation (fraction)
18
+ hsv_s: 0.7 # image HSV-Saturation augmentation (fraction)
19
+ hsv_v: 0.4 # image HSV-Value augmentation (fraction)
20
+ degrees: 0.0 # image rotation (+/- deg)
21
+ translate: 0.5 # image translation (+/- fraction)
22
+ scale: 0.8 # image scale (+/- gain)
23
+ shear: 0.0 # image shear (+/- deg)
24
+ perspective: 0.0 # image perspective (+/- fraction), range 0-0.001
25
+ flipud: 0.0 # image flip up-down (probability)
26
+ fliplr: 0.5 # image flip left-right (probability)
27
+ mosaic: 1.0 # image mosaic (probability)
28
+ mixup: 0.2 # image mixup (probability)
data/hyp.scratch.1280.yaml ADDED
@@ -0,0 +1,28 @@
1
+ lr0: 0.01 # initial learning rate (SGD=1E-2, Adam=1E-3)
2
+ lrf: 0.2 # final OneCycleLR learning rate (lr0 * lrf)
3
+ momentum: 0.937 # SGD momentum/Adam beta1
4
+ weight_decay: 0.0005 # optimizer weight decay 5e-4
5
+ warmup_epochs: 3.0 # warmup epochs (fractions ok)
6
+ warmup_momentum: 0.8 # warmup initial momentum
7
+ warmup_bias_lr: 0.1 # warmup initial bias lr
8
+ box: 0.05 # box loss gain
9
+ cls: 0.5 # cls loss gain
10
+ cls_pw: 1.0 # cls BCELoss positive_weight
11
+ obj: 1.0 # obj loss gain (scale with pixels)
12
+ obj_pw: 1.0 # obj BCELoss positive_weight
13
+ iou_t: 0.20 # IoU training threshold
14
+ anchor_t: 4.0 # anchor-multiple threshold
15
+ # anchors: 3 # anchors per output layer (0 to ignore)
16
+ fl_gamma: 0.0 # focal loss gamma (efficientDet default gamma=1.5)
17
+ hsv_h: 0.015 # image HSV-Hue augmentation (fraction)
18
+ hsv_s: 0.7 # image HSV-Saturation augmentation (fraction)
19
+ hsv_v: 0.4 # image HSV-Value augmentation (fraction)
20
+ degrees: 0.0 # image rotation (+/- deg)
21
+ translate: 0.5 # image translation (+/- fraction)
22
+ scale: 0.5 # image scale (+/- gain)
23
+ shear: 0.0 # image shear (+/- deg)
24
+ perspective: 0.0 # image perspective (+/- fraction), range 0-0.001
25
+ flipud: 0.0 # image flip up-down (probability)
26
+ fliplr: 0.5 # image flip left-right (probability)
27
+ mosaic: 1.0 # image mosaic (probability)
28
+ mixup: 0.0 # image mixup (probability)
data/hyp.scratch.640.yaml ADDED
@@ -0,0 +1,28 @@
1
+ lr0: 0.01 # initial learning rate (SGD=1E-2, Adam=1E-3)
2
+ lrf: 0.2 # final OneCycleLR learning rate (lr0 * lrf)
3
+ momentum: 0.937 # SGD momentum/Adam beta1
4
+ weight_decay: 0.0005 # optimizer weight decay 5e-4
5
+ warmup_epochs: 3.0 # warmup epochs (fractions ok)
6
+ warmup_momentum: 0.8 # warmup initial momentum
7
+ warmup_bias_lr: 0.1 # warmup initial bias lr
8
+ box: 0.05 # box loss gain
9
+ cls: 0.3 # cls loss gain
10
+ cls_pw: 1.0 # cls BCELoss positive_weight
11
+ obj: 0.7 # obj loss gain (scale with pixels)
12
+ obj_pw: 1.0 # obj BCELoss positive_weight
13
+ iou_t: 0.20 # IoU training threshold
14
+ anchor_t: 4.0 # anchor-multiple threshold
15
+ # anchors: 3 # anchors per output layer (0 to ignore)
16
+ fl_gamma: 0.0 # focal loss gamma (efficientDet default gamma=1.5)
17
+ hsv_h: 0.015 # image HSV-Hue augmentation (fraction)
18
+ hsv_s: 0.7 # image HSV-Saturation augmentation (fraction)
19
+ hsv_v: 0.4 # image HSV-Value augmentation (fraction)
20
+ degrees: 0.0 # image rotation (+/- deg)
21
+ translate: 0.1 # image translation (+/- fraction)
22
+ scale: 0.9 # image scale (+/- gain)
23
+ shear: 0.0 # image shear (+/- deg)
24
+ perspective: 0.0 # image perspective (+/- fraction), range 0-0.001
25
+ flipud: 0.0 # image flip up-down (probability)
26
+ fliplr: 0.5 # image flip left-right (probability)
27
+ mosaic: 1.0 # image mosaic (probability)
28
+ mixup: 0.0 # image mixup (probability)
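
In all three hyp files, `lrf` is a multiplier on `lr0`, so the schedule is meant to end at 0.01 * 0.2 = 0.002. A sketch of the cosine one-cycle lambda this family of trainers typically builds from those two values (an assumption: the actual schedule lives in train.py, which is not shown here; 300 epochs is just an example):

``` python
import math

lr0, lrf, epochs = 0.01, 0.2, 300

def lf(epoch):
    # cosine decay of the LR multiplier from 1.0 down to lrf
    return ((1 + math.cos(epoch * math.pi / epochs)) / 2) * (1 - lrf) + lrf

assert abs(lr0 * lf(0) - 0.01) < 1e-9        # starts at lr0
assert abs(lr0 * lf(epochs) - 0.002) < 1e-9  # ends at lr0 * lrf
```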
figure/implicit_modeling.png ADDED
figure/performance.png ADDED
figure/schedule.png ADDED
figure/unifued_network.png ADDED
inference/images/horses.jpg ADDED
inference/output/horses.jpg ADDED
models/__init__.py ADDED
@@ -0,0 +1 @@
1
+
models/__pycache__/__init__.cpython-37.pyc ADDED
Binary file (153 Bytes).
models/__pycache__/models.cpython-37.pyc ADDED
Binary file (20.9 kB).
models/export.py ADDED
@@ -0,0 +1,68 @@
+ import argparse
+
+ import torch
+
+ from utils.google_utils import attempt_download
+
+ if __name__ == '__main__':
+     parser = argparse.ArgumentParser()
+     parser.add_argument('--weights', type=str, default='./yolov4.pt', help='weights path')
+     parser.add_argument('--img-size', nargs='+', type=int, default=[640, 640], help='image size')
+     parser.add_argument('--batch-size', type=int, default=1, help='batch size')
+     opt = parser.parse_args()
+     opt.img_size *= 2 if len(opt.img_size) == 1 else 1 # expand
+     print(opt)
+
+     # Input
+     img = torch.zeros((opt.batch_size, 3, *opt.img_size)) # image size(1,3,320,192) iDetection
+
+     # Load PyTorch model
+     attempt_download(opt.weights)
+     model = torch.load(opt.weights, map_location=torch.device('cpu'))['model'].float()
+     model.eval()
+     model.model[-1].export = True # set Detect() layer export=True
+     y = model(img) # dry run
+
+     # TorchScript export
+     try:
+         print('\nStarting TorchScript export with torch %s...' % torch.__version__)
+         f = opt.weights.replace('.pt', '.torchscript.pt') # filename
+         ts = torch.jit.trace(model, img)
+         ts.save(f)
+         print('TorchScript export success, saved as %s' % f)
+     except Exception as e:
+         print('TorchScript export failure: %s' % e)
+
+     # ONNX export
+     try:
+         import onnx
+
+         print('\nStarting ONNX export with onnx %s...' % onnx.__version__)
+         f = opt.weights.replace('.pt', '.onnx') # filename
+         model.fuse() # only for ONNX
+         torch.onnx.export(model, img, f, verbose=False, opset_version=12, input_names=['images'],
+                           output_names=['classes', 'boxes'] if y is None else ['output'])
+
+         # Checks
+         onnx_model = onnx.load(f) # load onnx model
+         onnx.checker.check_model(onnx_model) # check onnx model
+         print(onnx.helper.printable_graph(onnx_model.graph)) # print a human readable model
+         print('ONNX export success, saved as %s' % f)
+     except Exception as e:
+         print('ONNX export failure: %s' % e)
+
+     # CoreML export
+     try:
+         import coremltools as ct
+
+         print('\nStarting CoreML export with coremltools %s...' % ct.__version__)
+         # convert model from torchscript and apply pixel scaling as per detect.py
+         model = ct.convert(ts, inputs=[ct.ImageType(name='images', shape=img.shape, scale=1 / 255.0, bias=[0, 0, 0])])
+         f = opt.weights.replace('.pt', '.mlmodel') # filename
+         model.save(f)
+         print('CoreML export success, saved as %s' % f)
+     except Exception as e:
+         print('CoreML export failure: %s' % e)
+
+     # Finish
+     print('\nExport complete. Visualize with https://github.com/lutzroeder/netron.')
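After a successful run, the exported ONNX graph can be smoke-tested outside PyTorch. A minimal sketch, assuming onnxruntime is installed (it is not pinned in requirements.txt) and that the export produced yolov4.onnx next to the weights:

    import numpy as np
    import onnxruntime as ort

    sess = ort.InferenceSession('yolov4.onnx')  # hypothetical output of models/export.py
    inp = np.zeros((1, 3, 640, 640), dtype=np.float32)  # matches the default dummy export input
    outputs = sess.run(None, {'images': inp})  # 'images' is the input name set in export.py
    print([o.shape for o in outputs])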
models/models.py ADDED
@@ -0,0 +1,761 @@
+ from utils.google_utils import *
+ from utils.layers import *
+ from utils.parse_config import *
+ from utils import torch_utils
+
+ ONNX_EXPORT = False
+
+
+ def create_modules(module_defs, img_size, cfg):
+     # Constructs module list of layer blocks from module configuration in module_defs
+
+     img_size = [img_size] * 2 if isinstance(img_size, int) else img_size # expand if necessary
+     _ = module_defs.pop(0) # cfg training hyperparams (unused)
+     output_filters = [3] # input channels
+     module_list = nn.ModuleList()
+     routs = [] # list of layers which route to deeper layers
+     yolo_index = -1
+
+     for i, mdef in enumerate(module_defs):
+         modules = nn.Sequential()
+
+         if mdef['type'] == 'convolutional':
+             bn = mdef['batch_normalize']
+             filters = mdef['filters']
+             k = mdef['size'] # kernel size
+             stride = mdef['stride'] if 'stride' in mdef else (mdef['stride_y'], mdef['stride_x'])
+             if isinstance(k, int): # single-size conv
+                 modules.add_module('Conv2d', nn.Conv2d(in_channels=output_filters[-1],
+                                                        out_channels=filters,
+                                                        kernel_size=k,
+                                                        stride=stride,
+                                                        padding=k // 2 if mdef['pad'] else 0,
+                                                        groups=mdef['groups'] if 'groups' in mdef else 1,
+                                                        bias=not bn))
+             else: # multiple-size conv
+                 modules.add_module('MixConv2d', MixConv2d(in_ch=output_filters[-1],
+                                                           out_ch=filters,
+                                                           k=k,
+                                                           stride=stride,
+                                                           bias=not bn))
+
+             if bn:
+                 modules.add_module('BatchNorm2d', nn.BatchNorm2d(filters, momentum=0.03, eps=1E-4))
+             else:
+                 routs.append(i) # detection output (goes into yolo layer)
+
+             if mdef['activation'] == 'leaky': # activation study https://github.com/ultralytics/yolov3/issues/441
+                 modules.add_module('activation', nn.LeakyReLU(0.1, inplace=True))
+             elif mdef['activation'] == 'swish':
+                 modules.add_module('activation', Swish())
+             elif mdef['activation'] == 'mish':
+                 modules.add_module('activation', Mish())
+             elif mdef['activation'] == 'emb':
+                 # note: F.normalize is a function, not an nn.Module, so this branch would raise if a cfg requested 'emb'; it appears unused by the cfgs added in this commit
+                 modules.add_module('activation', F.normalize())
+             elif mdef['activation'] == 'logistic':
+                 modules.add_module('activation', nn.Sigmoid())
+             elif mdef['activation'] == 'silu':
+                 modules.add_module('activation', nn.SiLU())
+
+         elif mdef['type'] == 'deformableconvolutional':
+             bn = mdef['batch_normalize']
+             filters = mdef['filters']
+             k = mdef['size'] # kernel size
+             stride = mdef['stride'] if 'stride' in mdef else (mdef['stride_y'], mdef['stride_x'])
+             if isinstance(k, int): # single-size conv
+                 modules.add_module('DeformConv2d', DeformConv2d(output_filters[-1],
+                                                                 filters,
+                                                                 kernel_size=k,
+                                                                 padding=k // 2 if mdef['pad'] else 0,
+                                                                 stride=stride,
+                                                                 bias=not bn,
+                                                                 modulation=True))
+             else: # multiple-size conv
+                 modules.add_module('MixConv2d', MixConv2d(in_ch=output_filters[-1],
+                                                           out_ch=filters,
+                                                           k=k,
+                                                           stride=stride,
+                                                           bias=not bn))
+
+             if bn:
+                 modules.add_module('BatchNorm2d', nn.BatchNorm2d(filters, momentum=0.03, eps=1E-4))
+             else:
+                 routs.append(i) # detection output (goes into yolo layer)
+
+             if mdef['activation'] == 'leaky': # activation study https://github.com/ultralytics/yolov3/issues/441
+                 modules.add_module('activation', nn.LeakyReLU(0.1, inplace=True))
+             elif mdef['activation'] == 'swish':
+                 modules.add_module('activation', Swish())
+             elif mdef['activation'] == 'mish':
+                 modules.add_module('activation', Mish())
+             elif mdef['activation'] == 'silu':
+                 modules.add_module('activation', nn.SiLU())
+
+         elif mdef['type'] == 'dropout':
+             p = mdef['probability']
+             modules = nn.Dropout(p)
+
+         elif mdef['type'] == 'avgpool':
+             modules = GAP()
+
+         elif mdef['type'] == 'silence':
+             filters = output_filters[-1]
+             modules = Silence()
+
+         elif mdef['type'] == 'scale_channels': # nn.Sequential() placeholder for 'shortcut' layer
+             layers = mdef['from']
+             filters = output_filters[-1]
+             routs.extend([i + l if l < 0 else l for l in layers])
+             modules = ScaleChannel(layers=layers)
+
+         elif mdef['type'] == 'shift_channels': # nn.Sequential() placeholder for 'shortcut' layer
+             layers = mdef['from']
+             filters = output_filters[-1]
+             routs.extend([i + l if l < 0 else l for l in layers])
+             modules = ShiftChannel(layers=layers)
+
+         elif mdef['type'] == 'shift_channels_2d': # nn.Sequential() placeholder for 'shortcut' layer
+             layers = mdef['from']
+             filters = output_filters[-1]
+             routs.extend([i + l if l < 0 else l for l in layers])
+             modules = ShiftChannel2D(layers=layers)
+
+         elif mdef['type'] == 'control_channels': # nn.Sequential() placeholder for 'shortcut' layer
+             layers = mdef['from']
+             filters = output_filters[-1]
+             routs.extend([i + l if l < 0 else l for l in layers])
+             modules = ControlChannel(layers=layers)
+
+         elif mdef['type'] == 'control_channels_2d': # nn.Sequential() placeholder for 'shortcut' layer
+             layers = mdef['from']
+             filters = output_filters[-1]
+             routs.extend([i + l if l < 0 else l for l in layers])
+             modules = ControlChannel2D(layers=layers)
+
+         elif mdef['type'] == 'alternate_channels': # nn.Sequential() placeholder for 'shortcut' layer
+             layers = mdef['from']
+             filters = output_filters[-1] * 2
+             routs.extend([i + l if l < 0 else l for l in layers])
+             modules = AlternateChannel(layers=layers)
+
+         elif mdef['type'] == 'alternate_channels_2d': # nn.Sequential() placeholder for 'shortcut' layer
+             layers = mdef['from']
+             filters = output_filters[-1] * 2
+             routs.extend([i + l if l < 0 else l for l in layers])
+             modules = AlternateChannel2D(layers=layers)
+
+         elif mdef['type'] == 'select_channels': # nn.Sequential() placeholder for 'shortcut' layer
+             layers = mdef['from']
+             filters = output_filters[-1]
+             routs.extend([i + l if l < 0 else l for l in layers])
+             modules = SelectChannel(layers=layers)
+
+         elif mdef['type'] == 'select_channels_2d': # nn.Sequential() placeholder for 'shortcut' layer
+             layers = mdef['from']
+             filters = output_filters[-1]
+             routs.extend([i + l if l < 0 else l for l in layers])
+             modules = SelectChannel2D(layers=layers)
+
+         elif mdef['type'] == 'sam': # nn.Sequential() placeholder for 'shortcut' layer
+             layers = mdef['from']
+             filters = output_filters[-1]
+             routs.extend([i + l if l < 0 else l for l in layers])
+             modules = ScaleSpatial(layers=layers)
+
+         elif mdef['type'] == 'BatchNorm2d':
+             filters = output_filters[-1]
+             modules = nn.BatchNorm2d(filters, momentum=0.03, eps=1E-4)
+             if i == 0 and filters == 3: # normalize RGB image
+                 # imagenet mean and var https://pytorch.org/docs/stable/torchvision/models.html#classification
+                 modules.running_mean = torch.tensor([0.485, 0.456, 0.406])
+                 modules.running_var = torch.tensor([0.0524, 0.0502, 0.0506])
+
+         elif mdef['type'] == 'maxpool':
+             k = mdef['size'] # kernel size
+             stride = mdef['stride']
+             maxpool = nn.MaxPool2d(kernel_size=k, stride=stride, padding=(k - 1) // 2)
+             if k == 2 and stride == 1: # yolov3-tiny
+                 modules.add_module('ZeroPad2d', nn.ZeroPad2d((0, 1, 0, 1)))
+                 modules.add_module('MaxPool2d', maxpool)
+             else:
+                 modules = maxpool
+
+         elif mdef['type'] == 'local_avgpool':
+             k = mdef['size'] # kernel size
+             stride = mdef['stride']
+             avgpool = nn.AvgPool2d(kernel_size=k, stride=stride, padding=(k - 1) // 2)
+             if k == 2 and stride == 1: # yolov3-tiny
+                 modules.add_module('ZeroPad2d', nn.ZeroPad2d((0, 1, 0, 1)))
+                 modules.add_module('AvgPool2d', avgpool)
+             else:
+                 modules = avgpool
+
+         elif mdef['type'] == 'upsample':
+             if ONNX_EXPORT: # explicitly state size, avoid scale_factor
+                 g = (yolo_index + 1) * 2 / 32 # gain
+                 modules = nn.Upsample(size=tuple(int(x * g) for x in img_size)) # img_size = (320, 192)
+             else:
+                 modules = nn.Upsample(scale_factor=mdef['stride'])
+
+         elif mdef['type'] == 'route': # nn.Sequential() placeholder for 'route' layer
+             layers = mdef['layers']
+             filters = sum([output_filters[l + 1 if l > 0 else l] for l in layers])
+             routs.extend([i + l if l < 0 else l for l in layers])
+             modules = FeatureConcat(layers=layers)
+
+         elif mdef['type'] == 'route2': # nn.Sequential() placeholder for 'route' layer
+             layers = mdef['layers']
+             filters = sum([output_filters[l + 1 if l > 0 else l] for l in layers])
+             routs.extend([i + l if l < 0 else l for l in layers])
+             modules = FeatureConcat2(layers=layers)
+
+         elif mdef['type'] == 'route3': # nn.Sequential() placeholder for 'route' layer
+             layers = mdef['layers']
+             filters = sum([output_filters[l + 1 if l > 0 else l] for l in layers])
+             routs.extend([i + l if l < 0 else l for l in layers])
+             modules = FeatureConcat3(layers=layers)
+
+         elif mdef['type'] == 'route_lhalf': # nn.Sequential() placeholder for 'route' layer
+             layers = mdef['layers']
+             filters = sum([output_filters[l + 1 if l > 0 else l] for l in layers])//2
+             routs.extend([i + l if l < 0 else l for l in layers])
+             modules = FeatureConcat_l(layers=layers)
+
+         elif mdef['type'] == 'shortcut': # nn.Sequential() placeholder for 'shortcut' layer
+             layers = mdef['from']
+             filters = output_filters[-1]
+             routs.extend([i + l if l < 0 else l for l in layers])
+             modules = WeightedFeatureFusion(layers=layers, weight='weights_type' in mdef)
+
+         elif mdef['type'] == 'reorg3d': # yolov3-spp-pan-scale
+             pass
+
+         elif mdef['type'] == 'reorg': # yolov3-spp-pan-scale
+             filters = 4 * output_filters[-1]
+             modules.add_module('Reorg', Reorg())
+
+         elif mdef['type'] == 'dwt': # yolov3-spp-pan-scale
+             filters = 4 * output_filters[-1]
+             modules.add_module('DWT', DWT())
+
+         elif mdef['type'] == 'implicit_add': # yolov3-spp-pan-scale
+             filters = mdef['filters']
+             modules = ImplicitA(channel=filters)
+
+         elif mdef['type'] == 'implicit_mul': # yolov3-spp-pan-scale
+             filters = mdef['filters']
+             modules = ImplicitM(channel=filters)
+
+         elif mdef['type'] == 'implicit_cat': # yolov3-spp-pan-scale
+             filters = mdef['filters']
+             modules = ImplicitC(channel=filters)
+
+         elif mdef['type'] == 'implicit_add_2d': # yolov3-spp-pan-scale
+             channels = mdef['filters']
+             filters = mdef['atoms']
+             modules = Implicit2DA(atom=filters, channel=channels)
+
+         elif mdef['type'] == 'implicit_mul_2d': # yolov3-spp-pan-scale
+             channels = mdef['filters']
+             filters = mdef['atoms']
+             modules = Implicit2DM(atom=filters, channel=channels)
+
+         elif mdef['type'] == 'implicit_cat_2d': # yolov3-spp-pan-scale
+             channels = mdef['filters']
+             filters = mdef['atoms']
+             modules = Implicit2DC(atom=filters, channel=channels)
+
+         elif mdef['type'] == 'yolo':
+             yolo_index += 1
+             stride = [8, 16, 32, 64, 128] # P3, P4, P5, P6, P7 strides
+             if any(x in cfg for x in ['yolov4-tiny', 'fpn', 'yolov3']): # P5, P4, P3 strides
+                 stride = [32, 16, 8]
+             layers = mdef['from'] if 'from' in mdef else []
+             modules = YOLOLayer(anchors=mdef['anchors'][mdef['mask']], # anchor list
+                                 nc=mdef['classes'], # number of classes
+                                 img_size=img_size, # (416, 416)
+                                 yolo_index=yolo_index, # 0, 1, 2...
+                                 layers=layers, # output layers
+                                 stride=stride[yolo_index])
+
+             # Initialize preceding Conv2d() bias (https://arxiv.org/pdf/1708.02002.pdf section 3.3)
+             try:
+                 j = layers[yolo_index] if 'from' in mdef else -2
+                 bias_ = module_list[j][0].bias # shape(255,)
+                 bias = bias_[:modules.no * modules.na].view(modules.na, -1) # shape(3,85)
+                 #bias[:, 4] += -4.5 # obj
+                 bias.data[:, 4] += math.log(8 / (640 / stride[yolo_index]) ** 2) # obj (8 objects per 640 image)
+                 bias.data[:, 5:] += math.log(0.6 / (modules.nc - 0.99)) # cls (sigmoid(p) = 1/nc)
+                 module_list[j][0].bias = torch.nn.Parameter(bias_, requires_grad=bias_.requires_grad)
+
+                 #j = [-2, -5, -8]
+                 #for sj in j:
+                 #    bias_ = module_list[sj][0].bias
+                 #    bias = bias_[:modules.no * 1].view(1, -1)
+                 #    bias.data[:, 4] += math.log(8 / (640 / stride[yolo_index]) ** 2)
+                 #    bias.data[:, 5:] += math.log(0.6 / (modules.nc - 0.99))
+                 #    module_list[sj][0].bias = torch.nn.Parameter(bias_, requires_grad=bias_.requires_grad)
+             except Exception:
+                 print('WARNING: smart bias initialization failure.')
+
+         elif mdef['type'] == 'jde':
+             yolo_index += 1
+             stride = [8, 16, 32, 64, 128] # P3, P4, P5, P6, P7 strides
+             if any(x in cfg for x in ['yolov4-tiny', 'fpn', 'yolov3']): # P5, P4, P3 strides
+                 stride = [32, 16, 8]
+             layers = mdef['from'] if 'from' in mdef else []
+             modules = JDELayer(anchors=mdef['anchors'][mdef['mask']], # anchor list
+                                nc=mdef['classes'], # number of classes
+                                img_size=img_size, # (416, 416)
+                                yolo_index=yolo_index, # 0, 1, 2...
+                                layers=layers, # output layers
+                                stride=stride[yolo_index])
+
+             # Initialize preceding Conv2d() bias (https://arxiv.org/pdf/1708.02002.pdf section 3.3)
+             try:
+                 j = layers[yolo_index] if 'from' in mdef else -1
+                 bias_ = module_list[j][0].bias # shape(255,)
+                 bias = bias_[:modules.no * modules.na].view(modules.na, -1) # shape(3,85)
+                 #bias[:, 4] += -4.5 # obj
+                 bias.data[:, 4] += math.log(8 / (640 / stride[yolo_index]) ** 2) # obj (8 objects per 640 image)
+                 bias.data[:, 5:] += math.log(0.6 / (modules.nc - 0.99)) # cls (sigmoid(p) = 1/nc)
+                 module_list[j][0].bias = torch.nn.Parameter(bias_, requires_grad=bias_.requires_grad)
+             except Exception:
+                 print('WARNING: smart bias initialization failure.')
+
+         else:
+             print('Warning: Unrecognized Layer Type: ' + mdef['type'])
+
+         # Register module list and number of output filters
+         module_list.append(modules)
+         output_filters.append(filters)
+
+     routs_binary = [False] * (i + 1)
+     for i in routs:
+         routs_binary[i] = True
+     return module_list, routs_binary
+
+
+ class YOLOLayer(nn.Module):
+     def __init__(self, anchors, nc, img_size, yolo_index, layers, stride):
+         super(YOLOLayer, self).__init__()
+         self.anchors = torch.Tensor(anchors)
+         self.index = yolo_index # index of this layer in layers
+         self.layers = layers # model output layer indices
+         self.stride = stride # layer stride
+         self.nl = len(layers) # number of output layers (3)
+         self.na = len(anchors) # number of anchors (3)
+         self.nc = nc # number of classes (80)
+         self.no = nc + 5 # number of outputs (85)
+         self.nx, self.ny, self.ng = 0, 0, 0 # initialize number of x, y gridpoints
+         self.anchor_vec = self.anchors / self.stride
+         self.anchor_wh = self.anchor_vec.view(1, self.na, 1, 1, 2)
+
+         if ONNX_EXPORT:
+             self.training = False
+             self.create_grids((img_size[1] // stride, img_size[0] // stride)) # number x, y grid points
+
+     def create_grids(self, ng=(13, 13), device='cpu'):
+         self.nx, self.ny = ng # x and y grid size
+         self.ng = torch.tensor(ng, dtype=torch.float)
+
+         # build xy offsets
+         if not self.training:
+             yv, xv = torch.meshgrid([torch.arange(self.ny, device=device), torch.arange(self.nx, device=device)])
+             self.grid = torch.stack((xv, yv), 2).view((1, 1, self.ny, self.nx, 2)).float()
+
+         if self.anchor_vec.device != device:
+             self.anchor_vec = self.anchor_vec.to(device)
+             self.anchor_wh = self.anchor_wh.to(device)
+
+     def forward(self, p, out):
+         ASFF = False # https://arxiv.org/abs/1911.09516
+         if ASFF:
+             i, n = self.index, self.nl # index in layers, number of layers
+             p = out[self.layers[i]]
+             bs, _, ny, nx = p.shape # bs, 255, 13, 13
+             if (self.nx, self.ny) != (nx, ny):
+                 self.create_grids((nx, ny), p.device)
+
+             # outputs and weights
+             # w = F.softmax(p[:, -n:], 1) # normalized weights
+             w = torch.sigmoid(p[:, -n:]) * (2 / n) # sigmoid weights (faster)
+             # w = w / w.sum(1).unsqueeze(1) # normalize across layer dimension
+
+             # weighted ASFF sum
+             p = out[self.layers[i]][:, :-n] * w[:, i:i + 1]
+             for j in range(n):
+                 if j != i:
+                     p += w[:, j:j + 1] * \
+                          F.interpolate(out[self.layers[j]][:, :-n], size=[ny, nx], mode='bilinear', align_corners=False)
+
+         elif ONNX_EXPORT:
+             bs = 1 # batch size
+         else:
+             bs, _, ny, nx = p.shape # bs, 255, 13, 13
+             if (self.nx, self.ny) != (nx, ny):
+                 self.create_grids((nx, ny), p.device)
+
+         # p.view(bs, 255, 13, 13) --> (bs, 3, 13, 13, 85) # (bs, anchors, grid, grid, classes + xywh)
+         p = p.view(bs, self.na, self.no, self.ny, self.nx).permute(0, 1, 3, 4, 2).contiguous() # prediction
+
+         if self.training:
+             return p
+
+         elif ONNX_EXPORT:
+             # Avoid broadcasting for ANE operations
+             m = self.na * self.nx * self.ny
+             ng = 1. / self.ng.repeat(m, 1)
+             grid = self.grid.repeat(1, self.na, 1, 1, 1).view(m, 2)
+             anchor_wh = self.anchor_wh.repeat(1, 1, self.nx, self.ny, 1).view(m, 2) * ng
+
+             p = p.view(m, self.no)
+             xy = torch.sigmoid(p[:, 0:2]) + grid # x, y
+             wh = torch.exp(p[:, 2:4]) * anchor_wh # width, height
+             p_cls = torch.sigmoid(p[:, 4:5]) if self.nc == 1 else \
+                 torch.sigmoid(p[:, 5:self.no]) * torch.sigmoid(p[:, 4:5]) # conf
+             return p_cls, xy * ng, wh
+
+         else: # inference
+             io = p.sigmoid()
+             io[..., :2] = (io[..., :2] * 2. - 0.5 + self.grid)
+             io[..., 2:4] = (io[..., 2:4] * 2) ** 2 * self.anchor_wh
+             io[..., :4] *= self.stride
+             #io = p.clone() # inference output
+             #io[..., :2] = torch.sigmoid(io[..., :2]) + self.grid # xy
+             #io[..., 2:4] = torch.exp(io[..., 2:4]) * self.anchor_wh # wh yolo method
+             #io[..., :4] *= self.stride
+             #torch.sigmoid_(io[..., 4:])
+             return io.view(bs, -1, self.no), p # view [1, 3, 13, 13, 85] as [1, 507, 85]
+
+
+ class JDELayer(nn.Module):
+     def __init__(self, anchors, nc, img_size, yolo_index, layers, stride):
+         super(JDELayer, self).__init__()
+         self.anchors = torch.Tensor(anchors)
+         self.index = yolo_index # index of this layer in layers
+         self.layers = layers # model output layer indices
+         self.stride = stride # layer stride
+         self.nl = len(layers) # number of output layers (3)
+         self.na = len(anchors) # number of anchors (3)
+         self.nc = nc # number of classes (80)
+         self.no = nc + 5 # number of outputs (85)
+         self.nx, self.ny, self.ng = 0, 0, 0 # initialize number of x, y gridpoints
+         self.anchor_vec = self.anchors / self.stride
+         self.anchor_wh = self.anchor_vec.view(1, self.na, 1, 1, 2)
+
+         if ONNX_EXPORT:
+             self.training = False
+             self.create_grids((img_size[1] // stride, img_size[0] // stride)) # number x, y grid points
+
+     def create_grids(self, ng=(13, 13), device='cpu'):
+         self.nx, self.ny = ng # x and y grid size
+         self.ng = torch.tensor(ng, dtype=torch.float)
+
+         # build xy offsets
+         if not self.training:
+             yv, xv = torch.meshgrid([torch.arange(self.ny, device=device), torch.arange(self.nx, device=device)])
+             self.grid = torch.stack((xv, yv), 2).view((1, 1, self.ny, self.nx, 2)).float()
+
+         if self.anchor_vec.device != device:
+             self.anchor_vec = self.anchor_vec.to(device)
+             self.anchor_wh = self.anchor_wh.to(device)
+
+     def forward(self, p, out):
+         ASFF = False # https://arxiv.org/abs/1911.09516
+         if ASFF:
+             i, n = self.index, self.nl # index in layers, number of layers
+             p = out[self.layers[i]]
+             bs, _, ny, nx = p.shape # bs, 255, 13, 13
+             if (self.nx, self.ny) != (nx, ny):
+                 self.create_grids((nx, ny), p.device)
+
+             # outputs and weights
+             # w = F.softmax(p[:, -n:], 1) # normalized weights
+             w = torch.sigmoid(p[:, -n:]) * (2 / n) # sigmoid weights (faster)
+             # w = w / w.sum(1).unsqueeze(1) # normalize across layer dimension
+
+             # weighted ASFF sum
+             p = out[self.layers[i]][:, :-n] * w[:, i:i + 1]
+             for j in range(n):
+                 if j != i:
+                     p += w[:, j:j + 1] * \
+                          F.interpolate(out[self.layers[j]][:, :-n], size=[ny, nx], mode='bilinear', align_corners=False)
+
+         elif ONNX_EXPORT:
+             bs = 1 # batch size
+         else:
+             bs, _, ny, nx = p.shape # bs, 255, 13, 13
+             if (self.nx, self.ny) != (nx, ny):
+                 self.create_grids((nx, ny), p.device)
+
+         # p.view(bs, 255, 13, 13) --> (bs, 3, 13, 13, 85) # (bs, anchors, grid, grid, classes + xywh)
+         p = p.view(bs, self.na, self.no, self.ny, self.nx).permute(0, 1, 3, 4, 2).contiguous() # prediction
+
+         if self.training:
+             return p
+
+         elif ONNX_EXPORT:
+             # Avoid broadcasting for ANE operations
+             m = self.na * self.nx * self.ny
+             ng = 1. / self.ng.repeat(m, 1)
+             grid = self.grid.repeat(1, self.na, 1, 1, 1).view(m, 2)
+             anchor_wh = self.anchor_wh.repeat(1, 1, self.nx, self.ny, 1).view(m, 2) * ng
+
+             p = p.view(m, self.no)
+             xy = torch.sigmoid(p[:, 0:2]) + grid # x, y
+             wh = torch.exp(p[:, 2:4]) * anchor_wh # width, height
+             p_cls = torch.sigmoid(p[:, 4:5]) if self.nc == 1 else \
+                 torch.sigmoid(p[:, 5:self.no]) * torch.sigmoid(p[:, 4:5]) # conf
+             return p_cls, xy * ng, wh
+
+         else: # inference
+             #io = p.sigmoid()
+             #io[..., :2] = (io[..., :2] * 2. - 0.5 + self.grid)
+             #io[..., 2:4] = (io[..., 2:4] * 2) ** 2 * self.anchor_wh
+             #io[..., :4] *= self.stride
+             io = p.clone() # inference output
+             io[..., :2] = torch.sigmoid(io[..., :2]) * 2. - 0.5 + self.grid # xy
+             io[..., 2:4] = (torch.sigmoid(io[..., 2:4]) * 2) ** 2 * self.anchor_wh # wh yolo method
+             io[..., :4] *= self.stride
+             io[..., 4:] = F.softmax(io[..., 4:])
+             return io.view(bs, -1, self.no), p # view [1, 3, 13, 13, 85] as [1, 507, 85]
+
+ class Darknet(nn.Module):
+     # YOLOv3 object detection model
+
+     def __init__(self, cfg, img_size=(416, 416), verbose=False):
+         super(Darknet, self).__init__()
+
+         self.module_defs = parse_model_cfg(cfg)
+         self.module_list, self.routs = create_modules(self.module_defs, img_size, cfg)
+         self.yolo_layers = get_yolo_layers(self)
+         # torch_utils.initialize_weights(self)
+
+         # Darknet Header https://github.com/AlexeyAB/darknet/issues/2914#issuecomment-496675346
+         self.version = np.array([0, 2, 5], dtype=np.int32) # (int32) version info: major, minor, revision
+         self.seen = np.array([0], dtype=np.int64) # (int64) number of images seen during training
+         self.info(verbose) if not ONNX_EXPORT else None # print model description
+
+     def forward(self, x, augment=False, verbose=False):
+
+         if not augment:
+             return self.forward_once(x)
+         else: # Augment images (inference and test only) https://github.com/ultralytics/yolov3/issues/931
+             img_size = x.shape[-2:] # height, width
+             s = [0.83, 0.67] # scales
+             y = []
+             for i, xi in enumerate((x,
+                                     torch_utils.scale_img(x.flip(3), s[0], same_shape=False), # flip-lr and scale
+                                     torch_utils.scale_img(x, s[1], same_shape=False), # scale
+                                     )):
+                 # cv2.imwrite('img%g.jpg' % i, 255 * xi[0].numpy().transpose((1, 2, 0))[:, :, ::-1])
+                 y.append(self.forward_once(xi)[0])
+
+             y[1][..., :4] /= s[0] # scale
+             y[1][..., 0] = img_size[1] - y[1][..., 0] # flip lr
+             y[2][..., :4] /= s[1] # scale
+
+             # for i, yi in enumerate(y): # coco small, medium, large = < 32**2 < 96**2 <
+             #     area = yi[..., 2:4].prod(2)[:, :, None]
+             #     if i == 1:
+             #         yi *= (area < 96. ** 2).float()
+             #     elif i == 2:
+             #         yi *= (area > 32. ** 2).float()
+             #     y[i] = yi
+
+             y = torch.cat(y, 1)
+             return y, None
+
+     def forward_once(self, x, augment=False, verbose=False):
+         img_size = x.shape[-2:] # height, width
+         yolo_out, out = [], []
+         if verbose:
+             print('0', x.shape)
+             str = ''
+
+         # Augment images (inference and test only)
+         if augment: # https://github.com/ultralytics/yolov3/issues/931
+             nb = x.shape[0] # batch size
+             s = [0.83, 0.67] # scales
+             x = torch.cat((x,
+                            torch_utils.scale_img(x.flip(3), s[0]), # flip-lr and scale
+                            torch_utils.scale_img(x, s[1]), # scale
+                            ), 0)
+
+         for i, module in enumerate(self.module_list):
+             name = module.__class__.__name__
+             #print(name)
+             if name in ['WeightedFeatureFusion', 'FeatureConcat', 'FeatureConcat2', 'FeatureConcat3', 'FeatureConcat_l', 'ScaleChannel', 'ShiftChannel', 'ShiftChannel2D', 'ControlChannel', 'ControlChannel2D', 'AlternateChannel', 'AlternateChannel2D', 'SelectChannel', 'SelectChannel2D', 'ScaleSpatial']: # sum, concat
+                 if verbose:
+                     l = [i - 1] + module.layers # layers
+                     sh = [list(x.shape)] + [list(out[i].shape) for i in module.layers] # shapes
+                     str = ' >> ' + ' + '.join(['layer %g %s' % x for x in zip(l, sh)])
+                 x = module(x, out) # WeightedFeatureFusion(), FeatureConcat()
+             elif name in ['ImplicitA', 'ImplicitM', 'ImplicitC', 'Implicit2DA', 'Implicit2DM', 'Implicit2DC']:
+                 x = module()
+             elif name == 'YOLOLayer':
+                 yolo_out.append(module(x, out))
+             elif name == 'JDELayer':
+                 yolo_out.append(module(x, out))
+             else: # run module directly, i.e. mtype = 'convolutional', 'upsample', 'maxpool', 'batchnorm2d' etc.
+                 #print(module)
+                 #print(x.shape)
+                 x = module(x)
+
+             out.append(x if self.routs[i] else [])
+             if verbose:
+                 print('%g/%g %s -' % (i, len(self.module_list), name), list(x.shape), str)
+                 str = ''
+
+         if self.training: # train
+             return yolo_out
+         elif ONNX_EXPORT: # export
+             x = [torch.cat(x, 0) for x in zip(*yolo_out)]
+             return x[0], torch.cat(x[1:3], 1) # scores, boxes: 3780x80, 3780x4
+         else: # inference or test
+             x, p = zip(*yolo_out) # inference output, training output
+             x = torch.cat(x, 1) # cat yolo outputs
+             if augment: # de-augment results
+                 x = torch.split(x, nb, dim=0)
+                 x[1][..., :4] /= s[0] # scale
+                 x[1][..., 0] = img_size[1] - x[1][..., 0] # flip lr
+                 x[2][..., :4] /= s[1] # scale
+                 x = torch.cat(x, 1)
+             return x, p
+
+     def fuse(self):
+         # Fuse Conv2d + BatchNorm2d layers throughout model
+         print('Fusing layers...')
+         fused_list = nn.ModuleList()
+         for a in list(self.children())[0]:
+             if isinstance(a, nn.Sequential):
+                 for i, b in enumerate(a):
+                     if isinstance(b, nn.modules.batchnorm.BatchNorm2d):
+                         # fuse this bn layer with the previous conv2d layer
+                         conv = a[i - 1]
+                         fused = torch_utils.fuse_conv_and_bn(conv, b)
+                         a = nn.Sequential(fused, *list(a.children())[i + 1:])
+                         break
+             fused_list.append(a)
+         self.module_list = fused_list
+         self.info() if not ONNX_EXPORT else None # yolov3-spp reduced from 225 to 152 layers
+
+     def info(self, verbose=False):
+         torch_utils.model_info(self, verbose)
+
+
+ def get_yolo_layers(model):
+     return [i for i, m in enumerate(model.module_list) if m.__class__.__name__ in ['YOLOLayer', 'JDELayer']] # [89, 101, 113]
+
+
+ def load_darknet_weights(self, weights, cutoff=-1):
+     # Parses and loads the weights stored in 'weights'
+
+     # Establish cutoffs (load layers between 0 and cutoff. if cutoff = -1 all are loaded)
+     file = Path(weights).name
+     if file == 'darknet53.conv.74':
+         cutoff = 75
+     elif file == 'yolov3-tiny.conv.15':
+         cutoff = 15
+
+     # Read weights file
+     with open(weights, 'rb') as f:
+         # Read Header https://github.com/AlexeyAB/darknet/issues/2914#issuecomment-496675346
+         self.version = np.fromfile(f, dtype=np.int32, count=3) # (int32) version info: major, minor, revision
+         self.seen = np.fromfile(f, dtype=np.int64, count=1) # (int64) number of images seen during training
+
+         weights = np.fromfile(f, dtype=np.float32) # the rest are weights
+
+     ptr = 0
+     for i, (mdef, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])):
+         if mdef['type'] == 'convolutional':
+             conv = module[0]
+             if mdef['batch_normalize']:
+                 # Load BN bias, weights, running mean and running variance
+                 bn = module[1]
+                 nb = bn.bias.numel() # number of biases
+                 # Bias
+                 bn.bias.data.copy_(torch.from_numpy(weights[ptr:ptr + nb]).view_as(bn.bias))
+                 ptr += nb
+                 # Weight
+                 bn.weight.data.copy_(torch.from_numpy(weights[ptr:ptr + nb]).view_as(bn.weight))
+                 ptr += nb
+                 # Running Mean
+                 bn.running_mean.data.copy_(torch.from_numpy(weights[ptr:ptr + nb]).view_as(bn.running_mean))
+                 ptr += nb
+                 # Running Var
+                 bn.running_var.data.copy_(torch.from_numpy(weights[ptr:ptr + nb]).view_as(bn.running_var))
+                 ptr += nb
+             else:
+                 # Load conv. bias
+                 nb = conv.bias.numel()
+                 conv_b = torch.from_numpy(weights[ptr:ptr + nb]).view_as(conv.bias)
+                 conv.bias.data.copy_(conv_b)
+                 ptr += nb
+             # Load conv. weights
+             nw = conv.weight.numel() # number of weights
+             conv.weight.data.copy_(torch.from_numpy(weights[ptr:ptr + nw]).view_as(conv.weight))
+             ptr += nw
+
+
+ def save_weights(self, path='model.weights', cutoff=-1):
+     # Converts a PyTorch model to Darknet format (*.pt to *.weights)
+     # Note: Does not work if model.fuse() is applied
+     with open(path, 'wb') as f:
+         # Write Header https://github.com/AlexeyAB/darknet/issues/2914#issuecomment-496675346
+         self.version.tofile(f) # (int32) version info: major, minor, revision
+         self.seen.tofile(f) # (int64) number of images seen during training
+
+         # Iterate through layers
+         for i, (mdef, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])):
+             if mdef['type'] == 'convolutional':
+                 conv_layer = module[0]
+                 # If batch norm, load bn first
+                 if mdef['batch_normalize']:
+                     bn_layer = module[1]
+                     bn_layer.bias.data.cpu().numpy().tofile(f)
+                     bn_layer.weight.data.cpu().numpy().tofile(f)
+                     bn_layer.running_mean.data.cpu().numpy().tofile(f)
+                     bn_layer.running_var.data.cpu().numpy().tofile(f)
+                 # Load conv bias
+                 else:
+                     conv_layer.bias.data.cpu().numpy().tofile(f)
+                 # Load conv weights
+                 conv_layer.weight.data.cpu().numpy().tofile(f)
+
+
+ def convert(cfg='cfg/yolov3-spp.cfg', weights='weights/yolov3-spp.weights', saveto='converted.weights'):
+     # Converts between PyTorch and Darknet format per extension (i.e. *.weights convert to *.pt and vice versa)
+     # from models import *; convert('cfg/yolov3-spp.cfg', 'weights/yolov3-spp.weights')
+
+     # Initialize model
+     model = Darknet(cfg)
+     ckpt = torch.load(weights) # load checkpoint
+     try:
+         ckpt['model'] = {k: v for k, v in ckpt['model'].items() if model.state_dict()[k].numel() == v.numel()}
+         model.load_state_dict(ckpt['model'], strict=False)
+         save_weights(model, path=saveto, cutoff=-1)
+     except KeyError as e:
+         print(e)
+
+ def attempt_download(weights):
+     # Attempt to download pretrained weights if not found locally
+     weights = weights.strip()
+     msg = weights + ' missing, try downloading from https://drive.google.com/open?id=1LezFG5g3BCW6iYaV89B2i64cqEUZD7e0'
+
+     if len(weights) > 0 and not os.path.isfile(weights):
+         d = {''}
+
+         file = Path(weights).name
+         if file in d:
+             r = gdrive_download(id=d[file], name=weights)
+         else: # download from pjreddie.com
+             url = 'https://pjreddie.com/media/files/' + file
+             print('Downloading ' + url)
+             r = os.system('curl -f ' + url + ' -o ' + weights)
+
+         # Error check
+         if not (r == 0 and os.path.exists(weights) and os.path.getsize(weights) > 1E6): # weights exist and > 1MB
+             os.system('rm ' + weights) # remove partial downloads
+             raise Exception(msg)
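A minimal sketch of building a model from one of the added cfgs and running a forward pass (random weights; loading released checkpoints follows the same pattern as test.py):

    import torch
    from models.models import Darknet

    model = Darknet('cfg/yolor_p6.cfg', img_size=(1280, 1280)).eval()
    # optional: load released weights the way test.py does
    # ckpt = torch.load('yolor_p6.pt', map_location='cpu')
    # model.load_state_dict({k: v for k, v in ckpt['model'].items()
    #                        if model.state_dict()[k].numel() == v.numel()}, strict=False)
    with torch.no_grad():
        pred, _ = model(torch.zeros(1, 3, 1280, 1280))  # pred: (1, num_predictions, 85)
    print(pred.shape)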
requirements.txt ADDED
@@ -0,0 +1,33 @@
+ # pip install -qr requirements.txt
+
+ # base ----------------------------------------
+ Cython
+ matplotlib>=3.2.2
+ numpy>=1.18.5
+ opencv-python>=4.1.2
+ Pillow
+ PyYAML>=5.3.1
+ scipy>=1.4.1
+ tensorboard>=1.5
+ torch==1.7.0
+ torchvision==0.8.1
+ tqdm>=4.41.0
+
+ # logging -------------------------------------
+ # wandb
+
+ # plotting ------------------------------------
+ seaborn>=0.11.0
+ pandas
+
+ # export --------------------------------------
+ # coremltools>=4.1
+ # onnx>=1.8.1
+ # scikit-learn==0.19.2 # for coreml quantization
+
+ # extras --------------------------------------
+ thop # FLOPS computation
+ pycocotools==2.0 # COCO mAP
+
+
+ gdown
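Note that torch and torchvision are pinned exactly. A quick sanity check before training (a minimal sketch, not part of the repo) can catch mismatched environments:

    import torch, torchvision

    # requirements.txt pins torch==1.7.0 and torchvision==0.8.1
    assert torch.__version__.startswith('1.7'), f'expected torch 1.7.x, got {torch.__version__}'
    assert torchvision.__version__.startswith('0.8'), f'expected torchvision 0.8.x, got {torchvision.__version__}'
    print('CUDA available:', torch.cuda.is_available())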
scripts/get_coco.sh ADDED
@@ -0,0 +1,27 @@
+ #!/bin/bash
+ # Script credit to https://github.com/ultralytics/yolov5
+ # COCO 2017 dataset http://cocodataset.org
+ # Download command: bash scripts/get_coco.sh
+ # Default dataset location is next to /yolor:
+ #     /parent_folder
+ #         /coco
+ #         /yolor
+
+ # Download/unzip labels
+ d='../' # unzip directory
+ url=https://github.com/ultralytics/yolov5/releases/download/v1.0/
+ f='coco2017labels.zip' # or 'coco2017labels-segments.zip', 68 MB
+ echo 'Downloading' $url$f ' ...'
+ curl -L $url$f -o $f && unzip -q $f -d $d && rm $f & # download, unzip, remove in background
+
+ # Download/unzip images
+ d='../coco/images' # unzip directory
+ url=http://images.cocodataset.org/zips/
+ f1='train2017.zip' # 19G, 118k images
+ f2='val2017.zip' # 1G, 5k images
+ f3='test2017.zip' # 7G, 41k images (optional)
+ for f in $f1 $f2 $f3; do
+   echo 'Downloading' $url$f '...'
+   curl -L $url$f -o $f && unzip -q $f -d $d && rm $f & # download, unzip, remove in background
+ done
+ wait # finish background tasks
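Once the downloads finish, the expected layout can be verified from Python; a minimal sketch (paths follow the comments in the script, relative to the yolor directory):

    from pathlib import Path

    coco = Path('../coco')  # default location next to /yolor, per the script header
    for sub in ['images/train2017', 'images/val2017', 'labels/train2017', 'labels/val2017']:
        p = coco / sub
        n = len(list(p.glob('*'))) if p.is_dir() else 0
        print(f'{p}: {n} files')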
scripts/get_pretrain.sh ADDED
@@ -0,0 +1,7 @@
+ curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=1Tdn3yqpZ79X7R1Ql0zNlNScB1Dv9Fp76" > /dev/null
+ curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=1Tdn3yqpZ79X7R1Ql0zNlNScB1Dv9Fp76" -o yolor_p6.pt
+ rm ./cookie
+
+ curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=1UflcHlN5ERPdhahMivQYCbWWw7d2wY7U" > /dev/null
+ curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=1UflcHlN5ERPdhahMivQYCbWWw7d2wY7U" -o yolor_w6.pt
+ rm ./cookie
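The cookie dance above works around Google Drive's large-file confirmation page. Since gdown is already listed in requirements.txt, an equivalent Python sketch (same file IDs as the script) would be:

    import gdown

    # same Google Drive IDs as scripts/get_pretrain.sh
    gdown.download('https://drive.google.com/uc?id=1Tdn3yqpZ79X7R1Ql0zNlNScB1Dv9Fp76', 'yolor_p6.pt', quiet=False)
    gdown.download('https://drive.google.com/uc?id=1UflcHlN5ERPdhahMivQYCbWWw7d2wY7U', 'yolor_w6.pt', quiet=False)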
test.py ADDED
@@ -0,0 +1,344 @@
+ import argparse
+ import glob
+ import json
+ import os
+ from pathlib import Path
+
+ import numpy as np
+ import torch
+ import yaml
+ from tqdm import tqdm
+
+ from utils.google_utils import attempt_load
+ from utils.datasets import create_dataloader
+ from utils.general import coco80_to_coco91_class, check_dataset, check_file, check_img_size, box_iou, \
+     non_max_suppression, scale_coords, xyxy2xywh, xywh2xyxy, clip_coords, set_logging, increment_path
+ from utils.loss import compute_loss
+ from utils.metrics import ap_per_class
+ from utils.plots import plot_images, output_to_target
+ from utils.torch_utils import select_device, time_synchronized
+
+ from models.models import *
+
+ def load_classes(path):
+     # Loads *.names file at 'path'
+     with open(path, 'r') as f:
+         names = f.read().split('\n')
+     return list(filter(None, names)) # filter removes empty strings (such as last line)
+
+
+ def test(data,
+          weights=None,
+          batch_size=16,
+          imgsz=640,
+          conf_thres=0.001,
+          iou_thres=0.6, # for NMS
+          save_json=False,
+          single_cls=False,
+          augment=False,
+          verbose=False,
+          model=None,
+          dataloader=None,
+          save_dir=Path(''), # for saving images
+          save_txt=False, # for auto-labelling
+          save_conf=False,
+          plots=True,
+          log_imgs=0): # number of logged images
+
+     # Initialize/load model and set device
+     training = model is not None
+     if training: # called by train.py
+         device = next(model.parameters()).device # get model device
+
+     else: # called directly
+         set_logging()
+         device = select_device(opt.device, batch_size=batch_size)
+         save_txt = opt.save_txt # save *.txt labels
+
+         # Directories
+         save_dir = Path(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok)) # increment run
+         (save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True) # make dir
+
+         # Load model
+         model = Darknet(opt.cfg).to(device)
+
+         # load model
+         try:
+             ckpt = torch.load(weights[0], map_location=device) # load checkpoint
+             ckpt['model'] = {k: v for k, v in ckpt['model'].items() if model.state_dict()[k].numel() == v.numel()}
+             model.load_state_dict(ckpt['model'], strict=False)
+         except Exception:
+             load_darknet_weights(model, weights[0])
+         imgsz = check_img_size(imgsz, s=64) # check img_size
+
+     # Half
+     half = device.type != 'cpu' # half precision only supported on CUDA
+     if half:
+         model.half()
+
+     # Configure
+     model.eval()
+     is_coco = data.endswith('coco.yaml') # is COCO dataset
+     with open(data) as f:
+         data = yaml.load(f, Loader=yaml.FullLoader) # model dict
+     check_dataset(data) # check
+     nc = 1 if single_cls else int(data['nc']) # number of classes
+     iouv = torch.linspace(0.5, 0.95, 10).to(device) # iou vector for mAP@0.5:0.95
+     niou = iouv.numel()
+
+     # Logging
+     log_imgs, wandb = min(log_imgs, 100), None # ceil
+     try:
+         import wandb # Weights & Biases
+     except ImportError:
+         log_imgs = 0
+
+     # Dataloader
+     if not training:
+         img = torch.zeros((1, 3, imgsz, imgsz), device=device) # init img
+         _ = model(img.half() if half else img) if device.type != 'cpu' else None # run once
+         path = data['test'] if opt.task == 'test' else data['val'] # path to val/test images
+         dataloader = create_dataloader(path, imgsz, batch_size, 64, opt, pad=0.5, rect=True)[0]
+
+     seen = 0
+     try:
+         names = model.names if hasattr(model, 'names') else model.module.names
+     except Exception:
+         names = load_classes(opt.names)
+     coco91class = coco80_to_coco91_class()
+     s = ('%20s' + '%12s' * 6) % ('Class', 'Images', 'Targets', 'P', 'R', 'mAP@.5', 'mAP@.5:.95')
+     p, r, f1, mp, mr, map50, map, t0, t1 = 0., 0., 0., 0., 0., 0., 0., 0., 0.
+     loss = torch.zeros(3, device=device)
+     jdict, stats, ap, ap_class, wandb_images = [], [], [], [], []
+     for batch_i, (img, targets, paths, shapes) in enumerate(tqdm(dataloader, desc=s)):
+         img = img.to(device, non_blocking=True)
+         img = img.half() if half else img.float() # uint8 to fp16/32
+         img /= 255.0 # 0 - 255 to 0.0 - 1.0
+         targets = targets.to(device)
+         nb, _, height, width = img.shape # batch size, channels, height, width
+         whwh = torch.Tensor([width, height, width, height]).to(device)
+
+         # Disable gradients
+         with torch.no_grad():
+             # Run model
+             t = time_synchronized()
+             inf_out, train_out = model(img, augment=augment) # inference and training outputs
+             t0 += time_synchronized() - t
+
+             # Compute loss
+             if training: # if model has loss hyperparameters
+                 loss += compute_loss([x.float() for x in train_out], targets, model)[1][:3] # box, obj, cls
+
+             # Run NMS
+             t = time_synchronized()
+             output = non_max_suppression(inf_out, conf_thres=conf_thres, iou_thres=iou_thres)
+             t1 += time_synchronized() - t
+
+         # Statistics per image
+         for si, pred in enumerate(output):
+             labels = targets[targets[:, 0] == si, 1:]
+             nl = len(labels)
+             tcls = labels[:, 0].tolist() if nl else [] # target class
+             seen += 1
+
+             if len(pred) == 0:
+                 if nl:
+                     stats.append((torch.zeros(0, niou, dtype=torch.bool), torch.Tensor(), torch.Tensor(), tcls))
+                 continue
+
+             # Append to text file
+             path = Path(paths[si])
+             if save_txt:
+                 gn = torch.tensor(shapes[si][0])[[1, 0, 1, 0]] # normalization gain whwh
+                 x = pred.clone()
+                 x[:, :4] = scale_coords(img[si].shape[1:], x[:, :4], shapes[si][0], shapes[si][1]) # to original
+                 for *xyxy, conf, cls in x:
+                     xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist() # normalized xywh
+                     line = (cls, *xywh, conf) if save_conf else (cls, *xywh) # label format
+                     with open(save_dir / 'labels' / (path.stem + '.txt'), 'a') as f:
+                         f.write(('%g ' * len(line)).rstrip() % line + '\n')
+
+             # W&B logging
+             if plots and len(wandb_images) < log_imgs:
+                 box_data = [{"position": {"minX": xyxy[0], "minY": xyxy[1], "maxX": xyxy[2], "maxY": xyxy[3]},
+                              "class_id": int(cls),
+                              "box_caption": "%s %.3f" % (names[cls], conf),
+                              "scores": {"class_score": conf},
+                              "domain": "pixel"} for *xyxy, conf, cls in pred.tolist()]
+                 boxes = {"predictions": {"box_data": box_data, "class_labels": names}}
+                 wandb_images.append(wandb.Image(img[si], boxes=boxes, caption=path.name))
+
+             # Clip boxes to image bounds
+             clip_coords(pred, (height, width))
+
+             # Append to pycocotools JSON dictionary
+             if save_json:
+                 # [{"image_id": 42, "category_id": 18, "bbox": [258.15, 41.29, 348.26, 243.78], "score": 0.236}, ...
+                 image_id = int(path.stem) if path.stem.isnumeric() else path.stem
+                 box = pred[:, :4].clone() # xyxy
+                 scale_coords(img[si].shape[1:], box, shapes[si][0], shapes[si][1]) # to original shape
+                 box = xyxy2xywh(box) # xywh
+                 box[:, :2] -= box[:, 2:] / 2 # xy center to top-left corner
+                 for p, b in zip(pred.tolist(), box.tolist()):
+                     jdict.append({'image_id': image_id,
+                                   'category_id': coco91class[int(p[5])] if is_coco else int(p[5]),
+                                   'bbox': [round(x, 3) for x in b],
+                                   'score': round(p[4], 5)})
+
+             # Assign all predictions as incorrect
+             correct = torch.zeros(pred.shape[0], niou, dtype=torch.bool, device=device)
+             if nl:
+                 detected = [] # target indices
+                 tcls_tensor = labels[:, 0]
+
+                 # target boxes
+                 tbox = xywh2xyxy(labels[:, 1:5]) * whwh
+
+                 # Per target class
+                 for cls in torch.unique(tcls_tensor):
+                     ti = (cls == tcls_tensor).nonzero(as_tuple=False).view(-1) # target indices
+                     pi = (cls == pred[:, 5]).nonzero(as_tuple=False).view(-1) # prediction indices
+
+                     # Search for detections
+                     if pi.shape[0]:
+                         # Prediction to target ious
+                         ious, i = box_iou(pred[pi, :4], tbox[ti]).max(1) # best ious, indices
+
+                         # Append detections
+                         detected_set = set()
+                         for j in (ious > iouv[0]).nonzero(as_tuple=False):
+                             d = ti[i[j]] # detected target
+                             if d.item() not in detected_set:
+                                 detected_set.add(d.item())
+                                 detected.append(d)
+                                 correct[pi[j]] = ious[j] > iouv # iou_thres is 1xn
+                                 if len(detected) == nl: # all targets already located in image
+                                     break
+
+             # Append statistics (correct, conf, pcls, tcls)
+             stats.append((correct.cpu(), pred[:, 4].cpu(), pred[:, 5].cpu(), tcls))
+
+         # Plot images
+         if plots and batch_i < 3:
+             f = save_dir / f'test_batch{batch_i}_labels.jpg' # filename
+             plot_images(img, targets, paths, f, names) # labels
+             f = save_dir / f'test_batch{batch_i}_pred.jpg'
+             plot_images(img, output_to_target(output, width, height), paths, f, names) # predictions
+
+     # Compute statistics
+     stats = [np.concatenate(x, 0) for x in zip(*stats)] # to numpy
+     if len(stats) and stats[0].any():
+         p, r, ap, f1, ap_class = ap_per_class(*stats, plot=plots, fname=save_dir / 'precision-recall_curve.png')
+         p, r, ap50, ap = p[:, 0], r[:, 0], ap[:, 0], ap.mean(1) # [P, R, AP@0.5, AP@0.5:0.95]
+         mp, mr, map50, map = p.mean(), r.mean(), ap50.mean(), ap.mean()
+         nt = np.bincount(stats[3].astype(np.int64), minlength=nc) # number of targets per class
+     else:
+         nt = torch.zeros(1)
+
+     # W&B logging
+     if plots and wandb:
+         wandb.log({"Images": wandb_images})
+         wandb.log({"Validation": [wandb.Image(str(x), caption=x.name) for x in sorted(save_dir.glob('test*.jpg'))]})
+
+     # Print results
+     pf = '%20s' + '%12.3g' * 6 # print format
+     print(pf % ('all', seen, nt.sum(), mp, mr, map50, map))
+
+     # Print results per class
+     if verbose and nc > 1 and len(stats):
+         for i, c in enumerate(ap_class):
+             print(pf % (names[c], seen, nt[c], p[i], r[i], ap50[i], ap[i]))
+
+     # Print speeds
+     t = tuple(x / seen * 1E3 for x in (t0, t1, t0 + t1)) + (imgsz, imgsz, batch_size) # tuple
+     if not training:
+         print('Speed: %.1f/%.1f/%.1f ms inference/NMS/total per %gx%g image at batch-size %g' % t)
+
+     # Save JSON
+     if save_json and len(jdict):
+         w = Path(weights[0] if isinstance(weights, list) else weights).stem if weights is not None else '' # weights
+         anno_json = glob.glob('../coco/annotations/instances_val*.json')[0] # annotations json
+         pred_json = str(save_dir / f"{w}_predictions.json") # predictions json
+         print('\nEvaluating pycocotools mAP... saving %s...' % pred_json)
+         with open(pred_json, 'w') as f:
+             json.dump(jdict, f)
+
+         try: # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb
+             from pycocotools.coco import COCO
+             from pycocotools.cocoeval import COCOeval
+
+             anno = COCO(anno_json) # init annotations api
+             pred = anno.loadRes(pred_json) # init predictions api
+             eval = COCOeval(anno, pred, 'bbox')
+             if is_coco:
+                 eval.params.imgIds = [int(Path(x).stem) for x in dataloader.dataset.img_files] # image IDs to evaluate
+             eval.evaluate()
+             eval.accumulate()
+             eval.summarize()
+             map, map50 = eval.stats[:2] # update results (mAP@0.5:0.95, mAP@0.5)
+         except Exception as e:
+             print('ERROR: pycocotools unable to run: %s' % e)
+
+     # Return results
+     if not training:
+         print('Results saved to %s' % save_dir)
+     model.float() # for training
+     maps = np.zeros(nc) + map
+     for i, c in enumerate(ap_class):
+         maps[c] = ap[i]
+     return (mp, mr, map50, map, *(loss.cpu() / len(dataloader)).tolist()), maps, t
+
+
+ if __name__ == '__main__':
+     parser = argparse.ArgumentParser(prog='test.py')
+     parser.add_argument('--weights', nargs='+', type=str, default='yolor_p6.pt', help='model.pt path(s)')
+     parser.add_argument('--data', type=str, default='data/coco.yaml', help='*.data path')
+     parser.add_argument('--batch-size', type=int, default=32, help='size of each image batch')
+     parser.add_argument('--img-size', type=int, default=1280, help='inference size (pixels)')
+     parser.add_argument('--conf-thres', type=float, default=0.001, help='object confidence threshold')
+     parser.add_argument('--iou-thres', type=float, default=0.65, help='IOU threshold for NMS')
+     parser.add_argument('--task', default='val', help="'val', 'test', 'study'")
+     parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
+     parser.add_argument('--single-cls', action='store_true', help='treat as single-class dataset')
+     parser.add_argument('--augment', action='store_true', help='augmented inference')
+     parser.add_argument('--verbose', action='store_true', help='report mAP by class')
+     parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
+     parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
+     parser.add_argument('--save-json', action='store_true', help='save a cocoapi-compatible JSON results file')
+     parser.add_argument('--project', default='runs/test', help='save to project/name')
+     parser.add_argument('--name', default='exp', help='save to project/name')
+     parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
+     parser.add_argument('--cfg', type=str, default='cfg/yolor_p6.cfg', help='*.cfg path')
+     parser.add_argument('--names', type=str, default='data/coco.names', help='*.names path')
+     opt = parser.parse_args()
+     opt.save_json |= opt.data.endswith('coco.yaml')
+     opt.data = check_file(opt.data) # check file
+     print(opt)
+
+     if opt.task in ['val', 'test']: # run normally
+         test(opt.data,
+              opt.weights,
+              opt.batch_size,
+              opt.img_size,
+              opt.conf_thres,
+              opt.iou_thres,
+              opt.save_json,
+              opt.single_cls,
+              opt.augment,
+              opt.verbose,
+              save_txt=opt.save_txt,
+              save_conf=opt.save_conf,
+              )
+
+     elif opt.task == 'study': # run over a range of settings and save/plot
+         for weights in ['yolor_p6.pt', 'yolor_w6.pt']:
+             f = 'study_%s_%s.txt' % (Path(opt.data).stem, Path(weights).stem) # filename to save to
+             x = list(range(320, 800, 64)) # x axis
+             y = [] # y axis
+             for i in x: # img-size
+                 print('\nRunning %s point %s...' % (f, i))
+                 r, _, t = test(opt.data, weights, opt.batch_size, i, opt.conf_thres, opt.iou_thres, opt.save_json)
+                 y.append(r + t) # results and times
+             np.savetxt(f, y, fmt='%10.4g') # save
+         os.system('zip -r study.zip study_*.txt')
+         # utils.general.plot_study_txt(f, x) # plot
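The per-class matching loop above assigns each prediction a boolean vector over the ten IoU thresholds. A minimal standalone sketch of that idea with hypothetical boxes (pure torch, xyxy format; box_iou is reimplemented here for self-containment):

    import torch

    def box_iou(box1, box2):
        # pairwise IoU of two sets of xyxy boxes, shapes (N,4) and (M,4) -> (N,M)
        area1 = (box1[:, 2] - box1[:, 0]) * (box1[:, 3] - box1[:, 1])
        area2 = (box2[:, 2] - box2[:, 0]) * (box2[:, 3] - box2[:, 1])
        lt = torch.max(box1[:, None, :2], box2[:, :2])  # intersection top-left
        rb = torch.min(box1[:, None, 2:], box2[:, 2:])  # intersection bottom-right
        inter = (rb - lt).clamp(0).prod(2)
        return inter / (area1[:, None] + area2 - inter)

    iouv = torch.linspace(0.5, 0.95, 10)        # same thresholds as test.py (mAP@0.5:0.95)
    pred = torch.tensor([[0., 0., 10., 10.]])   # one hypothetical prediction
    tbox = torch.tensor([[1., 1., 10., 10.]])   # one hypothetical target
    ious, i = box_iou(pred, tbox).max(1)        # best-matching target per prediction
    correct = ious[:, None] > iouv              # (num_pred, 10) boolean, as in test.py
    print(correct)                              # True up to IoU 0.80, False above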
train.py ADDED
@@ -0,0 +1,619 @@
+ import argparse
+ import logging
+ import math
+ import os
+ import random
+ import time
+ from pathlib import Path
+ from warnings import warn
+
+ import numpy as np
+ import torch.distributed as dist
+ import torch.nn as nn
+ import torch.nn.functional as F
+ import torch.optim as optim
+ import torch.optim.lr_scheduler as lr_scheduler
+ import torch.utils.data
+ import yaml
+ from torch.cuda import amp
+ from torch.nn.parallel import DistributedDataParallel as DDP
+ from torch.utils.tensorboard import SummaryWriter
+ from tqdm import tqdm
+
+ import test # import test.py to get mAP after each epoch
+ #from models.yolo import Model
+ from models.models import *
+ from utils.autoanchor import check_anchors
+ from utils.datasets import create_dataloader
+ from utils.general import labels_to_class_weights, increment_path, labels_to_image_weights, init_seeds, \
+     fitness, fitness_p, fitness_r, fitness_ap50, fitness_ap, fitness_f, strip_optimizer, get_latest_run,\
+     check_dataset, check_file, check_git_status, check_img_size, print_mutation, set_logging
+ from utils.google_utils import attempt_download
+ from utils.loss import compute_loss
+ from utils.plots import plot_images, plot_labels, plot_results, plot_evolution
+ from utils.torch_utils import ModelEMA, select_device, intersect_dicts, torch_distributed_zero_first
+
+ logger = logging.getLogger(__name__)
+
+ try:
+     import wandb
+ except ImportError:
+     wandb = None
+     logger.info("Install Weights & Biases for experiment logging via 'pip install wandb' (recommended)")
+
+ def train(hyp, opt, device, tb_writer=None, wandb=None):
+     logger.info(f'Hyperparameters {hyp}')
+     save_dir, epochs, batch_size, total_batch_size, weights, rank = \
+         Path(opt.save_dir), opt.epochs, opt.batch_size, opt.total_batch_size, opt.weights, opt.global_rank
+
+     # Directories
+     wdir = save_dir / 'weights'
+     wdir.mkdir(parents=True, exist_ok=True) # make dir
+     last = wdir / 'last.pt'
+     best = wdir / 'best.pt'
+     results_file = save_dir / 'results.txt'
+
+     # Save run settings
+     with open(save_dir / 'hyp.yaml', 'w') as f:
+         yaml.dump(hyp, f, sort_keys=False)
+     with open(save_dir / 'opt.yaml', 'w') as f:
+         yaml.dump(vars(opt), f, sort_keys=False)
+
+     # Configure
+     plots = not opt.evolve # create plots
+     cuda = device.type != 'cpu'
+     init_seeds(2 + rank)
+     with open(opt.data) as f:
+         data_dict = yaml.load(f, Loader=yaml.FullLoader) # data dict
+     with torch_distributed_zero_first(rank):
+         check_dataset(data_dict) # check
+     train_path = data_dict['train']
+     test_path = data_dict['val']
+     nc, names = (1, ['item']) if opt.single_cls else (int(data_dict['nc']), data_dict['names']) # number classes, names
+     assert len(names) == nc, '%g names found for nc=%g dataset in %s' % (len(names), nc, opt.data) # check
+
+     # Model
+     pretrained = weights.endswith('.pt')
+     if pretrained:
+         with torch_distributed_zero_first(rank):
+             attempt_download(weights) # download if not found locally
+         ckpt = torch.load(weights, map_location=device) # load checkpoint
+         model = Darknet(opt.cfg).to(device) # create
+         state_dict = {k: v for k, v in ckpt['model'].items() if model.state_dict()[k].numel() == v.numel()}
+         model.load_state_dict(state_dict, strict=False)
+         print('Transferred %g/%g items from %s' % (len(state_dict), len(model.state_dict()), weights)) # report
+     else:
+         model = Darknet(opt.cfg).to(device) # create
+
+     # Optimizer
+     nbs = 64 # nominal batch size
+     accumulate = max(round(nbs / total_batch_size), 1) # accumulate loss before optimizing
+     hyp['weight_decay'] *= total_batch_size * accumulate / nbs # scale weight_decay
+
+     pg0, pg1, pg2 = [], [], [] # optimizer parameter groups
+     for k, v in dict(model.named_parameters()).items():
+         if '.bias' in k:
+             pg2.append(v) # biases
+         elif 'Conv2d.weight' in k:
+             pg1.append(v) # apply weight_decay
+         elif 'm.weight' in k:
+             pg1.append(v) # apply weight_decay
+         elif 'w.weight' in k:
+             pg1.append(v) # apply weight_decay
+         else:
+             pg0.append(v) # all else
+
+     if opt.adam:
+         optimizer = optim.Adam(pg0, lr=hyp['lr0'], betas=(hyp['momentum'], 0.999)) # adjust beta1 to momentum
+     else:
+         optimizer = optim.SGD(pg0, lr=hyp['lr0'], momentum=hyp['momentum'], nesterov=True)
+
+     optimizer.add_param_group({'params': pg1, 'weight_decay': hyp['weight_decay']}) # add pg1 with weight_decay
+     optimizer.add_param_group({'params': pg2}) # add pg2 (biases)
+     logger.info('Optimizer groups: %g .bias, %g conv.weight, %g other' % (len(pg2), len(pg1), len(pg0)))
+     del pg0, pg1, pg2
+
+     # Scheduler https://arxiv.org/pdf/1812.01187.pdf
+     # https://pytorch.org/docs/stable/_modules/torch/optim/lr_scheduler.html#OneCycleLR
+     lf = lambda x: ((1 + math.cos(x * math.pi / epochs)) / 2) * (1 - hyp['lrf']) + hyp['lrf'] # cosine
+     scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)
+     # plot_lr_scheduler(optimizer, scheduler, epochs)
+
+     # Logging
+     if wandb and wandb.run is None:
+         opt.hyp = hyp # add hyperparameters
+         wandb_run = wandb.init(config=opt, resume="allow",
+                                project='YOLOR' if opt.project == 'runs/train' else Path(opt.project).stem,
+                                name=save_dir.stem,
+                                id=ckpt.get('wandb_id') if 'ckpt' in locals() else None)
+
+     # Resume
+     start_epoch, best_fitness = 0, 0.0
+     best_fitness_p, best_fitness_r, best_fitness_ap50, best_fitness_ap, best_fitness_f = 0.0, 0.0, 0.0, 0.0, 0.0
+     if pretrained:
+         # Optimizer
+         if ckpt['optimizer'] is not None:
+             optimizer.load_state_dict(ckpt['optimizer'])
+             best_fitness = ckpt['best_fitness']
+             best_fitness_p = ckpt['best_fitness_p']
+             best_fitness_r = ckpt['best_fitness_r']
+             best_fitness_ap50 = ckpt['best_fitness_ap50']
+             best_fitness_ap = ckpt['best_fitness_ap']
+             best_fitness_f = ckpt['best_fitness_f']
+
+         # Results
+         if ckpt.get('training_results') is not None:
146
+ with open(results_file, 'w') as file:
147
+ file.write(ckpt['training_results']) # write results.txt
148
+
149
+ # Epochs
150
+ start_epoch = ckpt['epoch'] + 1
151
+ if opt.resume:
152
+ assert start_epoch > 0, '%s training to %g epochs is finished, nothing to resume.' % (weights, epochs)
153
+ if epochs < start_epoch:
154
+ logger.info('%s has been trained for %g epochs. Fine-tuning for %g additional epochs.' %
155
+ (weights, ckpt['epoch'], epochs))
156
+ epochs += ckpt['epoch'] # finetune additional epochs
157
+
158
+ del ckpt, state_dict
159
+
160
+ # Image sizes
161
+ gs = 64 #int(max(model.stride)) # grid size (max stride)
162
+ imgsz, imgsz_test = [check_img_size(x, gs) for x in opt.img_size] # verify imgsz are gs-multiples
163
+
164
+ # DP mode
165
+ if cuda and rank == -1 and torch.cuda.device_count() > 1:
166
+ model = torch.nn.DataParallel(model)
167
+
168
+ # SyncBatchNorm
169
+ if opt.sync_bn and cuda and rank != -1:
170
+ model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model).to(device)
171
+ logger.info('Using SyncBatchNorm()')
172
+
173
+ # EMA
174
+ ema = ModelEMA(model) if rank in [-1, 0] else None
175
+
176
+ # DDP mode
177
+ if cuda and rank != -1:
178
+ model = DDP(model, device_ids=[opt.local_rank], output_device=opt.local_rank)
179
+
180
+ # Trainloader
181
+ dataloader, dataset = create_dataloader(train_path, imgsz, batch_size, gs, opt,
182
+ hyp=hyp, augment=True, cache=opt.cache_images, rect=opt.rect,
183
+ rank=rank, world_size=opt.world_size, workers=opt.workers)
184
+ mlc = np.concatenate(dataset.labels, 0)[:, 0].max() # max label class
185
+ nb = len(dataloader) # number of batches
186
+ assert mlc < nc, 'Label class %g exceeds nc=%g in %s. Possible class labels are 0-%g' % (mlc, nc, opt.data, nc - 1)
187
+
188
+ # Process 0
189
+ if rank in [-1, 0]:
190
+ ema.updates = start_epoch * nb // accumulate # set EMA updates
191
+ testloader = create_dataloader(test_path, imgsz_test, batch_size*2, gs, opt,
192
+ hyp=hyp, cache=opt.cache_images and not opt.notest, rect=True,
193
+ rank=-1, world_size=opt.world_size, workers=opt.workers)[0] # testloader
194
+
195
+ if not opt.resume:
196
+ labels = np.concatenate(dataset.labels, 0)
197
+ c = torch.tensor(labels[:, 0]) # classes
198
+ # cf = torch.bincount(c.long(), minlength=nc) + 1. # frequency
199
+ # model._initialize_biases(cf.to(device))
200
+ if plots:
201
+ plot_labels(labels, save_dir=save_dir)
202
+ if tb_writer:
203
+ tb_writer.add_histogram('classes', c, 0)
204
+ if wandb:
205
+ wandb.log({"Labels": [wandb.Image(str(x), caption=x.name) for x in save_dir.glob('*labels*.png')]})
206
+
207
+ # Anchors
208
+ # if not opt.noautoanchor:
209
+ # check_anchors(dataset, model=model, thr=hyp['anchor_t'], imgsz=imgsz)
210
+
211
+ # Model parameters
212
+ hyp['cls'] *= nc / 80. # scale coco-tuned hyp['cls'] to current dataset
213
+ model.nc = nc # attach number of classes to model
214
+ model.hyp = hyp # attach hyperparameters to model
215
+ model.gr = 1.0 # iou loss ratio (obj_loss = 1.0 or iou)
216
+ model.class_weights = labels_to_class_weights(dataset.labels, nc).to(device) # attach class weights
217
+ model.names = names
218
+
219
+ # Start training
220
+ t0 = time.time()
221
+ nw = max(round(hyp['warmup_epochs'] * nb), 1000) # number of warmup iterations, max(warmup_epochs * nb, 1k iterations)
222
+ # nw = min(nw, (epochs - start_epoch) / 2 * nb) # limit warmup to < 1/2 of training
223
+ maps = np.zeros(nc) # mAP per class
224
+ results = (0, 0, 0, 0, 0, 0, 0) # P, R, mAP@.5, mAP@.5-.95, val_loss(box, obj, cls)
225
+ scheduler.last_epoch = start_epoch - 1 # do not move
226
+ scaler = amp.GradScaler(enabled=cuda)
227
+ logger.info('Image sizes %g train, %g test\n'
228
+ 'Using %g dataloader workers\nLogging results to %s\n'
229
+ 'Starting training for %g epochs...' % (imgsz, imgsz_test, dataloader.num_workers, save_dir, epochs))
230
+
231
+ torch.save(model, wdir / 'init.pt')
232
+
233
+ for epoch in range(start_epoch, epochs): # epoch ------------------------------------------------------------------
234
+ model.train()
235
+
236
+ # Update image weights (optional)
237
+ if opt.image_weights:
238
+ # Generate indices
239
+ if rank in [-1, 0]:
240
+ cw = model.class_weights.cpu().numpy() * (1 - maps) ** 2 # class weights
241
+ iw = labels_to_image_weights(dataset.labels, nc=nc, class_weights=cw) # image weights
242
+ dataset.indices = random.choices(range(dataset.n), weights=iw, k=dataset.n) # rand weighted idx
243
+ # Broadcast if DDP
244
+ if rank != -1:
245
+ indices = (torch.tensor(dataset.indices) if rank == 0 else torch.zeros(dataset.n)).int()
246
+ dist.broadcast(indices, 0)
247
+ if rank != 0:
248
+ dataset.indices = indices.cpu().numpy()
249
+
250
+ # Update mosaic border
251
+ # b = int(random.uniform(0.25 * imgsz, 0.75 * imgsz + gs) // gs * gs)
252
+ # dataset.mosaic_border = [b - imgsz, -b] # height, width borders
253
+
254
+ mloss = torch.zeros(4, device=device) # mean losses
255
+ if rank != -1:
256
+ dataloader.sampler.set_epoch(epoch)
257
+ pbar = enumerate(dataloader)
258
+ logger.info(('\n' + '%10s' * 8) % ('Epoch', 'gpu_mem', 'box', 'obj', 'cls', 'total', 'targets', 'img_size'))
259
+ if rank in [-1, 0]:
260
+ pbar = tqdm(pbar, total=nb) # progress bar
261
+ optimizer.zero_grad()
262
+ for i, (imgs, targets, paths, _) in pbar: # batch -------------------------------------------------------------
263
+ ni = i + nb * epoch # number integrated batches (since train start)
264
+ imgs = imgs.to(device, non_blocking=True).float() / 255.0 # uint8 to float32, 0-255 to 0.0-1.0
265
+
266
+ # Warmup
267
+ if ni <= nw:
268
+ xi = [0, nw] # x interp
269
+ # model.gr = np.interp(ni, xi, [0.0, 1.0]) # iou loss ratio (obj_loss = 1.0 or iou)
270
+ accumulate = max(1, np.interp(ni, xi, [1, nbs / total_batch_size]).round())
271
+ for j, x in enumerate(optimizer.param_groups):
272
+ # bias lr falls from 0.1 to lr0, all other lrs rise from 0.0 to lr0
273
+ x['lr'] = np.interp(ni, xi, [hyp['warmup_bias_lr'] if j == 2 else 0.0, x['initial_lr'] * lf(epoch)])
274
+ if 'momentum' in x:
275
+ x['momentum'] = np.interp(ni, xi, [hyp['warmup_momentum'], hyp['momentum']])
276
+
277
+ # Multi-scale
278
+ if opt.multi_scale:
279
+ sz = random.randrange(int(imgsz * 0.5), int(imgsz * 1.5 + gs)) // gs * gs # size (randrange needs int args)
280
+ sf = sz / max(imgs.shape[2:]) # scale factor
281
+ if sf != 1:
282
+ ns = [math.ceil(x * sf / gs) * gs for x in imgs.shape[2:]] # new shape (stretched to gs-multiple)
283
+ imgs = F.interpolate(imgs, size=ns, mode='bilinear', align_corners=False)
284
+
285
+ # Forward
286
+ with amp.autocast(enabled=cuda):
287
+ pred = model(imgs) # forward
288
+ loss, loss_items = compute_loss(pred, targets.to(device), model) # loss scaled by batch_size
289
+ if rank != -1:
290
+ loss *= opt.world_size # gradient averaged between devices in DDP mode
291
+
292
+ # Backward
293
+ scaler.scale(loss).backward()
294
+
295
+ # Optimize
296
+ if ni % accumulate == 0:
297
+ scaler.step(optimizer) # optimizer.step
298
+ scaler.update()
299
+ optimizer.zero_grad()
300
+ if ema:
301
+ ema.update(model)
302
+
303
+ # Print
304
+ if rank in [-1, 0]:
305
+ mloss = (mloss * i + loss_items) / (i + 1) # update mean losses
306
+ mem = '%.3gG' % (torch.cuda.memory_reserved() / 1E9 if torch.cuda.is_available() else 0) # (GB)
307
+ s = ('%10s' * 2 + '%10.4g' * 6) % (
308
+ '%g/%g' % (epoch, epochs - 1), mem, *mloss, targets.shape[0], imgs.shape[-1])
309
+ pbar.set_description(s)
310
+
311
+ # Plot
312
+ if plots and ni < 3:
313
+ f = save_dir / f'train_batch{ni}.jpg' # filename
314
+ plot_images(images=imgs, targets=targets, paths=paths, fname=f)
315
+ # if tb_writer:
316
+ # tb_writer.add_image(f, result, dataformats='HWC', global_step=epoch)
317
+ # tb_writer.add_graph(model, imgs) # add model to tensorboard
318
+ elif plots and ni == 3 and wandb:
319
+ wandb.log({"Mosaics": [wandb.Image(str(x), caption=x.name) for x in save_dir.glob('train*.jpg')]})
320
+
321
+ # end batch ------------------------------------------------------------------------------------------------
322
+ # end epoch ----------------------------------------------------------------------------------------------------
323
+
324
+ # Scheduler
325
+ lr = [x['lr'] for x in optimizer.param_groups] # for tensorboard
326
+ scheduler.step()
327
+
328
+ # DDP process 0 or single-GPU
329
+ if rank in [-1, 0]:
330
+ # mAP
331
+ if ema:
332
+ ema.update_attr(model)
333
+ final_epoch = epoch + 1 == epochs
334
+ if not opt.notest or final_epoch: # Calculate mAP
335
+ if epoch >= 3:
336
+ results, maps, times = test.test(opt.data,
337
+ batch_size=batch_size*2,
338
+ imgsz=imgsz_test,
339
+ model=ema.ema.module if hasattr(ema.ema, 'module') else ema.ema,
340
+ single_cls=opt.single_cls,
341
+ dataloader=testloader,
342
+ save_dir=save_dir,
343
+ plots=plots and final_epoch,
344
+ log_imgs=opt.log_imgs if wandb else 0)
345
+
346
+ # Write
347
+ with open(results_file, 'a') as f:
348
+ f.write(s + '%10.4g' * 7 % results + '\n') # P, R, mAP@.5, mAP@.5-.95, val_loss(box, obj, cls)
349
+ if len(opt.name) and opt.bucket:
350
+ os.system('gsutil cp %s gs://%s/results/results%s.txt' % (results_file, opt.bucket, opt.name))
351
+
352
+ # Log
353
+ tags = ['train/box_loss', 'train/obj_loss', 'train/cls_loss', # train loss
354
+ 'metrics/precision', 'metrics/recall', 'metrics/mAP_0.5', 'metrics/mAP_0.5:0.95',
355
+ 'val/box_loss', 'val/obj_loss', 'val/cls_loss', # val loss
356
+ 'x/lr0', 'x/lr1', 'x/lr2'] # params
357
+ for x, tag in zip(list(mloss[:-1]) + list(results) + lr, tags):
358
+ if tb_writer:
359
+ tb_writer.add_scalar(tag, x, epoch) # tensorboard
360
+ if wandb:
361
+ wandb.log({tag: x}) # W&B
362
+
363
+ # Update best mAP
364
+ fi = fitness(np.array(results).reshape(1, -1)) # weighted combination of [P, R, mAP@.5, mAP@.5-.95]
365
+ fi_p = fitness_p(np.array(results).reshape(1, -1)) # precision-weighted fitness
366
+ fi_r = fitness_r(np.array(results).reshape(1, -1)) # recall-weighted fitness
367
+ fi_ap50 = fitness_ap50(np.array(results).reshape(1, -1)) # mAP@.5-weighted fitness
368
+ fi_ap = fitness_ap(np.array(results).reshape(1, -1)) # mAP@.5:.95-weighted fitness
369
+ if (fi_p > 0.0) or (fi_r > 0.0):
370
+ fi_f = fitness_f(np.array(results).reshape(1, -1)) # F-score-weighted fitness
371
+ else:
372
+ fi_f = 0.0
373
+ if fi > best_fitness:
374
+ best_fitness = fi
375
+ if fi_p > best_fitness_p:
376
+ best_fitness_p = fi_p
377
+ if fi_r > best_fitness_r:
378
+ best_fitness_r = fi_r
379
+ if fi_ap50 > best_fitness_ap50:
380
+ best_fitness_ap50 = fi_ap50
381
+ if fi_ap > best_fitness_ap:
382
+ best_fitness_ap = fi_ap
383
+ if fi_f > best_fitness_f:
384
+ best_fitness_f = fi_f
385
+
386
+ # Save model
387
+ save = (not opt.nosave) or (final_epoch and not opt.evolve)
388
+ if save:
389
+ with open(results_file, 'r') as f: # create checkpoint
390
+ ckpt = {'epoch': epoch,
391
+ 'best_fitness': best_fitness,
392
+ 'best_fitness_p': best_fitness_p,
393
+ 'best_fitness_r': best_fitness_r,
394
+ 'best_fitness_ap50': best_fitness_ap50,
395
+ 'best_fitness_ap': best_fitness_ap,
396
+ 'best_fitness_f': best_fitness_f,
397
+ 'training_results': f.read(),
398
+ 'model': ema.ema.module.state_dict() if hasattr(ema.ema, 'module') else ema.ema.state_dict(),
399
+ 'optimizer': None if final_epoch else optimizer.state_dict(),
400
+ 'wandb_id': wandb_run.id if wandb else None}
401
+
402
+ # Save last, best and delete
403
+ torch.save(ckpt, last)
404
+ if best_fitness == fi:
405
+ torch.save(ckpt, best)
406
+ if (best_fitness == fi) and (epoch >= 200):
407
+ torch.save(ckpt, wdir / 'best_{:03d}.pt'.format(epoch))
408
+ if best_fitness == fi:
409
+ torch.save(ckpt, wdir / 'best_overall.pt')
410
+ if best_fitness_p == fi_p:
411
+ torch.save(ckpt, wdir / 'best_p.pt')
412
+ if best_fitness_r == fi_r:
413
+ torch.save(ckpt, wdir / 'best_r.pt')
414
+ if best_fitness_ap50 == fi_ap50:
415
+ torch.save(ckpt, wdir / 'best_ap50.pt')
416
+ if best_fitness_ap == fi_ap:
417
+ torch.save(ckpt, wdir / 'best_ap.pt')
418
+ if best_fitness_f == fi_f:
419
+ torch.save(ckpt, wdir / 'best_f.pt')
420
+ if epoch == 0:
421
+ torch.save(ckpt, wdir / 'epoch_{:03d}.pt'.format(epoch))
422
+ if ((epoch+1) % 25) == 0:
423
+ torch.save(ckpt, wdir / 'epoch_{:03d}.pt'.format(epoch))
424
+ if epoch >= (epochs-5):
425
+ torch.save(ckpt, wdir / 'last_{:03d}.pt'.format(epoch))
426
+ elif epoch >= 420:
427
+ torch.save(ckpt, wdir / 'last_{:03d}.pt'.format(epoch))
428
+ del ckpt
429
+ # end epoch ----------------------------------------------------------------------------------------------------
430
+ # end training
431
+
432
+ if rank in [-1, 0]:
433
+ # Strip optimizers
434
+ n = opt.name if opt.name.isnumeric() else ''
435
+ fresults, flast, fbest = save_dir / f'results{n}.txt', wdir / f'last{n}.pt', wdir / f'best{n}.pt'
436
+ for f1, f2 in zip([wdir / 'last.pt', wdir / 'best.pt', results_file], [flast, fbest, fresults]):
437
+ if f1.exists():
438
+ os.rename(f1, f2) # rename
439
+ if str(f2).endswith('.pt'): # is *.pt
440
+ strip_optimizer(f2) # strip optimizer
441
+ os.system('gsutil cp %s gs://%s/weights' % (f2, opt.bucket)) if opt.bucket else None # upload
442
+ # Finish
443
+ if plots:
444
+ plot_results(save_dir=save_dir) # save as results.png
445
+ if wandb:
446
+ wandb.log({"Results": [wandb.Image(str(save_dir / x), caption=x) for x in
447
+ ['results.png', 'precision-recall_curve.png']]})
448
+ logger.info('%g epochs completed in %.3f hours.\n' % (epoch - start_epoch + 1, (time.time() - t0) / 3600))
449
+ else:
450
+ dist.destroy_process_group()
451
+
452
+ wandb.run.finish() if wandb and wandb.run else None
453
+ torch.cuda.empty_cache()
454
+ return results
455
+
456
+
457
+ if __name__ == '__main__':
458
+ parser = argparse.ArgumentParser()
459
+ parser.add_argument('--weights', type=str, default='yolor_p6.pt', help='initial weights path')
460
+ parser.add_argument('--cfg', type=str, default='', help='model cfg path, e.g. cfg/yolor_p6.cfg')
461
+ parser.add_argument('--data', type=str, default='data/coco.yaml', help='data.yaml path')
462
+ parser.add_argument('--hyp', type=str, default='data/hyp.scratch.1280.yaml', help='hyperparameters path')
463
+ parser.add_argument('--epochs', type=int, default=300)
464
+ parser.add_argument('--batch-size', type=int, default=8, help='total batch size for all GPUs')
465
+ parser.add_argument('--img-size', nargs='+', type=int, default=[1280, 1280], help='[train, test] image sizes')
466
+ parser.add_argument('--rect', action='store_true', help='rectangular training')
467
+ parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume most recent training')
468
+ parser.add_argument('--nosave', action='store_true', help='only save final checkpoint')
469
+ parser.add_argument('--notest', action='store_true', help='only test final epoch')
470
+ parser.add_argument('--noautoanchor', action='store_true', help='disable autoanchor check')
471
+ parser.add_argument('--evolve', action='store_true', help='evolve hyperparameters')
472
+ parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')
473
+ parser.add_argument('--cache-images', action='store_true', help='cache images for faster training')
474
+ parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')
475
+ parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
476
+ parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')
477
+ parser.add_argument('--single-cls', action='store_true', help='train as single-class dataset')
478
+ parser.add_argument('--adam', action='store_true', help='use torch.optim.Adam() optimizer')
479
+ parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')
480
+ parser.add_argument('--local_rank', type=int, default=-1, help='DDP parameter, do not modify')
481
+ parser.add_argument('--log-imgs', type=int, default=16, help='number of images for W&B logging, max 100')
482
+ parser.add_argument('--workers', type=int, default=8, help='maximum number of dataloader workers')
483
+ parser.add_argument('--project', default='runs/train', help='save to project/name')
484
+ parser.add_argument('--name', default='exp', help='save to project/name')
485
+ parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
486
+ opt = parser.parse_args()
487
+
488
+ # Set DDP variables
489
+ opt.total_batch_size = opt.batch_size
490
+ opt.world_size = int(os.environ['WORLD_SIZE']) if 'WORLD_SIZE' in os.environ else 1
491
+ opt.global_rank = int(os.environ['RANK']) if 'RANK' in os.environ else -1
492
+ set_logging(opt.global_rank)
493
+ if opt.global_rank in [-1, 0]:
494
+ check_git_status()
495
+
496
+ # Resume
497
+ if opt.resume: # resume an interrupted run
498
+ ckpt = opt.resume if isinstance(opt.resume, str) else get_latest_run() # specified or most recent path
499
+ assert os.path.isfile(ckpt), 'ERROR: --resume checkpoint does not exist'
500
+ with open(Path(ckpt).parent.parent / 'opt.yaml') as f:
501
+ opt = argparse.Namespace(**yaml.load(f, Loader=yaml.FullLoader)) # replace
502
+ opt.cfg, opt.weights, opt.resume = '', ckpt, True
503
+ logger.info('Resuming training from %s' % ckpt)
504
+ else:
505
+ # opt.hyp = opt.hyp or ('hyp.finetune.yaml' if opt.weights else 'hyp.scratch.yaml')
506
+ opt.data, opt.cfg, opt.hyp = check_file(opt.data), check_file(opt.cfg), check_file(opt.hyp) # check files
507
+ assert len(opt.cfg) or len(opt.weights), 'either --cfg or --weights must be specified'
508
+ opt.img_size.extend([opt.img_size[-1]] * (2 - len(opt.img_size))) # extend to 2 sizes (train, test)
509
+ opt.name = 'evolve' if opt.evolve else opt.name
510
+ opt.save_dir = increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok | opt.evolve) # increment run
511
+
512
+ # DDP mode
513
+ device = select_device(opt.device, batch_size=opt.batch_size)
514
+ if opt.local_rank != -1:
515
+ assert torch.cuda.device_count() > opt.local_rank
516
+ torch.cuda.set_device(opt.local_rank)
517
+ device = torch.device('cuda', opt.local_rank)
518
+ dist.init_process_group(backend='nccl', init_method='env://') # distributed backend
519
+ assert opt.batch_size % opt.world_size == 0, '--batch-size must be multiple of CUDA device count'
520
+ opt.batch_size = opt.total_batch_size // opt.world_size
521
+
522
+ # Hyperparameters
523
+ with open(opt.hyp) as f:
524
+ hyp = yaml.load(f, Loader=yaml.FullLoader) # load hyps
525
+ if 'box' not in hyp:
526
+ warn('Compatibility: %s missing "box" which was renamed from "giou" in %s' %
527
+ (opt.hyp, 'https://github.com/ultralytics/yolov5/pull/1120'))
528
+ hyp['box'] = hyp.pop('giou')
529
+
530
+ # Train
531
+ logger.info(opt)
532
+ if not opt.evolve:
533
+ tb_writer = None # init loggers
534
+ if opt.global_rank in [-1, 0]:
535
+ logger.info(f'Start Tensorboard with "tensorboard --logdir {opt.project}", view at http://localhost:6006/')
536
+ tb_writer = SummaryWriter(opt.save_dir) # Tensorboard
537
+ train(hyp, opt, device, tb_writer, wandb)
538
+
539
+ # Evolve hyperparameters (optional)
540
+ else:
541
+ # Hyperparameter evolution metadata (mutation scale 0-1, lower_limit, upper_limit)
542
+ meta = {'lr0': (1, 1e-5, 1e-1), # initial learning rate (SGD=1E-2, Adam=1E-3)
543
+ 'lrf': (1, 0.01, 1.0), # final OneCycleLR learning rate (lr0 * lrf)
544
+ 'momentum': (0.3, 0.6, 0.98), # SGD momentum/Adam beta1
545
+ 'weight_decay': (1, 0.0, 0.001), # optimizer weight decay
546
+ 'warmup_epochs': (1, 0.0, 5.0), # warmup epochs (fractions ok)
547
+ 'warmup_momentum': (1, 0.0, 0.95), # warmup initial momentum
548
+ 'warmup_bias_lr': (1, 0.0, 0.2), # warmup initial bias lr
549
+ 'box': (1, 0.02, 0.2), # box loss gain
550
+ 'cls': (1, 0.2, 4.0), # cls loss gain
551
+ 'cls_pw': (1, 0.5, 2.0), # cls BCELoss positive_weight
552
+ 'obj': (1, 0.2, 4.0), # obj loss gain (scale with pixels)
553
+ 'obj_pw': (1, 0.5, 2.0), # obj BCELoss positive_weight
554
+ 'iou_t': (0, 0.1, 0.7), # IoU training threshold
555
+ 'anchor_t': (1, 2.0, 8.0), # anchor-multiple threshold
556
+ 'anchors': (2, 2.0, 10.0), # anchors per output grid (0 to ignore)
557
+ 'fl_gamma': (0, 0.0, 2.0), # focal loss gamma (efficientDet default gamma=1.5)
558
+ 'hsv_h': (1, 0.0, 0.1), # image HSV-Hue augmentation (fraction)
559
+ 'hsv_s': (1, 0.0, 0.9), # image HSV-Saturation augmentation (fraction)
560
+ 'hsv_v': (1, 0.0, 0.9), # image HSV-Value augmentation (fraction)
561
+ 'degrees': (1, 0.0, 45.0), # image rotation (+/- deg)
562
+ 'translate': (1, 0.0, 0.9), # image translation (+/- fraction)
563
+ 'scale': (1, 0.0, 0.9), # image scale (+/- gain)
564
+ 'shear': (1, 0.0, 10.0), # image shear (+/- deg)
565
+ 'perspective': (0, 0.0, 0.001), # image perspective (+/- fraction), range 0-0.001
566
+ 'flipud': (1, 0.0, 1.0), # image flip up-down (probability)
567
+ 'fliplr': (0, 0.0, 1.0), # image flip left-right (probability)
568
+ 'mosaic': (1, 0.0, 1.0), # image mosaic (probability)
569
+ 'mixup': (1, 0.0, 1.0)} # image mixup (probability)
570
+
571
+ assert opt.local_rank == -1, 'DDP mode not implemented for --evolve'
572
+ opt.notest, opt.nosave = True, True # only test/save final epoch
573
+ # ei = [isinstance(x, (int, float)) for x in hyp.values()] # evolvable indices
574
+ yaml_file = Path(opt.save_dir) / 'hyp_evolved.yaml' # save best result here
575
+ if opt.bucket:
576
+ os.system('gsutil cp gs://%s/evolve.txt .' % opt.bucket) # download evolve.txt if exists
577
+
578
+ for _ in range(300): # generations to evolve
579
+ if Path('evolve.txt').exists(): # if evolve.txt exists: select best hyps and mutate
580
+ # Select parent(s)
581
+ parent = 'single' # parent selection method: 'single' or 'weighted'
582
+ x = np.loadtxt('evolve.txt', ndmin=2)
583
+ n = min(5, len(x)) # number of previous results to consider
584
+ x = x[np.argsort(-fitness(x))][:n] # top n mutations
585
+ w = fitness(x) - fitness(x).min() # weights
586
+ if parent == 'single' or len(x) == 1:
587
+ # x = x[random.randint(0, n - 1)] # random selection
588
+ x = x[random.choices(range(n), weights=w)[0]] # weighted selection
589
+ elif parent == 'weighted':
590
+ x = (x * w.reshape(n, 1)).sum(0) / w.sum() # weighted combination
591
+
592
+ # Mutate
593
+ mp, s = 0.8, 0.2 # mutation probability, sigma
594
+ npr = np.random
595
+ npr.seed(int(time.time()))
596
+ g = np.array([x[0] for x in meta.values()]) # gains 0-1
597
+ ng = len(meta)
598
+ v = np.ones(ng)
599
+ while all(v == 1): # mutate until a change occurs (prevent duplicates)
600
+ v = (g * (npr.random(ng) < mp) * npr.randn(ng) * npr.random() * s + 1).clip(0.3, 3.0)
601
+ for i, k in enumerate(hyp.keys()): # plt.hist(v.ravel(), 300)
602
+ hyp[k] = float(x[i + 7] * v[i]) # mutate
603
+
604
+ # Constrain to limits
605
+ for k, v in meta.items():
606
+ hyp[k] = max(hyp[k], v[1]) # lower limit
607
+ hyp[k] = min(hyp[k], v[2]) # upper limit
608
+ hyp[k] = round(hyp[k], 5) # round to 5 decimal places
609
+
610
+ # Train mutation
611
+ results = train(hyp.copy(), opt, device, wandb=wandb)
612
+
613
+ # Write mutation results
614
+ print_mutation(hyp.copy(), results, yaml_file, opt.bucket)
615
+
616
+ # Plot results
617
+ plot_evolution(yaml_file)
618
+ print(f'Hyperparameter evolution complete. Best results saved as: {yaml_file}\n'
619
+ f'Command to train a new model with these hyperparameters: $ python train.py --hyp {yaml_file}')
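For reference, the LambdaLR schedule built at diff line 118 above decays the learning rate along a half cosine from lr0 down to lr0 * lrf over the full run. A minimal standalone sketch of that curve; the epochs, lr0 and lrf values here are assumed stand-ins for opt.epochs and the hyp YAML entries:

import math

epochs, lr0, lrf = 300, 0.01, 0.2  # assumed values; the scripts read these from opt.epochs and the hyp file
lf = lambda x: ((1 + math.cos(x * math.pi / epochs)) / 2) * (1 - lrf) + lrf  # same lambda as above
for epoch in (0, 100, 200, 299):
    print('epoch %3d: lr = %.5f' % (epoch, lr0 * lf(epoch)))  # decays from 0.01000 to ~0.00200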
tune.py ADDED
@@ -0,0 +1,619 @@
1
+ import argparse
2
+ import logging
3
+ import math
4
+ import os
5
+ import random
6
+ import time
7
+ from pathlib import Path
8
+ from warnings import warn
9
+
10
+ import numpy as np
11
+ import torch.distributed as dist
12
+ import torch.nn as nn
13
+ import torch.nn.functional as F
14
+ import torch.optim as optim
15
+ import torch.optim.lr_scheduler as lr_scheduler
16
+ import torch.utils.data
17
+ import yaml
18
+ from torch.cuda import amp
19
+ from torch.nn.parallel import DistributedDataParallel as DDP
20
+ from torch.utils.tensorboard import SummaryWriter
21
+ from tqdm import tqdm
22
+
23
+ import test # import test.py to get mAP after each epoch
24
+ #from models.yolo import Model
25
+ from models.models import *
26
+ from utils.autoanchor import check_anchors
27
+ from utils.datasets import create_dataloader9 as create_dataloader
28
+ from utils.general import labels_to_class_weights, increment_path, labels_to_image_weights, init_seeds, \
29
+ fitness, fitness_p, fitness_r, fitness_ap50, fitness_ap, fitness_f, strip_optimizer, get_latest_run,\
30
+ check_dataset, check_file, check_git_status, check_img_size, print_mutation, set_logging
31
+ from utils.google_utils import attempt_download
32
+ from utils.loss import compute_loss
33
+ from utils.plots import plot_images, plot_labels, plot_results, plot_evolution
34
+ from utils.torch_utils import ModelEMA, select_device, intersect_dicts, torch_distributed_zero_first
35
+
36
+ logger = logging.getLogger(__name__)
37
+
38
+ try:
39
+ import wandb
40
+ except ImportError:
41
+ wandb = None
42
+ logger.info("Install Weights & Biases for experiment logging via 'pip install wandb' (recommended)")
43
+
44
+ def train(hyp, opt, device, tb_writer=None, wandb=None):
45
+ logger.info(f'Hyperparameters {hyp}')
46
+ save_dir, epochs, batch_size, total_batch_size, weights, rank = \
47
+ Path(opt.save_dir), opt.epochs, opt.batch_size, opt.total_batch_size, opt.weights, opt.global_rank
48
+
49
+ # Directories
50
+ wdir = save_dir / 'weights'
51
+ wdir.mkdir(parents=True, exist_ok=True) # make dir
52
+ last = wdir / 'last.pt'
53
+ best = wdir / 'best.pt'
54
+ results_file = save_dir / 'results.txt'
55
+
56
+ # Save run settings
57
+ with open(save_dir / 'hyp.yaml', 'w') as f:
58
+ yaml.dump(hyp, f, sort_keys=False)
59
+ with open(save_dir / 'opt.yaml', 'w') as f:
60
+ yaml.dump(vars(opt), f, sort_keys=False)
61
+
62
+ # Configure
63
+ plots = not opt.evolve # create plots
64
+ cuda = device.type != 'cpu'
65
+ init_seeds(2 + rank)
66
+ with open(opt.data) as f:
67
+ data_dict = yaml.load(f, Loader=yaml.FullLoader) # data dict
68
+ with torch_distributed_zero_first(rank):
69
+ check_dataset(data_dict) # check
70
+ train_path = data_dict['train']
71
+ test_path = data_dict['val']
72
+ nc, names = (1, ['item']) if opt.single_cls else (int(data_dict['nc']), data_dict['names']) # number classes, names
73
+ assert len(names) == nc, '%g names found for nc=%g dataset in %s' % (len(names), nc, opt.data) # check
74
+
75
+ # Model
76
+ pretrained = weights.endswith('.pt')
77
+ if pretrained:
78
+ with torch_distributed_zero_first(rank):
79
+ attempt_download(weights) # download if not found locally
80
+ ckpt = torch.load(weights, map_location=device) # load checkpoint
81
+ model = Darknet(opt.cfg).to(device) # create
82
+ state_dict = {k: v for k, v in ckpt['model'].items() if model.state_dict()[k].numel() == v.numel()}
83
+ model.load_state_dict(state_dict, strict=False)
84
+ print('Transferred %g/%g items from %s' % (len(state_dict), len(model.state_dict()), weights)) # report
85
+ else:
86
+ model = Darknet(opt.cfg).to(device) # create
87
+
88
+ # Optimizer
89
+ nbs = 64 # nominal batch size
90
+ accumulate = max(round(nbs / total_batch_size), 1) # accumulate loss before optimizing
91
+ hyp['weight_decay'] *= total_batch_size * accumulate / nbs # scale weight_decay
92
+
93
+ pg0, pg1, pg2 = [], [], [] # optimizer parameter groups
94
+ for k, v in dict(model.named_parameters()).items():
95
+ if '.bias' in k:
96
+ pg2.append(v) # biases
97
+ elif 'Conv2d.weight' in k:
98
+ pg1.append(v) # apply weight_decay
99
+ elif 'm.weight' in k:
100
+ pg1.append(v) # apply weight_decay
101
+ elif 'w.weight' in k:
102
+ pg1.append(v) # apply weight_decay
103
+ else:
104
+ pg0.append(v) # all else
105
+
106
+ if opt.adam:
107
+ optimizer = optim.Adam(pg0, lr=hyp['lr0'], betas=(hyp['momentum'], 0.999)) # adjust beta1 to momentum
108
+ else:
109
+ optimizer = optim.SGD(pg0, lr=hyp['lr0'], momentum=hyp['momentum'], nesterov=True)
110
+
111
+ optimizer.add_param_group({'params': pg1, 'weight_decay': hyp['weight_decay']}) # add pg1 with weight_decay
112
+ optimizer.add_param_group({'params': pg2}) # add pg2 (biases)
113
+ logger.info('Optimizer groups: %g .bias, %g conv.weight, %g other' % (len(pg2), len(pg1), len(pg0)))
114
+ del pg0, pg1, pg2
115
+
116
+ # Scheduler https://arxiv.org/pdf/1812.01187.pdf
117
+ # https://pytorch.org/docs/stable/_modules/torch/optim/lr_scheduler.html#OneCycleLR
118
+ lf = lambda x: ((1 + math.cos(x * math.pi / epochs)) / 2) * (1 - hyp['lrf']) + hyp['lrf'] # cosine
119
+ scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)
120
+ # plot_lr_scheduler(optimizer, scheduler, epochs)
121
+
122
+ # Logging
123
+ if wandb and wandb.run is None:
124
+ opt.hyp = hyp # add hyperparameters
125
+ wandb_run = wandb.init(config=opt, resume="allow",
126
+ project='YOLOR' if opt.project == 'runs/train' else Path(opt.project).stem,
127
+ name=save_dir.stem,
128
+ id=ckpt.get('wandb_id') if 'ckpt' in locals() else None)
129
+
130
+ # Resume
131
+ start_epoch, best_fitness = 0, 0.0
132
+ best_fitness_p, best_fitness_r, best_fitness_ap50, best_fitness_ap, best_fitness_f = 0.0, 0.0, 0.0, 0.0, 0.0
133
+ if pretrained:
134
+ # Optimizer
135
+ if ckpt['optimizer'] is not None:
136
+ optimizer.load_state_dict(ckpt['optimizer'])
137
+ best_fitness = ckpt['best_fitness']
138
+ best_fitness_p = ckpt['best_fitness_p']
139
+ best_fitness_r = ckpt['best_fitness_r']
140
+ best_fitness_ap50 = ckpt['best_fitness_ap50']
141
+ best_fitness_ap = ckpt['best_fitness_ap']
142
+ best_fitness_f = ckpt['best_fitness_f']
143
+
144
+ # Results
145
+ if ckpt.get('training_results') is not None:
146
+ with open(results_file, 'w') as file:
147
+ file.write(ckpt['training_results']) # write results.txt
148
+
149
+ # Epochs
150
+ start_epoch = ckpt['epoch'] + 1
151
+ if opt.resume:
152
+ assert start_epoch > 0, '%s training to %g epochs is finished, nothing to resume.' % (weights, epochs)
153
+ if epochs < start_epoch:
154
+ logger.info('%s has been trained for %g epochs. Fine-tuning for %g additional epochs.' %
155
+ (weights, ckpt['epoch'], epochs))
156
+ epochs += ckpt['epoch'] # finetune additional epochs
157
+
158
+ del ckpt, state_dict
159
+
160
+ # Image sizes
161
+ gs = 64 #int(max(model.stride)) # grid size (max stride)
162
+ imgsz, imgsz_test = [check_img_size(x, gs) for x in opt.img_size] # verify imgsz are gs-multiples
163
+
164
+ # DP mode
165
+ if cuda and rank == -1 and torch.cuda.device_count() > 1:
166
+ model = torch.nn.DataParallel(model)
167
+
168
+ # SyncBatchNorm
169
+ if opt.sync_bn and cuda and rank != -1:
170
+ model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model).to(device)
171
+ logger.info('Using SyncBatchNorm()')
172
+
173
+ # EMA
174
+ ema = ModelEMA(model) if rank in [-1, 0] else None
175
+
176
+ # DDP mode
177
+ if cuda and rank != -1:
178
+ model = DDP(model, device_ids=[opt.local_rank], output_device=opt.local_rank)
179
+
180
+ # Trainloader
181
+ dataloader, dataset = create_dataloader(train_path, imgsz, batch_size, gs, opt,
182
+ hyp=hyp, augment=True, cache=opt.cache_images, rect=opt.rect,
183
+ rank=rank, world_size=opt.world_size, workers=opt.workers)
184
+ mlc = np.concatenate(dataset.labels, 0)[:, 0].max() # max label class
185
+ nb = len(dataloader) # number of batches
186
+ assert mlc < nc, 'Label class %g exceeds nc=%g in %s. Possible class labels are 0-%g' % (mlc, nc, opt.data, nc - 1)
187
+
188
+ # Process 0
189
+ if rank in [-1, 0]:
190
+ ema.updates = start_epoch * nb // accumulate # set EMA updates
191
+ testloader = create_dataloader(test_path, imgsz_test, batch_size*2, gs, opt,
192
+ hyp=hyp, cache=opt.cache_images and not opt.notest, rect=True,
193
+ rank=-1, world_size=opt.world_size, workers=opt.workers)[0] # testloader
194
+
195
+ if not opt.resume:
196
+ labels = np.concatenate(dataset.labels, 0)
197
+ c = torch.tensor(labels[:, 0]) # classes
198
+ # cf = torch.bincount(c.long(), minlength=nc) + 1. # frequency
199
+ # model._initialize_biases(cf.to(device))
200
+ if plots:
201
+ plot_labels(labels, save_dir=save_dir)
202
+ if tb_writer:
203
+ tb_writer.add_histogram('classes', c, 0)
204
+ if wandb:
205
+ wandb.log({"Labels": [wandb.Image(str(x), caption=x.name) for x in save_dir.glob('*labels*.png')]})
206
+
207
+ # Anchors
208
+ # if not opt.noautoanchor:
209
+ # check_anchors(dataset, model=model, thr=hyp['anchor_t'], imgsz=imgsz)
210
+
211
+ # Model parameters
212
+ hyp['cls'] *= nc / 80. # scale coco-tuned hyp['cls'] to current dataset
213
+ model.nc = nc # attach number of classes to model
214
+ model.hyp = hyp # attach hyperparameters to model
215
+ model.gr = 1.0 # iou loss ratio (obj_loss = 1.0 or iou)
216
+ model.class_weights = labels_to_class_weights(dataset.labels, nc).to(device) # attach class weights
217
+ model.names = names
218
+
219
+ # Start training
220
+ t0 = time.time()
221
+ nw = max(round(hyp['warmup_epochs'] * nb), 1000) # number of warmup iterations, max(warmup_epochs * nb, 1k iterations)
222
+ # nw = min(nw, (epochs - start_epoch) / 2 * nb) # limit warmup to < 1/2 of training
223
+ maps = np.zeros(nc) # mAP per class
224
+ results = (0, 0, 0, 0, 0, 0, 0) # P, R, mAP@.5, mAP@.5-.95, val_loss(box, obj, cls)
225
+ scheduler.last_epoch = start_epoch - 1 # do not move
226
+ scaler = amp.GradScaler(enabled=cuda)
227
+ logger.info('Image sizes %g train, %g test\n'
228
+ 'Using %g dataloader workers\nLogging results to %s\n'
229
+ 'Starting training for %g epochs...' % (imgsz, imgsz_test, dataloader.num_workers, save_dir, epochs))
230
+
231
+ torch.save(model, wdir / 'init.pt')
232
+
233
+ for epoch in range(start_epoch, epochs): # epoch ------------------------------------------------------------------
234
+ model.train()
235
+
236
+ # Update image weights (optional)
237
+ if opt.image_weights:
238
+ # Generate indices
239
+ if rank in [-1, 0]:
240
+ cw = model.class_weights.cpu().numpy() * (1 - maps) ** 2 # class weights
241
+ iw = labels_to_image_weights(dataset.labels, nc=nc, class_weights=cw) # image weights
242
+ dataset.indices = random.choices(range(dataset.n), weights=iw, k=dataset.n) # rand weighted idx
243
+ # Broadcast if DDP
244
+ if rank != -1:
245
+ indices = (torch.tensor(dataset.indices) if rank == 0 else torch.zeros(dataset.n)).int()
246
+ dist.broadcast(indices, 0)
247
+ if rank != 0:
248
+ dataset.indices = indices.cpu().numpy()
249
+
250
+ # Update mosaic border
251
+ # b = int(random.uniform(0.25 * imgsz, 0.75 * imgsz + gs) // gs * gs)
252
+ # dataset.mosaic_border = [b - imgsz, -b] # height, width borders
253
+
254
+ mloss = torch.zeros(4, device=device) # mean losses
255
+ if rank != -1:
256
+ dataloader.sampler.set_epoch(epoch)
257
+ pbar = enumerate(dataloader)
258
+ logger.info(('\n' + '%10s' * 8) % ('Epoch', 'gpu_mem', 'box', 'obj', 'cls', 'total', 'targets', 'img_size'))
259
+ if rank in [-1, 0]:
260
+ pbar = tqdm(pbar, total=nb) # progress bar
261
+ optimizer.zero_grad()
262
+ for i, (imgs, targets, paths, _) in pbar: # batch -------------------------------------------------------------
263
+ ni = i + nb * epoch # number integrated batches (since train start)
264
+ imgs = imgs.to(device, non_blocking=True).float() / 255.0 # uint8 to float32, 0-255 to 0.0-1.0
265
+
266
+ # Warmup
267
+ if ni <= nw:
268
+ xi = [0, nw] # x interp
269
+ # model.gr = np.interp(ni, xi, [0.0, 1.0]) # iou loss ratio (obj_loss = 1.0 or iou)
270
+ accumulate = max(1, np.interp(ni, xi, [1, nbs / total_batch_size]).round())
271
+ for j, x in enumerate(optimizer.param_groups):
272
+ # bias lr falls from 0.1 to lr0, all other lrs rise from 0.0 to lr0
273
+ x['lr'] = np.interp(ni, xi, [hyp['warmup_bias_lr'] if j == 2 else 0.0, x['initial_lr'] * lf(epoch)])
274
+ if 'momentum' in x:
275
+ x['momentum'] = np.interp(ni, xi, [hyp['warmup_momentum'], hyp['momentum']])
276
+
277
+ # Multi-scale
278
+ if opt.multi_scale:
279
+ sz = random.randrange(int(imgsz * 0.5), int(imgsz * 1.5 + gs)) // gs * gs # size (randrange needs int args)
280
+ sf = sz / max(imgs.shape[2:]) # scale factor
281
+ if sf != 1:
282
+ ns = [math.ceil(x * sf / gs) * gs for x in imgs.shape[2:]] # new shape (stretched to gs-multiple)
283
+ imgs = F.interpolate(imgs, size=ns, mode='bilinear', align_corners=False)
284
+
285
+ # Forward
286
+ with amp.autocast(enabled=cuda):
287
+ pred = model(imgs) # forward
288
+ loss, loss_items = compute_loss(pred, targets.to(device), model) # loss scaled by batch_size
289
+ if rank != -1:
290
+ loss *= opt.world_size # gradient averaged between devices in DDP mode
291
+
292
+ # Backward
293
+ scaler.scale(loss).backward()
294
+
295
+ # Optimize
296
+ if ni % accumulate == 0:
297
+ scaler.step(optimizer) # optimizer.step
298
+ scaler.update()
299
+ optimizer.zero_grad()
300
+ if ema:
301
+ ema.update(model)
302
+
303
+ # Print
304
+ if rank in [-1, 0]:
305
+ mloss = (mloss * i + loss_items) / (i + 1) # update mean losses
306
+ mem = '%.3gG' % (torch.cuda.memory_reserved() / 1E9 if torch.cuda.is_available() else 0) # (GB)
307
+ s = ('%10s' * 2 + '%10.4g' * 6) % (
308
+ '%g/%g' % (epoch, epochs - 1), mem, *mloss, targets.shape[0], imgs.shape[-1])
309
+ pbar.set_description(s)
310
+
311
+ # Plot
312
+ if plots and ni < 3:
313
+ f = save_dir / f'train_batch{ni}.jpg' # filename
314
+ plot_images(images=imgs, targets=targets, paths=paths, fname=f)
315
+ # if tb_writer:
316
+ # tb_writer.add_image(f, result, dataformats='HWC', global_step=epoch)
317
+ # tb_writer.add_graph(model, imgs) # add model to tensorboard
318
+ elif plots and ni == 3 and wandb:
319
+ wandb.log({"Mosaics": [wandb.Image(str(x), caption=x.name) for x in save_dir.glob('train*.jpg')]})
320
+
321
+ # end batch ------------------------------------------------------------------------------------------------
322
+ # end epoch ----------------------------------------------------------------------------------------------------
323
+
324
+ # Scheduler
325
+ lr = [x['lr'] for x in optimizer.param_groups] # for tensorboard
326
+ scheduler.step()
327
+
328
+ # DDP process 0 or single-GPU
329
+ if rank in [-1, 0]:
330
+ # mAP
331
+ if ema:
332
+ ema.update_attr(model)
333
+ final_epoch = epoch + 1 == epochs
334
+ if not opt.notest or final_epoch: # Calculate mAP
335
+ if epoch >= 3:
336
+ results, maps, times = test.test(opt.data,
337
+ batch_size=batch_size*2,
338
+ imgsz=imgsz_test,
339
+ model=ema.ema.module if hasattr(ema.ema, 'module') else ema.ema,
340
+ single_cls=opt.single_cls,
341
+ dataloader=testloader,
342
+ save_dir=save_dir,
343
+ plots=plots and final_epoch,
344
+ log_imgs=opt.log_imgs if wandb else 0)
345
+
346
+ # Write
347
+ with open(results_file, 'a') as f:
348
+ f.write(s + '%10.4g' * 7 % results + '\n') # P, R, mAP@.5, mAP@.5-.95, val_loss(box, obj, cls)
349
+ if len(opt.name) and opt.bucket:
350
+ os.system('gsutil cp %s gs://%s/results/results%s.txt' % (results_file, opt.bucket, opt.name))
351
+
352
+ # Log
353
+ tags = ['train/box_loss', 'train/obj_loss', 'train/cls_loss', # train loss
354
+ 'metrics/precision', 'metrics/recall', 'metrics/mAP_0.5', 'metrics/mAP_0.5:0.95',
355
+ 'val/box_loss', 'val/obj_loss', 'val/cls_loss', # val loss
356
+ 'x/lr0', 'x/lr1', 'x/lr2'] # params
357
+ for x, tag in zip(list(mloss[:-1]) + list(results) + lr, tags):
358
+ if tb_writer:
359
+ tb_writer.add_scalar(tag, x, epoch) # tensorboard
360
+ if wandb:
361
+ wandb.log({tag: x}) # W&B
362
+
363
+ # Update best mAP
364
+ fi = fitness(np.array(results).reshape(1, -1)) # weighted combination of [P, R, mAP@.5, mAP@.5-.95]
365
+ fi_p = fitness_p(np.array(results).reshape(1, -1)) # precision-weighted fitness
366
+ fi_r = fitness_r(np.array(results).reshape(1, -1)) # recall-weighted fitness
367
+ fi_ap50 = fitness_ap50(np.array(results).reshape(1, -1)) # mAP@.5-weighted fitness
368
+ fi_ap = fitness_ap(np.array(results).reshape(1, -1)) # mAP@.5:.95-weighted fitness
369
+ if (fi_p > 0.0) or (fi_r > 0.0):
370
+ fi_f = fitness_f(np.array(results).reshape(1, -1)) # F-score-weighted fitness
371
+ else:
372
+ fi_f = 0.0
373
+ if fi > best_fitness:
374
+ best_fitness = fi
375
+ if fi_p > best_fitness_p:
376
+ best_fitness_p = fi_p
377
+ if fi_r > best_fitness_r:
378
+ best_fitness_r = fi_r
379
+ if fi_ap50 > best_fitness_ap50:
380
+ best_fitness_ap50 = fi_ap50
381
+ if fi_ap > best_fitness_ap:
382
+ best_fitness_ap = fi_ap
383
+ if fi_f > best_fitness_f:
384
+ best_fitness_f = fi_f
385
+
386
+ # Save model
387
+ save = (not opt.nosave) or (final_epoch and not opt.evolve)
388
+ if save:
389
+ with open(results_file, 'r') as f: # create checkpoint
390
+ ckpt = {'epoch': epoch,
391
+ 'best_fitness': best_fitness,
392
+ 'best_fitness_p': best_fitness_p,
393
+ 'best_fitness_r': best_fitness_r,
394
+ 'best_fitness_ap50': best_fitness_ap50,
395
+ 'best_fitness_ap': best_fitness_ap,
396
+ 'best_fitness_f': best_fitness_f,
397
+ 'training_results': f.read(),
398
+ 'model': ema.ema.module.state_dict() if hasattr(ema.ema, 'module') else ema.ema.state_dict(),
399
+ 'optimizer': None if final_epoch else optimizer.state_dict(),
400
+ 'wandb_id': wandb_run.id if wandb else None}
401
+
402
+ # Save last, best and delete
403
+ torch.save(ckpt, last)
404
+ if best_fitness == fi:
405
+ torch.save(ckpt, best)
406
+ if (best_fitness == fi) and (epoch >= 200):
407
+ torch.save(ckpt, wdir / 'best_{:03d}.pt'.format(epoch))
408
+ if best_fitness == fi:
409
+ torch.save(ckpt, wdir / 'best_overall.pt')
410
+ if best_fitness_p == fi_p:
411
+ torch.save(ckpt, wdir / 'best_p.pt')
412
+ if best_fitness_r == fi_r:
413
+ torch.save(ckpt, wdir / 'best_r.pt')
414
+ if best_fitness_ap50 == fi_ap50:
415
+ torch.save(ckpt, wdir / 'best_ap50.pt')
416
+ if best_fitness_ap == fi_ap:
417
+ torch.save(ckpt, wdir / 'best_ap.pt')
418
+ if best_fitness_f == fi_f:
419
+ torch.save(ckpt, wdir / 'best_f.pt')
420
+ if epoch == 0:
421
+ torch.save(ckpt, wdir / 'epoch_{:03d}.pt'.format(epoch))
422
+ if ((epoch+1) % 25) == 0:
423
+ torch.save(ckpt, wdir / 'epoch_{:03d}.pt'.format(epoch))
424
+ if epoch >= (epochs-5):
425
+ torch.save(ckpt, wdir / 'last_{:03d}.pt'.format(epoch))
426
+ elif epoch >= 420:
427
+ torch.save(ckpt, wdir / 'last_{:03d}.pt'.format(epoch))
428
+ del ckpt
429
+ # end epoch ----------------------------------------------------------------------------------------------------
430
+ # end training
431
+
432
+ if rank in [-1, 0]:
433
+ # Strip optimizers
434
+ n = opt.name if opt.name.isnumeric() else ''
435
+ fresults, flast, fbest = save_dir / f'results{n}.txt', wdir / f'last{n}.pt', wdir / f'best{n}.pt'
436
+ for f1, f2 in zip([wdir / 'last.pt', wdir / 'best.pt', results_file], [flast, fbest, fresults]):
437
+ if f1.exists():
438
+ os.rename(f1, f2) # rename
439
+ if str(f2).endswith('.pt'): # is *.pt
440
+ strip_optimizer(f2) # strip optimizer
441
+ os.system('gsutil cp %s gs://%s/weights' % (f2, opt.bucket)) if opt.bucket else None # upload
442
+ # Finish
443
+ if plots:
444
+ plot_results(save_dir=save_dir) # save as results.png
445
+ if wandb:
446
+ wandb.log({"Results": [wandb.Image(str(save_dir / x), caption=x) for x in
447
+ ['results.png', 'precision-recall_curve.png']]})
448
+ logger.info('%g epochs completed in %.3f hours.\n' % (epoch - start_epoch + 1, (time.time() - t0) / 3600))
449
+ else:
450
+ dist.destroy_process_group()
451
+
452
+ wandb.run.finish() if wandb and wandb.run else None
453
+ torch.cuda.empty_cache()
454
+ return results
455
+
456
+
457
+ if __name__ == '__main__':
458
+ parser = argparse.ArgumentParser()
459
+ parser.add_argument('--weights', type=str, default='yolor_p6.pt', help='initial weights path')
460
+ parser.add_argument('--cfg', type=str, default='', help='model cfg path, e.g. cfg/yolor_p6.cfg')
461
+ parser.add_argument('--data', type=str, default='data/coco.yaml', help='data.yaml path')
462
+ parser.add_argument('--hyp', type=str, default='data/hyp.scratch.1280.yaml', help='hyperparameters path')
463
+ parser.add_argument('--epochs', type=int, default=300)
464
+ parser.add_argument('--batch-size', type=int, default=8, help='total batch size for all GPUs')
465
+ parser.add_argument('--img-size', nargs='+', type=int, default=[1280, 1280], help='[train, test] image sizes')
466
+ parser.add_argument('--rect', action='store_true', help='rectangular training')
467
+ parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume most recent training')
468
+ parser.add_argument('--nosave', action='store_true', help='only save final checkpoint')
469
+ parser.add_argument('--notest', action='store_true', help='only test final epoch')
470
+ parser.add_argument('--noautoanchor', action='store_true', help='disable autoanchor check')
471
+ parser.add_argument('--evolve', action='store_true', help='evolve hyperparameters')
472
+ parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')
473
+ parser.add_argument('--cache-images', action='store_true', help='cache images for faster training')
474
+ parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')
475
+ parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
476
+ parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')
477
+ parser.add_argument('--single-cls', action='store_true', help='train as single-class dataset')
478
+ parser.add_argument('--adam', action='store_true', help='use torch.optim.Adam() optimizer')
479
+ parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')
480
+ parser.add_argument('--local_rank', type=int, default=-1, help='DDP parameter, do not modify')
481
+ parser.add_argument('--log-imgs', type=int, default=16, help='number of images for W&B logging, max 100')
482
+ parser.add_argument('--workers', type=int, default=8, help='maximum number of dataloader workers')
483
+ parser.add_argument('--project', default='runs/train', help='save to project/name')
484
+ parser.add_argument('--name', default='exp', help='save to project/name')
485
+ parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
486
+ opt = parser.parse_args()
487
+
488
+ # Set DDP variables
489
+ opt.total_batch_size = opt.batch_size
490
+ opt.world_size = int(os.environ['WORLD_SIZE']) if 'WORLD_SIZE' in os.environ else 1
491
+ opt.global_rank = int(os.environ['RANK']) if 'RANK' in os.environ else -1
492
+ set_logging(opt.global_rank)
493
+ if opt.global_rank in [-1, 0]:
494
+ check_git_status()
495
+
496
+ # Resume
497
+ if opt.resume: # resume an interrupted run
498
+ ckpt = opt.resume if isinstance(opt.resume, str) else get_latest_run() # specified or most recent path
499
+ assert os.path.isfile(ckpt), 'ERROR: --resume checkpoint does not exist'
500
+ with open(Path(ckpt).parent.parent / 'opt.yaml') as f:
501
+ opt = argparse.Namespace(**yaml.load(f, Loader=yaml.FullLoader)) # replace
502
+ opt.cfg, opt.weights, opt.resume = '', ckpt, True
503
+ logger.info('Resuming training from %s' % ckpt)
504
+ else:
505
+ # opt.hyp = opt.hyp or ('hyp.finetune.yaml' if opt.weights else 'hyp.scratch.yaml')
506
+ opt.data, opt.cfg, opt.hyp = check_file(opt.data), check_file(opt.cfg), check_file(opt.hyp) # check files
507
+ assert len(opt.cfg) or len(opt.weights), 'either --cfg or --weights must be specified'
508
+ opt.img_size.extend([opt.img_size[-1]] * (2 - len(opt.img_size))) # extend to 2 sizes (train, test)
509
+ opt.name = 'evolve' if opt.evolve else opt.name
510
+ opt.save_dir = increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok | opt.evolve) # increment run
511
+
512
+ # DDP mode
513
+ device = select_device(opt.device, batch_size=opt.batch_size)
514
+ if opt.local_rank != -1:
515
+ assert torch.cuda.device_count() > opt.local_rank
516
+ torch.cuda.set_device(opt.local_rank)
517
+ device = torch.device('cuda', opt.local_rank)
518
+ dist.init_process_group(backend='nccl', init_method='env://') # distributed backend
519
+ assert opt.batch_size % opt.world_size == 0, '--batch-size must be multiple of CUDA device count'
520
+ opt.batch_size = opt.total_batch_size // opt.world_size
521
+
522
+ # Hyperparameters
523
+ with open(opt.hyp) as f:
524
+ hyp = yaml.load(f, Loader=yaml.FullLoader) # load hyps
525
+ if 'box' not in hyp:
526
+ warn('Compatibility: %s missing "box" which was renamed from "giou" in %s' %
527
+ (opt.hyp, 'https://github.com/ultralytics/yolov5/pull/1120'))
528
+ hyp['box'] = hyp.pop('giou')
529
+
530
+ # Train
531
+ logger.info(opt)
532
+ if not opt.evolve:
533
+ tb_writer = None # init loggers
534
+ if opt.global_rank in [-1, 0]:
535
+ logger.info(f'Start Tensorboard with "tensorboard --logdir {opt.project}", view at http://localhost:6006/')
536
+ tb_writer = SummaryWriter(opt.save_dir) # Tensorboard
537
+ train(hyp, opt, device, tb_writer, wandb)
538
+
539
+ # Evolve hyperparameters (optional)
540
+ else:
541
+ # Hyperparameter evolution metadata (mutation scale 0-1, lower_limit, upper_limit)
542
+ meta = {'lr0': (1, 1e-5, 1e-1), # initial learning rate (SGD=1E-2, Adam=1E-3)
543
+ 'lrf': (1, 0.01, 1.0), # final OneCycleLR learning rate (lr0 * lrf)
544
+ 'momentum': (0.3, 0.6, 0.98), # SGD momentum/Adam beta1
545
+ 'weight_decay': (1, 0.0, 0.001), # optimizer weight decay
546
+ 'warmup_epochs': (1, 0.0, 5.0), # warmup epochs (fractions ok)
547
+ 'warmup_momentum': (1, 0.0, 0.95), # warmup initial momentum
548
+ 'warmup_bias_lr': (1, 0.0, 0.2), # warmup initial bias lr
549
+ 'box': (1, 0.02, 0.2), # box loss gain
550
+ 'cls': (1, 0.2, 4.0), # cls loss gain
551
+ 'cls_pw': (1, 0.5, 2.0), # cls BCELoss positive_weight
552
+ 'obj': (1, 0.2, 4.0), # obj loss gain (scale with pixels)
553
+ 'obj_pw': (1, 0.5, 2.0), # obj BCELoss positive_weight
554
+ 'iou_t': (0, 0.1, 0.7), # IoU training threshold
555
+ 'anchor_t': (1, 2.0, 8.0), # anchor-multiple threshold
556
+ 'anchors': (2, 2.0, 10.0), # anchors per output grid (0 to ignore)
557
+ 'fl_gamma': (0, 0.0, 2.0), # focal loss gamma (efficientDet default gamma=1.5)
558
+ 'hsv_h': (1, 0.0, 0.1), # image HSV-Hue augmentation (fraction)
559
+ 'hsv_s': (1, 0.0, 0.9), # image HSV-Saturation augmentation (fraction)
560
+ 'hsv_v': (1, 0.0, 0.9), # image HSV-Value augmentation (fraction)
561
+ 'degrees': (1, 0.0, 45.0), # image rotation (+/- deg)
562
+ 'translate': (1, 0.0, 0.9), # image translation (+/- fraction)
563
+ 'scale': (1, 0.0, 0.9), # image scale (+/- gain)
564
+ 'shear': (1, 0.0, 10.0), # image shear (+/- deg)
565
+ 'perspective': (0, 0.0, 0.001), # image perspective (+/- fraction), range 0-0.001
566
+ 'flipud': (1, 0.0, 1.0), # image flip up-down (probability)
567
+ 'fliplr': (0, 0.0, 1.0), # image flip left-right (probability)
568
+ 'mosaic': (1, 0.0, 1.0), # image mosaic (probability)
569
+ 'mixup': (1, 0.0, 1.0)} # image mixup (probability)
570
+
571
+ assert opt.local_rank == -1, 'DDP mode not implemented for --evolve'
572
+ opt.notest, opt.nosave = True, True # only test/save final epoch
573
+ # ei = [isinstance(x, (int, float)) for x in hyp.values()] # evolvable indices
574
+ yaml_file = Path(opt.save_dir) / 'hyp_evolved.yaml' # save best result here
575
+ if opt.bucket:
576
+ os.system('gsutil cp gs://%s/evolve.txt .' % opt.bucket) # download evolve.txt if exists
577
+
578
+ for _ in range(300): # generations to evolve
579
+ if Path('evolve.txt').exists(): # if evolve.txt exists: select best hyps and mutate
580
+ # Select parent(s)
581
+ parent = 'single' # parent selection method: 'single' or 'weighted'
582
+ x = np.loadtxt('evolve.txt', ndmin=2)
583
+ n = min(5, len(x)) # number of previous results to consider
584
+ x = x[np.argsort(-fitness(x))][:n] # top n mutations
585
+ w = fitness(x) - fitness(x).min() # weights
586
+ if parent == 'single' or len(x) == 1:
587
+ # x = x[random.randint(0, n - 1)] # random selection
588
+ x = x[random.choices(range(n), weights=w)[0]] # weighted selection
589
+ elif parent == 'weighted':
590
+ x = (x * w.reshape(n, 1)).sum(0) / w.sum() # weighted combination
591
+
592
+ # Mutate
593
+ mp, s = 0.8, 0.2 # mutation probability, sigma
594
+ npr = np.random
595
+ npr.seed(int(time.time()))
596
+ g = np.array([x[0] for x in meta.values()]) # gains 0-1
597
+ ng = len(meta)
598
+ v = np.ones(ng)
599
+ while all(v == 1): # mutate until a change occurs (prevent duplicates)
600
+ v = (g * (npr.random(ng) < mp) * npr.randn(ng) * npr.random() * s + 1).clip(0.3, 3.0)
601
+ for i, k in enumerate(hyp.keys()): # plt.hist(v.ravel(), 300)
602
+ hyp[k] = float(x[i + 7] * v[i]) # mutate
603
+
604
+ # Constrain to limits
605
+ for k, v in meta.items():
606
+ hyp[k] = max(hyp[k], v[1]) # lower limit
607
+ hyp[k] = min(hyp[k], v[2]) # upper limit
608
+ hyp[k] = round(hyp[k], 5) # round to 5 decimal places
609
+
610
+ # Train mutation
611
+ results = train(hyp.copy(), opt, device, wandb=wandb)
612
+
613
+ # Write mutation results
614
+ print_mutation(hyp.copy(), results, yaml_file, opt.bucket)
615
+
616
+ # Plot results
617
+ plot_evolution(yaml_file)
618
+ print(f'Hyperparameter evolution complete. Best results saved as: {yaml_file}\n'
619
+ f'Command to train a new model with these hyperparameters: $ python train.py --hyp {yaml_file}')
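The --evolve branch above mutates hyperparameters by multiplying each one with a clipped random gain instead of re-sampling from scratch. A self-contained sketch of that mutation step, assuming a made-up three-key hyp dict in place of the full table and the parent row loaded from evolve.txt:

import numpy as np

meta_gains = {'lr0': 1.0, 'momentum': 0.3, 'fl_gamma': 0.0}  # per-key mutation gains; 0 freezes a key, as in meta
hyp = {'lr0': 0.01, 'momentum': 0.937, 'fl_gamma': 0.0}      # hypothetical parent hyperparameters
mp, s = 0.8, 0.2                                             # mutation probability and sigma, as in the scripts
npr = np.random
g = np.array(list(meta_gains.values()))
v = np.ones(len(g))
while all(v == 1):                                           # retry until at least one key actually changes
    v = (g * (npr.random(len(g)) < mp) * npr.randn(len(g)) * npr.random() * s + 1).clip(0.3, 3.0)
hyp = {k: float(x * vi) for (k, x), vi in zip(hyp.items(), v)}
print(hyp)                                                   # lr0/momentum scaled by 0.3-3.0x, fl_gamma untouched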
utils/__init__.py ADDED
@@ -0,0 +1 @@
 
 
1
+
utils/__pycache__/__init__.cpython-37.pyc ADDED
Binary file (152 Bytes).
 
utils/__pycache__/__init__.cpython-38.pyc ADDED
Binary file (156 Bytes).
 
utils/__pycache__/datasets.cpython-37.pyc ADDED
Binary file (37.9 kB).
 
utils/__pycache__/datasets.cpython-38.pyc ADDED
Binary file (35.8 kB).
 
utils/__pycache__/general.cpython-37.pyc ADDED
Binary file (14 kB).
 
utils/__pycache__/google_utils.cpython-37.pyc ADDED
Binary file (2.94 kB).
 
utils/__pycache__/google_utils.cpython-38.pyc ADDED
Binary file (2.96 kB).
 
utils/__pycache__/layers.cpython-37.pyc ADDED
Binary file (24.3 kB).
 
utils/__pycache__/metrics.cpython-37.pyc ADDED
Binary file (4.04 kB).
 
utils/__pycache__/parse_config.cpython-37.pyc ADDED
Binary file (2.73 kB). View file
 
utils/__pycache__/plots.cpython-37.pyc ADDED
Binary file (13.9 kB). View file
 
utils/__pycache__/torch_utils.cpython-37.pyc ADDED
Binary file (9.17 kB).