zhengrongzhang committed on
Commit 1cff332
1 Parent(s): b5a92cd

init model

Files changed (7)
  1. README.md +116 -0
  2. coco.py +226 -0
  3. demo_utils.py +224 -0
  4. eval_onnx.py +444 -0
  5. infer_onnx.py +151 -0
  6. requirements.txt +9 -0
  7. yolox-s-int8.onnx +3 -0
README.md ADDED
@@ -0,0 +1,116 @@
1
+ ---
2
+ license: apache-2.0
3
+ tags:
4
+ - RyzenAI
5
+ - object-detection
6
+ - vision
7
+ - YOLO
8
+ - anchor-free
9
+ - pytorch
10
+ datasets:
11
+ - coco
12
+ metrics:
13
+ - mAP
14
+ ---
15
+
16
+ # YOLOX-small model trained on COCO
17
+
18
+ YOLOX-small is the small variant of the YOLOX model, trained for COCO object detection (118k annotated images) at resolution 640x640. It was introduced in the paper [YOLOX: Exceeding YOLO Series in 2021](https://arxiv.org/abs/2107.08430) by Zheng Ge et al. and first released in [this repository](https://github.com/Megvii-BaseDetection/YOLOX).
19
+
20
+ We provide a modified version that is supported by [AMD Ryzen AI](https://ryzenai.docs.amd.com).
21
+
22
+
23
+ ## Model description
24
+
25
+ Building on the YOLO detector family, the YOLOX model adopts an anchor-free head and incorporates other advanced detection techniques, including a decoupled head and the leading label assignment strategy SimOTA, to achieve state-of-the-art results across a wide range of model scales. The series of models was developed by Megvii Inc. and won 1st place in the Streaming Perception Challenge (Workshop on Autonomous Driving at CVPR 2021).
26
+
27
+
28
+ ## Intended uses & limitations
29
+
30
+ You can use the raw model for object detection. See the [model hub](https://huggingface.co/models?search=amd/yolox) to find all available YOLOX models.
31
+
32
+
33
+ ## How to use
34
+
35
+ ### Installation
36
+
37
+ Follow [Ryzen AI Installation](https://ryzenai.docs.amd.com/en/latest/inst.html) to prepare the environment for Ryzen AI.
38
+ Then run the following command to install the prerequisites for this model.
39
+ ```sh
40
+ pip install -r requirements.txt
41
+ ```
42
+
43
+
44
+ ### Data Preparation (optional: for accuracy evaluation)
45
+
46
+ The MS COCO 2017 dataset contains 118,287 training images and 5,000 validation images.
47
+
48
+ Download the COCO validation images ([val2017.zip](http://images.cocodataset.org/zips/val2017.zip)) and annotations ([annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip)).
49
+ Then unzip the files and arrange them in the following directory layout (or create soft links):
50
+
51
+ ```plain
52
+ └── data
53
+ └── COCO
54
+ ├── annotations
55
+ | ├── instances_val2017.json
56
+ | └── ...
57
+ └── val2017
58
+ ├── 000000000139.jpg
59
+ ├── 000000000285.jpg
60
+ └── ...
61
+ ```
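+
+ As a quick sanity check of the layout above, the annotations can be loaded with pycocotools (a minimal sketch; the relative `data/COCO` path is an assumption based on the tree shown):
+
+ ```python
+ from pycocotools.coco import COCO
+
+ # expects the directory layout shown above, relative to the working directory
+ coco = COCO("data/COCO/annotations/instances_val2017.json")
+ print(len(coco.getImgIds()), "validation images,", len(coco.getCatIds()), "categories")
+ ```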
62
+
63
+
64
+ ### Test & Evaluation
65
+
66
+ - Code snippet from [`infer_onnx.py`](infer_onnx.py) showing how to run inference with ONNX Runtime:
67
+ ```python
68
+ args = make_parser().parse_args()
69
+ input_shape = tuple(map(int, args.input_shape.split(',')))
70
+ origin_img = cv2.imread(args.image_path)
71
+ img, ratio = preprocess(origin_img, input_shape)
72
+ if args.ipu:
73
+ providers = ["VitisAIExecutionProvider"]
74
+ provider_options = [{"config_file": args.provider_config}]
75
+ else:
76
+ providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
77
+ provider_options = None
78
+ session = ort.InferenceSession(args.model, providers=providers, provider_options=provider_options)
79
+ ort_inputs = {session.get_inputs()[0].name: img[None, :, :, :]}
80
+ outputs = session.run(None, ort_inputs)
81
+ dets = postprocess(outputs, input_shape, ratio)
82
+ if dets is not None:
83
+ final_boxes, final_scores, final_cls_inds = dets[:, :4], dets[:, 4], dets[:, 5]
84
+ origin_img = vis(origin_img, final_boxes, final_scores, final_cls_inds,
85
+ conf=args.score_thr, class_names=COCO_CLASSES)
86
+ mkdir(args.output_dir)
87
+ output_path = os.path.join(args.output_dir, os.path.basename(args.image_path))
88
+ cv2.imwrite(output_path, origin_img)
89
+ ```
90
+
91
+ - Run inference on a single image
92
+ ```sh
93
+ python infer_onnx.py -m yolox-s-int8.onnx -i Path\To\Your\Image --ipu --provider_config Path\To\vaip_config.json
94
+ ```
95
+ *Note: __vaip_config.json__ is provided with the Ryzen AI setup package (see [Installation](#installation)).*
96
+
97
+ - Evaluate the accuracy of the quantized model on the COCO validation set
98
+ ```sh
99
+ python eval_onnx.py -m yolox-s-int8.onnx --ipu --provider_config Path\To\vaip_config.json
100
+ ```
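+
+ The evaluation can also be driven from Python. The following is a minimal sketch mirroring the `__main__` block of [`eval_onnx.py`](eval_onnx.py); the CPU provider and the default `data/COCO` location are assumptions:
+
+ ```python
+ import onnxruntime as ort
+ from loguru import logger
+ from eval_onnx import COCOEvaluator, get_eval_loader
+
+ session = ort.InferenceSession("yolox-s-int8.onnx", providers=["CPUExecutionProvider"])
+ val_loader = get_eval_loader(batch_size=1, data_dir="data/COCO")  # layout from Data Preparation
+ evaluator = COCOEvaluator(dataloader=val_loader, img_size=(640, 640),
+                           confthre=0.01, nmsthre=0.65, num_classes=80)
+ ap50_95, ap50, summary = evaluator.evaluate(session)  # AP@0.50:0.95, AP@0.50, and a text report
+ logger.info("\n" + summary)
+ ```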
101
+
102
+ ### Performance
103
+
104
+ |Metric | Accuracy on IPU|
105
+ | :----: | :----: |
106
+ |AP\@0.50:0.95|0.370|
107
+
108
+
109
+ ## Citation
+
+ ```bibtex
110
+ @article{yolox2021,
111
+ title={YOLOX: Exceeding YOLO Series in 2021},
112
+ author={Ge, Zheng and Liu, Songtao and Wang, Feng and Li, Zeming and Sun, Jian},
113
+ journal={arXiv preprint arXiv:2107.08430},
114
+ year={2021}
115
+ }
116
+ ```
coco.py ADDED
@@ -0,0 +1,226 @@
1
+ #!/usr/bin/env python3
2
+ # -*- coding:utf-8 -*-
3
+
4
+ import os
5
+ import cv2
6
+ import numpy as np
7
+ from loguru import logger
8
+ from functools import wraps
9
+ from pycocotools.coco import COCO
10
+ from torch.utils.data.dataset import Dataset as torchDataset
11
+
12
+ COCO_CLASSES = (
13
+ 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant',
14
+ 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra',
15
+ 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
16
+ 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork',
17
+ 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut',
18
+ 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
19
+ 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
20
+ 'teddy bear', 'hair drier', 'toothbrush')
21
+
22
+
23
+ def remove_useless_info(coco):
24
+ """
25
+ Remove useless info in the COCO dataset. The COCO object is modified in place.
26
+ This function is mainly used to save memory (about 30% less).
27
+ """
28
+ if isinstance(coco, COCO):
29
+ dataset = coco.dataset
30
+ dataset.pop("info", None)
31
+ dataset.pop("licenses", None)
32
+ for img in dataset["images"]:
33
+ img.pop("license", None)
34
+ img.pop("coco_url", None)
35
+ img.pop("date_captured", None)
36
+ img.pop("flickr_url", None)
37
+ if "annotations" in coco.dataset:
38
+ for anno in coco.dataset["annotations"]:
39
+ anno.pop("segmentation", None)
40
+
41
+
42
+ class Dataset(torchDataset):
43
+ """ This class is a subclass of the base :class:`torch.utils.data.Dataset`,
44
+ that enables on the fly resizing of the ``input_dim``.
45
+
46
+ Args:
47
+ input_dimension (tuple): (width,height) tuple with default dimensions of the network
48
+ """
49
+
50
+ def __init__(self, input_dimension, mosaic=True):
51
+ super().__init__()
52
+ self.__input_dim = input_dimension[:2]
53
+ self.enable_mosaic = mosaic
54
+
55
+ @property
56
+ def input_dim(self):
57
+ """
58
+ Dimension that can be used by transforms to set the correct image size, etc.
59
+ This allows transforms to have a single source of truth
60
+ for the input dimension of the network.
61
+
62
+ Return:
63
+ tuple: the current (width, height)
64
+ """
65
+ if hasattr(self, "_input_dim"):
66
+ return self._input_dim
67
+ return self.__input_dim
68
+
69
+ @staticmethod
70
+ def mosaic_getitem(getitem_fn):
71
+ """
72
+ Decorator method that needs to be used around the ``__getitem__`` method.
73
+ This decorator enables toggling the mosaic augmentation through the index.
74
+
75
+ Example:
76
+ >>> class CustomSet(ln.data.Dataset):
77
+ ... def __len__(self):
78
+ ... return 10
79
+ ... @ln.data.Dataset.mosaic_getitem
80
+ ... def __getitem__(self, index):
81
+ ... return self.enable_mosaic
82
+ """
83
+
84
+ @wraps(getitem_fn)
85
+ def wrapper(self, index):
86
+ if not isinstance(index, int):
87
+ self.enable_mosaic = index[0]
88
+ index = index[1]
89
+ ret_val = getitem_fn(self, index)
90
+ return ret_val
91
+
92
+ return wrapper
93
+
94
+
95
+ class COCODataset(Dataset):
96
+ """
97
+ COCO dataset class.
98
+ """
99
+
100
+ def __init__(
101
+ self,
102
+ data_dir='data/COCO',
103
+ json_file="instances_train2017.json",
104
+ name="train2017",
105
+ img_size=(416, 416),
106
+ preproc=None
107
+ ):
108
+ """
109
+ COCO dataset initialization. Annotation data are read into memory by COCO API.
110
+ Args:
111
+ data_dir (str): dataset root directory
112
+ json_file (str): COCO json file name
113
+ name (str): COCO data name (e.g. 'train2017' or 'val2017')
114
+ img_size (tuple(int)): target image size after pre-processing
115
+ preproc: data augmentation strategy
116
+ """
117
+ super().__init__(img_size)
118
+ self.data_dir = data_dir
119
+ self.json_file = json_file
120
+ self.coco = COCO(os.path.join(self.data_dir, "annotations", self.json_file))
121
+ remove_useless_info(self.coco)
122
+ self.ids = self.coco.getImgIds()
123
+ self.class_ids = sorted(self.coco.getCatIds())
124
+ self.cats = self.coco.loadCats(self.coco.getCatIds())
125
+ self._classes = tuple([c["name"] for c in self.cats])
126
+ self.imgs = None
127
+ self.name = name
128
+ self.img_size = img_size
129
+ self.preproc = preproc
130
+ self.annotations = self._load_coco_annotations()
131
+
132
+ def __len__(self):
133
+ return len(self.ids)
134
+
135
+ def __del__(self):
136
+ del self.imgs
137
+
138
+ def _load_coco_annotations(self):
139
+ return [self.load_anno_from_ids(_ids) for _ids in self.ids]
140
+
141
+ def load_anno_from_ids(self, id_):
142
+ im_ann = self.coco.loadImgs(id_)[0]
143
+ width = im_ann["width"]
144
+ height = im_ann["height"]
145
+ anno_ids = self.coco.getAnnIds(imgIds=[int(id_)], iscrowd=False)
146
+ annotations = self.coco.loadAnns(anno_ids)
147
+ objs = []
148
+ for obj in annotations:
149
+ x1 = np.max((0, obj["bbox"][0]))
150
+ y1 = np.max((0, obj["bbox"][1]))
151
+ x2 = np.min((width, x1 + np.max((0, obj["bbox"][2]))))
152
+ y2 = np.min((height, y1 + np.max((0, obj["bbox"][3]))))
153
+ if obj["area"] > 0 and x2 >= x1 and y2 >= y1:
154
+ obj["clean_bbox"] = [x1, y1, x2, y2]
155
+ objs.append(obj)
156
+ num_objs = len(objs)
157
+ res = np.zeros((num_objs, 5))
158
+ for ix, obj in enumerate(objs):
159
+ cls = self.class_ids.index(obj["category_id"])
160
+ res[ix, 0:4] = obj["clean_bbox"]
161
+ res[ix, 4] = cls
162
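+ # Scale the clipped ground-truth boxes to the target image size, preserving aspect ratio.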
+ r = min(self.img_size[0] / height, self.img_size[1] / width)
163
+ res[:, :4] *= r
164
+ img_info = (height, width)
165
+ resized_info = (int(height * r), int(width * r))
166
+ file_name = (
167
+ im_ann["file_name"]
168
+ if "file_name" in im_ann
169
+ else "{:012}".format(id_) + ".jpg"
170
+ )
171
+ return res, img_info, resized_info, file_name
172
+
173
+ def load_anno(self, index):
174
+ return self.annotations[index][0]
175
+
176
+ def load_resized_img(self, index):
177
+ img = self.load_image(index)
178
+ r = min(self.img_size[0] / img.shape[0], self.img_size[1] / img.shape[1])
179
+ resized_img = cv2.resize(
180
+ img,
181
+ (int(img.shape[1] * r), int(img.shape[0] * r)),
182
+ interpolation=cv2.INTER_LINEAR,
183
+ ).astype(np.uint8)
184
+ return resized_img
185
+
186
+ def load_image(self, index):
187
+ file_name = self.annotations[index][3]
188
+ img_file = os.path.join(self.data_dir, self.name, file_name)
189
+ img = cv2.imread(img_file)
190
+ assert img is not None, f"file named {img_file} not found"
191
+ return img
192
+
193
+ def pull_item(self, index):
194
+ id_ = self.ids[index]
195
+ res, img_info, resized_info, _ = self.annotations[index]
196
+ if self.imgs is not None:
197
+ pad_img = self.imgs[index]
198
+ img = pad_img[: resized_info[0], : resized_info[1], :].copy()
199
+ else:
200
+ img = self.load_resized_img(index)
201
+ return img, res.copy(), img_info, np.array([id_])
202
+
203
+ @Dataset.mosaic_getitem
204
+ def __getitem__(self, index):
205
+ """
206
+ One image / label pair for the given index is picked up and pre-processed.
207
+
208
+ Args:
209
+ index (int): data index
210
+
211
+ Returns:
212
+ img (numpy.ndarray): pre-processed image
213
+ target (torch.Tensor): pre-processed label data.
214
+ The shape is :math:`[max_labels, 5]`.
215
+ each label consists of [class, xc, yc, w, h]:
216
+ class (float): class index.
217
+ xc, yc (float) : center of bbox whose values range from 0 to 1.
218
+ w, h (float) : size of bbox whose values range from 0 to 1.
219
+ img_info : tuple of h, w.
220
+ h, w (int): original shape of the image
221
+ img_id (int): same as the input index. Used for evaluation.
222
+ """
223
+ img, target, img_info, img_id = self.pull_item(index)
224
+ if self.preproc is not None:
225
+ img, target = self.preproc(img, target, self.input_dim)
226
+ return img, target, img_info, img_id
demo_utils.py ADDED
@@ -0,0 +1,224 @@
1
+ #!/usr/bin/env python3
2
+ # -*- coding:utf-8 -*-
3
+
4
+ import os
5
+ import cv2
6
+ import numpy as np
7
+
8
+
9
+ def mkdir(path):
10
+ if not os.path.exists(path):
11
+ os.makedirs(path)
12
+
13
+
14
+ def nms(boxes, scores, nms_thr):
15
+ """Single class NMS implemented in Numpy."""
16
+ x1 = boxes[:, 0]
17
+ y1 = boxes[:, 1]
18
+ x2 = boxes[:, 2]
19
+ y2 = boxes[:, 3]
20
+ areas = (x2 - x1 + 1) * (y2 - y1 + 1)
21
+ order = scores.argsort()[::-1]
22
+ keep = []
23
+ while order.size > 0:
24
+ i = order[0]
25
+ keep.append(i)
26
+ xx1 = np.maximum(x1[i], x1[order[1:]])
27
+ yy1 = np.maximum(y1[i], y1[order[1:]])
28
+ xx2 = np.minimum(x2[i], x2[order[1:]])
29
+ yy2 = np.minimum(y2[i], y2[order[1:]])
30
+ w = np.maximum(0.0, xx2 - xx1 + 1)
31
+ h = np.maximum(0.0, yy2 - yy1 + 1)
32
+ inter = w * h
33
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
34
+ inds = np.where(ovr <= nms_thr)[0]
35
+ order = order[inds + 1]
36
+ return keep
37
+
38
+
39
+ def multiclass_nms(boxes, scores, nms_thr, score_thr, class_agnostic=True):
40
+ """Multiclass NMS implemented in Numpy"""
41
+ if class_agnostic:
42
+ nms_method = multiclass_nms_class_agnostic
43
+ else:
44
+ nms_method = multiclass_nms_class_aware
45
+ return nms_method(boxes, scores, nms_thr, score_thr)
46
+
47
+
48
+ def multiclass_nms_class_aware(boxes, scores, nms_thr, score_thr):
49
+ """Multiclass NMS implemented in Numpy. Class-aware version."""
50
+ final_dets = []
51
+ num_classes = scores.shape[1]
52
+ for cls_ind in range(num_classes):
53
+ cls_scores = scores[:, cls_ind]
54
+ valid_score_mask = cls_scores > score_thr
55
+ if valid_score_mask.sum() == 0:
56
+ continue
57
+ else:
58
+ valid_scores = cls_scores[valid_score_mask]
59
+ valid_boxes = boxes[valid_score_mask]
60
+ keep = nms(valid_boxes, valid_scores, nms_thr)
61
+ if len(keep) > 0:
62
+ cls_inds = np.ones((len(keep), 1)) * cls_ind
63
+ dets = np.concatenate(
64
+ [valid_boxes[keep], valid_scores[keep, None], cls_inds], 1
65
+ )
66
+ final_dets.append(dets)
67
+ if len(final_dets) == 0:
68
+ return None
69
+ return np.concatenate(final_dets, 0)
70
+
71
+
72
+ def multiclass_nms_class_agnostic(boxes, scores, nms_thr, score_thr):
73
+ """Multiclass NMS implemented in Numpy. Class-agnostic version."""
74
+ cls_inds = scores.argmax(1)
75
+ cls_scores = scores[np.arange(len(cls_inds)), cls_inds]
76
+ valid_score_mask = cls_scores > score_thr
77
+ if valid_score_mask.sum() == 0:
78
+ return None
79
+ valid_scores = cls_scores[valid_score_mask]
80
+ valid_boxes = boxes[valid_score_mask]
81
+ valid_cls_inds = cls_inds[valid_score_mask]
82
+ keep = nms(valid_boxes, valid_scores, nms_thr)
83
+ if keep:
84
+ dets = np.concatenate(
85
+ [valid_boxes[keep], valid_scores[keep, None], valid_cls_inds[keep, None]], 1
86
+ )
87
+ return dets
88
+
89
+
90
+ def demo_postprocess(outputs, img_size, p6=False):
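+ # Decode the raw YOLOX head outputs for every feature level: xy predictions are offset
+ # by the per-cell grid and scaled by the stride; wh predictions are exponentiated and
+ # scaled by the stride.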
91
+ grids = []
92
+ expanded_strides = []
93
+ if not p6:
94
+ strides = [8, 16, 32]
95
+ else:
96
+ strides = [8, 16, 32, 64]
97
+ hsizes = [img_size[0] // stride for stride in strides]
98
+ wsizes = [img_size[1] // stride for stride in strides]
99
+ for hsize, wsize, stride in zip(hsizes, wsizes, strides):
100
+ xv, yv = np.meshgrid(np.arange(wsize), np.arange(hsize))
101
+ grid = np.stack((xv, yv), 2).reshape(1, -1, 2)
102
+ grids.append(grid)
103
+ shape = grid.shape[:2]
104
+ expanded_strides.append(np.full((*shape, 1), stride))
105
+ grids = np.concatenate(grids, 1)
106
+ expanded_strides = np.concatenate(expanded_strides, 1)
107
+ outputs[..., :2] = (outputs[..., :2] + grids) * expanded_strides
108
+ outputs[..., 2:4] = np.exp(outputs[..., 2:4]) * expanded_strides
109
+ return outputs
110
+
111
+
112
+ def vis(img, boxes, scores, cls_ids, conf=0.5, class_names=None):
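+ # Draw detection boxes with class/score labels on the image; detections whose score
+ # is below `conf` are skipped.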
113
+ for i in range(len(boxes)):
114
+ box = boxes[i]
115
+ cls_id = int(cls_ids[i])
116
+ score = scores[i]
117
+ if score < conf:
118
+ continue
119
+ x0 = int(box[0])
120
+ y0 = int(box[1])
121
+ x1 = int(box[2])
122
+ y1 = int(box[3])
123
+ color = (_COLORS[cls_id] * 255).astype(np.uint8).tolist()
124
+ text = '{}:{:.1f}%'.format(class_names[cls_id], score * 100)
125
+ txt_color = (0, 0, 0) if np.mean(_COLORS[cls_id]) > 0.5 else (255, 255, 255)
126
+ font = cv2.FONT_HERSHEY_SIMPLEX
127
+ txt_size = cv2.getTextSize(text, font, 0.4, 1)[0]
128
+ cv2.rectangle(img, (x0, y0), (x1, y1), color, 2)
129
+ txt_bk_color = (_COLORS[cls_id] * 255 * 0.7).astype(np.uint8).tolist()
130
+ cv2.rectangle(
131
+ img,
132
+ (x0, y0 + 1),
133
+ (x0 + txt_size[0] + 1, y0 + int(1.5*txt_size[1])),
134
+ txt_bk_color,
135
+ -1
136
+ )
137
+ cv2.putText(img, text, (x0, y0 + txt_size[1]), font, 0.4, txt_color, thickness=1)
138
+ return img
139
+
140
+
141
+ _COLORS = np.array(
142
+ [
143
+ 0.000, 0.447, 0.741,
144
+ 0.850, 0.325, 0.098,
145
+ 0.929, 0.694, 0.125,
146
+ 0.494, 0.184, 0.556,
147
+ 0.466, 0.674, 0.188,
148
+ 0.301, 0.745, 0.933,
149
+ 0.635, 0.078, 0.184,
150
+ 0.300, 0.300, 0.300,
151
+ 0.600, 0.600, 0.600,
152
+ 1.000, 0.000, 0.000,
153
+ 1.000, 0.500, 0.000,
154
+ 0.749, 0.749, 0.000,
155
+ 0.000, 1.000, 0.000,
156
+ 0.000, 0.000, 1.000,
157
+ 0.667, 0.000, 1.000,
158
+ 0.333, 0.333, 0.000,
159
+ 0.333, 0.667, 0.000,
160
+ 0.333, 1.000, 0.000,
161
+ 0.667, 0.333, 0.000,
162
+ 0.667, 0.667, 0.000,
163
+ 0.667, 1.000, 0.000,
164
+ 1.000, 0.333, 0.000,
165
+ 1.000, 0.667, 0.000,
166
+ 1.000, 1.000, 0.000,
167
+ 0.000, 0.333, 0.500,
168
+ 0.000, 0.667, 0.500,
169
+ 0.000, 1.000, 0.500,
170
+ 0.333, 0.000, 0.500,
171
+ 0.333, 0.333, 0.500,
172
+ 0.333, 0.667, 0.500,
173
+ 0.333, 1.000, 0.500,
174
+ 0.667, 0.000, 0.500,
175
+ 0.667, 0.333, 0.500,
176
+ 0.667, 0.667, 0.500,
177
+ 0.667, 1.000, 0.500,
178
+ 1.000, 0.000, 0.500,
179
+ 1.000, 0.333, 0.500,
180
+ 1.000, 0.667, 0.500,
181
+ 1.000, 1.000, 0.500,
182
+ 0.000, 0.333, 1.000,
183
+ 0.000, 0.667, 1.000,
184
+ 0.000, 1.000, 1.000,
185
+ 0.333, 0.000, 1.000,
186
+ 0.333, 0.333, 1.000,
187
+ 0.333, 0.667, 1.000,
188
+ 0.333, 1.000, 1.000,
189
+ 0.667, 0.000, 1.000,
190
+ 0.667, 0.333, 1.000,
191
+ 0.667, 0.667, 1.000,
192
+ 0.667, 1.000, 1.000,
193
+ 1.000, 0.000, 1.000,
194
+ 1.000, 0.333, 1.000,
195
+ 1.000, 0.667, 1.000,
196
+ 0.333, 0.000, 0.000,
197
+ 0.500, 0.000, 0.000,
198
+ 0.667, 0.000, 0.000,
199
+ 0.833, 0.000, 0.000,
200
+ 1.000, 0.000, 0.000,
201
+ 0.000, 0.167, 0.000,
202
+ 0.000, 0.333, 0.000,
203
+ 0.000, 0.500, 0.000,
204
+ 0.000, 0.667, 0.000,
205
+ 0.000, 0.833, 0.000,
206
+ 0.000, 1.000, 0.000,
207
+ 0.000, 0.000, 0.167,
208
+ 0.000, 0.000, 0.333,
209
+ 0.000, 0.000, 0.500,
210
+ 0.000, 0.000, 0.667,
211
+ 0.000, 0.000, 0.833,
212
+ 0.000, 0.000, 1.000,
213
+ 0.000, 0.000, 0.000,
214
+ 0.143, 0.143, 0.143,
215
+ 0.286, 0.286, 0.286,
216
+ 0.429, 0.429, 0.429,
217
+ 0.571, 0.571, 0.571,
218
+ 0.714, 0.714, 0.714,
219
+ 0.857, 0.857, 0.857,
220
+ 0.000, 0.447, 0.741,
221
+ 0.314, 0.717, 0.741,
222
+ 0.50, 0.5, 0
223
+ ]
224
+ ).astype(np.float32).reshape(-1, 3)
eval_onnx.py ADDED
@@ -0,0 +1,444 @@
1
+ #!/usr/bin/env python3
2
+ # -*- coding:utf-8 -*-
3
+
4
+ import io
5
+ import sys
6
+ import cv2
7
+ import json
8
+ import time
9
+ import pathlib
10
+ import argparse
11
+ import tempfile
12
+ import itertools
13
+ import contextlib
14
+ import torch
15
+ import torchvision
16
+ import numpy as np
17
+ import onnxruntime as ort
18
+ from tqdm import tqdm
19
+ from loguru import logger
20
+ from tabulate import tabulate
21
+ from collections import defaultdict
22
+ from pycocotools.cocoeval import COCOeval
23
+
24
+ CURRENT_DIR = pathlib.Path(__file__).parent
25
+ sys.path.append(str(CURRENT_DIR))
26
+
27
+ from coco import COCO_CLASSES
28
+
29
+
30
+ class COCOEvaluator:
31
+ """
32
+ COCO AP Evaluation class. All the data in the val2017 dataset are processed
33
+ and evaluated by COCO API.
34
+ """
35
+
36
+ def __init__(
37
+ self,
38
+ dataloader,
39
+ img_size: int,
40
+ confthre: float,
41
+ nmsthre: float,
42
+ num_classes: int,
43
+ testdev: bool = False,
44
+ per_class_AP: bool = False,
45
+ per_class_AR: bool = False,
46
+ ):
47
+ """
48
+ Args:
49
+ dataloader (Dataloader): evaluate dataloader.
50
+ img_size: image size after preprocess. images are resized
51
+ to squares whose shape is (img_size, img_size).
52
+ confthre: confidence threshold ranging from 0 to 1, which
53
+ is defined in the config file.
54
+ nmsthre: IoU threshold of non-max suppression ranging from 0 to 1.
55
+ num_classes: number of all classes of interest.
56
+ testdev: whether to run on the testdev set of COCO.
57
+ per_class_AP: whether to show per-class AP during evaluation. Defaults to False.
58
+ per_class_AR: whether to show per-class AR during evaluation. Defaults to False.
59
+ """
60
+ self.dataloader = dataloader
61
+ self.img_size = img_size
62
+ self.confthre = confthre
63
+ self.nmsthre = nmsthre
64
+ self.num_classes = num_classes
65
+ self.testdev = testdev
66
+ self.per_class_AP = per_class_AP
67
+ self.per_class_AR = per_class_AR
68
+
69
+ def evaluate(self, ort_sess, return_outputs=False):
70
+ """
71
+ COCO average precision (AP) evaluation. Inference is iterated over the test dataset
72
+ and the results are evaluated by the COCO API.
73
+
74
+ NOTE: This function will change training mode to False, please save states if needed.
75
+
76
+ Args:
77
+ ort_sess (onnxruntime.InferenceSession): onnxruntime session to evaluate.
78
+ return_outputs (bool): whether to also return the image-wise results
79
+
80
+ Returns:
81
+ eval_results (tuple): summary of metrics for evaluation
82
+ output_data (defaultdict): image-wise result
83
+ """
84
+ data_list = []
85
+ output_data = defaultdict()
86
+ inference_time = 0
87
+ nms_time = 0
88
+ n_samples = max(len(self.dataloader) - 1, 1)
89
+ input_name = ort_sess.get_inputs()[0].name
90
+ for cur_iter, (imgs, _, info_imgs, ids) in enumerate(tqdm(self.dataloader)):
91
+ # with torch.no_grad():
92
+ # skip timing the last iteration since its batch might not be full
93
+ is_time_record = cur_iter < len(self.dataloader) - 1
94
+ if is_time_record:
95
+ start = time.time()
96
+ outputs = ort_sess.run(None, {input_name: imgs.numpy()})
97
+ outputs = [torch.Tensor(out) for out in outputs]
98
+ outputs = head_postprocess(outputs)
99
+ if is_time_record:
100
+ infer_end = time.time()
101
+ inference_time += infer_end - start
102
+ outputs = postprocess(outputs, self.num_classes, self.confthre, self.nmsthre)
103
+ if is_time_record:
104
+ nms_end = time.time()
105
+ nms_time += nms_end - infer_end
106
+ data_list_elem, image_wise_data = self.convert_to_coco_format(
107
+ outputs, info_imgs, ids, return_outputs=True)
108
+ data_list.extend(data_list_elem)
109
+ output_data.update(image_wise_data)
110
+ statistics = [inference_time, nms_time, n_samples]
111
+ eval_results = self.evaluate_prediction(data_list, statistics)
112
+ if return_outputs:
113
+ return eval_results, output_data
114
+ return eval_results
115
+
116
+ def convert_to_coco_format(self, outputs, info_imgs, ids, return_outputs=False):
117
+ data_list = []
118
+ image_wise_data = defaultdict(dict)
119
+ for (output, img_h, img_w, img_id) in zip(
120
+ outputs, info_imgs[0], info_imgs[1], ids
121
+ ):
122
+ if output is None:
123
+ continue
124
+ output = output.cpu()
125
+ bboxes = output[:, 0:4]
126
+ # preprocessing: resize
127
+ scale = min(
128
+ self.img_size[0] / float(img_h), self.img_size[1] / float(img_w)
129
+ )
130
+ bboxes /= scale
131
+ cls = output[:, 6]
132
+ scores = output[:, 4] * output[:, 5]
133
+ image_wise_data.update({
134
+ int(img_id): {
135
+ "bboxes": [box.numpy().tolist() for box in bboxes],
136
+ "scores": [score.numpy().item() for score in scores],
137
+ "categories": [
138
+ self.dataloader.dataset.class_ids[int(cls[ind])]
139
+ for ind in range(bboxes.shape[0])
140
+ ],
141
+ }
142
+ })
143
+ bboxes = xyxy2xywh(bboxes)
144
+ for ind in range(bboxes.shape[0]):
145
+ label = self.dataloader.dataset.class_ids[int(cls[ind])]
146
+ pred_data = {
147
+ "image_id": int(img_id),
148
+ "category_id": label,
149
+ "bbox": bboxes[ind].numpy().tolist(),
150
+ "score": scores[ind].numpy().item(),
151
+ "segmentation": [],
152
+ } # COCO json format
153
+ data_list.append(pred_data)
154
+ if return_outputs:
155
+ return data_list, image_wise_data
156
+ return data_list
157
+
158
+ def evaluate_prediction(self, data_dict, statistics):
159
+ # if not is_main_process():
160
+ # return 0, 0, None
161
+ logger.info("Evaluate in main process...")
162
+ annType = ["segm", "bbox", "keypoints"]
163
+ inference_time = statistics[0]
164
+ nms_time = statistics[1]
165
+ n_samples = statistics[2]
166
+ a_infer_time = 1000 * inference_time / (n_samples * self.dataloader.batch_size)
167
+ a_nms_time = 1000 * nms_time / (n_samples * self.dataloader.batch_size)
168
+ time_info = ", ".join(
169
+ [
170
+ "Average {} time: {:.2f} ms".format(k, v)
171
+ for k, v in zip(
172
+ ["forward", "NMS", "inference"],
173
+ [a_infer_time, a_nms_time, (a_infer_time + a_nms_time)],
174
+ )
175
+ ]
176
+ )
177
+ info = time_info + "\n"
178
+ # Evaluate the Dt (detection) json comparing with the ground truth
179
+ if len(data_dict) > 0:
180
+ cocoGt = self.dataloader.dataset.coco
181
+ if self.testdev:
182
+ json.dump(data_dict, open("./yolox_testdev_2017.json", "w"))
183
+ cocoDt = cocoGt.loadRes("./yolox_testdev_2017.json")
184
+ else:
185
+ _, tmp = tempfile.mkstemp()
186
+ json.dump(data_dict, open(tmp, "w"))
187
+ cocoDt = cocoGt.loadRes(tmp)
188
+ logger.info("Use standard COCOeval.")
189
+ cocoEval = COCOeval(cocoGt, cocoDt, annType[1])
190
+ cocoEval.evaluate()
191
+ cocoEval.accumulate()
192
+ redirect_string = io.StringIO()
193
+ with contextlib.redirect_stdout(redirect_string):
194
+ cocoEval.summarize()
195
+ info += redirect_string.getvalue()
196
+ cat_ids = list(cocoGt.cats.keys())
197
+ cat_names = [cocoGt.cats[catId]['name'] for catId in sorted(cat_ids)]
198
+ if self.per_class_AP:
199
+ AP_table = per_class_AP_table(cocoEval, class_names=cat_names)
200
+ info += "per class AP:\n" + AP_table + "\n"
201
+ if self.per_class_AR:
202
+ AR_table = per_class_AR_table(cocoEval, class_names=cat_names)
203
+ info += "per class AR:\n" + AR_table + "\n"
204
+ return cocoEval.stats[0], cocoEval.stats[1], info
205
+ else:
206
+ return 0, 0, info
207
+
208
+
209
+ class ValTransform:
210
+ """
211
+ Defines the transformations that should be applied to the test image
212
+ for input into the network
213
+ """
214
+
215
+ def __init__(self, swap=(2, 0, 1), legacy=False):
216
+ self.swap = swap
217
+ self.legacy = legacy
218
+
219
+ # assume input is cv2 img for now
220
+ def __call__(self, img, res, input_size):
221
+ img, _ = preproc(img, input_size, self.swap)
222
+ if self.legacy:
223
+ img = img[::-1, :, :].copy()
224
+ img /= 255.0
225
+ img -= np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
226
+ img /= np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)
227
+ return img, np.zeros((1, 5))
228
+
229
+
230
+ def preproc(img, input_size, swap=(2, 0, 1)):
231
+ """Preprocess function for preparing input for the network"""
232
+ if len(img.shape) == 3:
233
+ padded_img = np.ones((input_size[0], input_size[1], 3), dtype=np.uint8) * 114
234
+ else:
235
+ padded_img = np.ones(input_size, dtype=np.uint8) * 114
236
+ r = min(input_size[0] / img.shape[0], input_size[1] / img.shape[1])
237
+ resized_img = cv2.resize(
238
+ img,
239
+ (int(img.shape[1] * r), int(img.shape[0] * r)),
240
+ interpolation=cv2.INTER_LINEAR,
241
+ ).astype(np.uint8)
242
+ padded_img[: int(img.shape[0] * r), : int(img.shape[1] * r)] = resized_img
243
+ padded_img = padded_img.transpose(swap)
244
+ padded_img = np.ascontiguousarray(padded_img, dtype=np.float32)
245
+ return padded_img, r
246
+
247
+
248
+ def postprocess(prediction, num_classes, conf_thre=0.7, nms_thre=0.45, class_agnostic=False):
249
+ """Post-processing part after the prediction heads with NMS"""
250
+ box_corner = prediction.new(prediction.shape)
251
+ box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
252
+ box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
253
+ box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
254
+ box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
255
+ prediction[:, :, :4] = box_corner[:, :, :4]
256
+ output = [None for _ in range(len(prediction))]
257
+ for i, image_pred in enumerate(prediction):
258
+ # If none are remaining => process next image
259
+ if not image_pred.size(0):
260
+ continue
261
+ # Get score and class with the highest confidence
262
+ class_conf, class_pred = torch.max(image_pred[:, 5: 5 + num_classes], 1, keepdim=True)
263
+ conf_mask = (image_pred[:, 4] * class_conf.squeeze() >= conf_thre).squeeze()
264
+ # Detections ordered as (x1, y1, x2, y2, obj_conf, class_conf, class_pred)
265
+ detections = torch.cat((image_pred[:, :5], class_conf, class_pred.float()), 1)
266
+ detections = detections[conf_mask]
267
+ if not detections.size(0):
268
+ continue
269
+ if class_agnostic:
270
+ nms_out_index = torchvision.ops.nms(
271
+ detections[:, :4],
272
+ detections[:, 4] * detections[:, 5],
273
+ nms_thre,
274
+ )
275
+ else:
276
+ nms_out_index = torchvision.ops.batched_nms(
277
+ detections[:, :4],
278
+ detections[:, 4] * detections[:, 5],
279
+ detections[:, 6],
280
+ nms_thre,
281
+ )
282
+ detections = detections[nms_out_index]
283
+ if output[i] is None:
284
+ output[i] = detections
285
+ else:
286
+ output[i] = torch.cat((output[i], detections))
287
+ return output
288
+
289
+
290
+ def head_postprocess(outputs, strides=[8, 16, 32]):
291
+ """Decode outputs from predictions of the detection heads"""
292
+ hw = [x.shape[-2:] for x in outputs]
293
+ # [batch, n_anchors_all, 85]
294
+ outputs = torch.cat([x.flatten(start_dim=2) for x in outputs], dim=2).permute(0, 2, 1)
295
+ outputs[..., 4:] = outputs[..., 4:].sigmoid()
296
+ return decode_outputs(outputs, outputs[0].type(), hw, strides)
297
+
298
+
299
+ def decode_outputs(outputs, dtype, ori_hw, ori_strides):
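+ # Build the (x, y) cell grid and the matching stride tensor for each feature level,
+ # then map predictions from grid units to pixel coordinates of the network input.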
300
+ grids = []
301
+ strides = []
302
+ for (hsize, wsize), stride in zip(ori_hw, ori_strides):
303
+ yv, xv = meshgrid([torch.arange(hsize), torch.arange(wsize)])
304
+ grid = torch.stack((xv, yv), 2).view(1, -1, 2)
305
+ grids.append(grid)
306
+ shape = grid.shape[:2]
307
+ strides.append(torch.full((*shape, 1), stride))
308
+ grids = torch.cat(grids, dim=1).type(dtype)
309
+ strides = torch.cat(strides, dim=1).type(dtype)
310
+ outputs[..., :2] = (outputs[..., :2] + grids) * strides
311
+ outputs[..., 2:4] = torch.exp(outputs[..., 2:4]) * strides
312
+ return outputs
313
+
314
+
315
+ def xyxy2xywh(bboxes):
316
+ bboxes[:, 2] = bboxes[:, 2] - bboxes[:, 0]
317
+ bboxes[:, 3] = bboxes[:, 3] - bboxes[:, 1]
318
+ return bboxes
319
+
320
+
321
+ def meshgrid(*tensors):
322
+ _TORCH_VER = [int(x) for x in torch.__version__.split(".")[:2]]
323
+ if _TORCH_VER >= [1, 10]:
324
+ return torch.meshgrid(*tensors, indexing="ij")
325
+ else:
326
+ return torch.meshgrid(*tensors)
327
+
328
+
329
+ def per_class_AR_table(coco_eval, class_names=COCO_CLASSES, headers=["class", "AR"], colums=6):
330
+ """Format the recall of each class"""
331
+ per_class_AR = {}
332
+ recalls = coco_eval.eval["recall"]
333
+ # dimension of recalls: [TxKxAxM]
334
+ # recall has dims (iou, cls, area range, max dets)
335
+ assert len(class_names) == recalls.shape[1]
336
+ for idx, name in enumerate(class_names):
337
+ recall = recalls[:, idx, 0, -1]
338
+ recall = recall[recall > -1]
339
+ ar = np.mean(recall) if recall.size else float("nan")
340
+ per_class_AR[name] = float(ar * 100)
341
+ num_cols = min(colums, len(per_class_AR) * len(headers))
342
+ result_pair = [x for pair in per_class_AR.items() for x in pair]
343
+ row_pair = itertools.zip_longest(*[result_pair[i::num_cols] for i in range(num_cols)])
344
+ table_headers = headers * (num_cols // len(headers))
345
+ table = tabulate(
346
+ row_pair, tablefmt="pipe", floatfmt=".3f", headers=table_headers, numalign="left",
347
+ )
348
+ return table
349
+
350
+
351
+ def per_class_AP_table(coco_eval, class_names=COCO_CLASSES, headers=["class", "AP"], colums=6):
352
+ """Format the precision of each class"""
353
+ per_class_AP = {}
354
+ precisions = coco_eval.eval["precision"]
355
+ # dimension of precisions: [TxRxKxAxM]
356
+ # precision has dims (iou, recall, cls, area range, max dets)
357
+ assert len(class_names) == precisions.shape[2]
358
+ for idx, name in enumerate(class_names):
359
+ # area range index 0: all area ranges
360
+ # max dets index -1: typically 100 per image
361
+ precision = precisions[:, :, idx, 0, -1]
362
+ precision = precision[precision > -1]
363
+ ap = np.mean(precision) if precision.size else float("nan")
364
+ per_class_AP[name] = float(ap * 100)
365
+ num_cols = min(colums, len(per_class_AP) * len(headers))
366
+ result_pair = [x for pair in per_class_AP.items() for x in pair]
367
+ row_pair = itertools.zip_longest(*[result_pair[i::num_cols] for i in range(num_cols)])
368
+ table_headers = headers * (num_cols // len(headers))
369
+ table = tabulate(
370
+ row_pair, tablefmt="pipe", floatfmt=".3f", headers=table_headers, numalign="left",
371
+ )
372
+ return table
373
+
374
+
375
+ def get_eval_loader(batch_size, test_size=(640, 640), data_dir='data/COCO', data_num_workers=0, testdev=False, legacy=False):
376
+ from coco import COCODataset
377
+ valdataset = COCODataset(
378
+ data_dir=data_dir,
379
+ json_file='instances_val2017.json' if not testdev else 'instances_test2017.json',
380
+ name="val2017" if not testdev else "test2017",
381
+ img_size=test_size,
382
+ preproc=ValTransform(legacy=legacy),
383
+ )
384
+ sampler = torch.utils.data.SequentialSampler(valdataset)
385
+ dataloader_kwargs = {
386
+ "num_workers": data_num_workers,
387
+ "pin_memory": True,
388
+ "sampler": sampler,
389
+ "batch_size": batch_size
390
+ }
391
+ val_loader = torch.utils.data.DataLoader(valdataset, **dataloader_kwargs)
392
+ return val_loader
393
+
394
+
395
+ def make_parser():
396
+ parser = argparse.ArgumentParser("onnxruntime inference sample")
397
+ parser.add_argument(
398
+ "-m",
399
+ "--model",
400
+ type=str,
401
+ default="yolox-s-int8.onnx",
402
+ help="Input your onnx model.",
403
+ )
404
+ parser.add_argument(
405
+ "-b",
406
+ "--batch_size",
407
+ type=int,
408
+ default=1,
409
+ help="Batch size for inference..",
410
+ )
411
+ parser.add_argument(
412
+ "--input_shape",
413
+ type=str,
414
+ default="640,640",
415
+ help="Specify an input shape for inference.",
416
+ )
417
+ parser.add_argument(
418
+ "--ipu",
419
+ action="store_true",
420
+ help="Use IPU for inference.",
421
+ )
422
+ parser.add_argument(
423
+ "--provider_config",
424
+ type=str,
425
+ default="vaip_config.json",
426
+ help="Path of the config file for setting provider_options.",
427
+ )
428
+ return parser
429
+
430
+
431
+ if __name__ == '__main__':
432
+ args = make_parser().parse_args()
433
+ input_shape = tuple(map(int, args.input_shape.split(',')))
434
+ if args.ipu:
435
+ providers = ["VitisAIExecutionProvider"]
436
+ provider_options = [{"config_file": args.provider_config}]
437
+ else:
438
+ providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
439
+ provider_options = None
440
+ session = ort.InferenceSession(args.model, providers=providers, provider_options=provider_options)
441
+ val_loader = get_eval_loader(args.batch_size)
442
+ evaluator = COCOEvaluator(dataloader=val_loader, img_size=input_shape, confthre=0.01, nmsthre=0.65, num_classes=80, testdev=False)
443
+ *_, summary = evaluator.evaluate(session)
444
+ logger.info("\n" + summary)
infer_onnx.py ADDED
@@ -0,0 +1,151 @@
1
+ #!/usr/bin/env python3
2
+ # -*- coding: utf-8 -*-
3
+
4
+ import os
5
+ import sys
6
+ import cv2
7
+ import pathlib
8
+ import argparse
9
+ import numpy as np
10
+ import onnxruntime as ort
11
+
12
+ CURRENT_DIR = pathlib.Path(__file__).parent
13
+ sys.path.append(str(CURRENT_DIR))
14
+
15
+ from coco import COCO_CLASSES
16
+ from demo_utils import mkdir, multiclass_nms, demo_postprocess, vis
17
+
18
+
19
+ def make_parser():
20
+ parser = argparse.ArgumentParser("onnxruntime inference sample")
21
+ parser.add_argument(
22
+ "-m",
23
+ "--model",
24
+ type=str,
25
+ default="yolox-s-int8.onnx",
26
+ help="Input your onnx model.",
27
+ )
28
+ parser.add_argument(
29
+ "-i",
30
+ "--image_path",
31
+ type=str,
32
+ default='test_image.png',
33
+ help="Path to your input image.",
34
+ )
35
+ parser.add_argument(
36
+ "-o",
37
+ "--output_dir",
38
+ type=str,
39
+ default='demo_output',
40
+ help="Path to your output directory.",
41
+ )
42
+ parser.add_argument(
43
+ "-s",
44
+ "--score_thr",
45
+ type=float,
46
+ default=0.3,
47
+ help="Score threshold to filter the result.",
48
+ )
49
+ parser.add_argument(
50
+ "--input_shape",
51
+ type=str,
52
+ default="640,640",
53
+ help="Specify an input shape for inference.",
54
+ )
55
+ parser.add_argument(
56
+ "--ipu",
57
+ action="store_true",
58
+ help="Use IPU for inference.",
59
+ )
60
+ parser.add_argument(
61
+ "--provider_config",
62
+ type=str,
63
+ default="vaip_config.json",
64
+ help="Path of the config file for setting provider_options.",
65
+ )
66
+ return parser
67
+
68
+
69
+ def preprocess(img, input_shape, swap=(2, 0, 1)):
70
+ """
71
+ Preprocessing part of YOLOX for scaling and padding image as input to the network.
72
+
73
+ Args:
74
+ img (numpy.ndarray): H x W x C, image read with OpenCV
75
+ input_shape (tuple(int)): input shape of the network for inference
76
+ swap (tuple(int)): new order of axes to transpose the input image
77
+
78
+ Returns:
79
+ padded_img (numpy.ndarray): preprocessed image to be fed to the network
80
+ ratio (float): ratio for scaling the image to the input shape
81
+ """
82
+ if len(img.shape) == 3:
83
+ padded_img = np.ones((input_shape[0], input_shape[1], 3), dtype=np.uint8) * 114
84
+ else:
85
+ padded_img = np.ones(input_shape, dtype=np.uint8) * 114
86
+ ratio = min(input_shape[0] / img.shape[0], input_shape[1] / img.shape[1])
87
+ resized_img = cv2.resize(
88
+ img,
89
+ (int(img.shape[1] * ratio), int(img.shape[0] * ratio)),
90
+ interpolation=cv2.INTER_LINEAR,
91
+ ).astype(np.uint8)
92
+ padded_img[: int(img.shape[0] * ratio), : int(img.shape[1] * ratio)] = resized_img
93
+ padded_img = padded_img.transpose(swap)
94
+ padded_img = np.ascontiguousarray(padded_img, dtype=np.float32)
95
+ return padded_img, ratio
96
+
97
+
98
+ def postprocess(outputs, input_shape, ratio):
99
+ """
100
+ Post-processing part of YOLOX for generating final results from outputs of the network.
101
+
102
+ Args:
103
+ outputs (tuple(numpy.ndarray)): outputs of the detection heads from the onnxruntime session
104
+ input_shape (tuple(int)): input shape of the network for inference
105
+ ratio (float): ratio for scaling the image to the input shape
106
+
107
+ Returns:
108
+ dets (numpy.ndarray): n x 6, dets[:,:4] -> boxes, dets[:,4] -> scores, dets[:,5] -> class indices
109
+ """
110
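+ # Flatten each head output to (batch, n_anchors, 4 + 1 + num_classes), concatenate the
+ # three levels, and apply sigmoid to the objectness and class scores (raw logits).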
+ outputs = [out.reshape(*out.shape[:2], -1).transpose(0,2,1) for out in outputs]
111
+ outputs = np.concatenate(outputs, axis=1)
112
+ outputs[..., 4:] = sigmoid(outputs[..., 4:])
113
+ predictions = demo_postprocess(outputs, input_shape, p6=False)[0]
114
+ boxes = predictions[:, :4]
115
+ scores = predictions[:, 4:5] * predictions[:, 5:]
116
+ boxes_xyxy = np.ones_like(boxes)
117
+ boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2]/2.
118
+ boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3]/2.
119
+ boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2]/2.
120
+ boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3]/2.
121
+ boxes_xyxy /= ratio
122
+ dets = multiclass_nms(boxes_xyxy, scores, nms_thr=0.45, score_thr=0.1)
123
+ return dets
124
+
125
+
126
+ def sigmoid(x):
127
+ return 1.0 / (1.0 + np.exp(-x))
128
+
129
+
130
+ if __name__ == '__main__':
131
+ args = make_parser().parse_args()
132
+ input_shape = tuple(map(int, args.input_shape.split(',')))
133
+ origin_img = cv2.imread(args.image_path)
134
+ img, ratio = preprocess(origin_img, input_shape)
135
+ if args.ipu:
136
+ providers = ["VitisAIExecutionProvider"]
137
+ provider_options = [{"config_file": args.provider_config}]
138
+ else:
139
+ providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
140
+ provider_options = None
141
+ session = ort.InferenceSession(args.model, providers=providers, provider_options=provider_options)
142
+ ort_inputs = {session.get_inputs()[0].name: img[None, :, :, :]}
143
+ outputs = session.run(None, ort_inputs)
144
+ dets = postprocess(outputs, input_shape, ratio)
145
+ if dets is not None:
146
+ final_boxes, final_scores, final_cls_inds = dets[:, :4], dets[:, 4], dets[:, 5]
147
+ origin_img = vis(origin_img, final_boxes, final_scores, final_cls_inds,
148
+ conf=args.score_thr, class_names=COCO_CLASSES)
149
+ mkdir(args.output_dir)
150
+ output_path = os.path.join(args.output_dir, os.path.basename(args.image_path))
151
+ cv2.imwrite(output_path, origin_img)
requirements.txt ADDED
@@ -0,0 +1,9 @@
1
+ torch>=1.12.0
2
+ torchvision>=0.13.0
3
+ opencv_python
4
+ numpy
5
+ loguru
6
+ tqdm
7
+ tabulate
8
+ pycocotools>=2.0.2
9
+ # onnxruntime
yolox-s-int8.onnx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:87154c9d3bd7ce411b03e2ff7c124a6f2f8bf2b6191049d633d2332659fb0d41
3
+ size 35988727