Commit 7df6ce7 by mamechin (1 Parent(s): d953805)
deploy/triton-inference-server/README.md ADDED
@@ -0,0 +1,164 @@
# YOLOv7 on Triton Inference Server

Instructions to deploy YOLOv7 as a TensorRT engine to [Triton Inference Server](https://github.com/NVIDIA/triton-inference-server).

Triton Inference Server takes care of model deployment with many out-of-the-box benefits, such as GRPC and HTTP interfaces, automatic scheduling on multiple GPUs, shared memory (even on GPU), dynamic server-side batching, health metrics, and memory resource management.

No additional dependencies are needed to run this deployment, except a working Docker daemon with GPU support.

## Export TensorRT

See https://github.com/WongKinYiu/yolov7#export for more info.

```bash
# Install onnx-simplifier (not listed in the general yolov7 requirements.txt)
pip3 install onnx-simplifier

# PyTorch YOLOv7 -> ONNX with grid, EfficientNMS plugin and dynamic batch size
python export.py --weights ./yolov7.pt --grid --end2end --dynamic-batch --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640
# ONNX -> TensorRT with trtexec and docker
docker run -it --rm --gpus=all nvcr.io/nvidia/tensorrt:22.06-py3
# Copy onnx -> container: docker cp yolov7.onnx <container-id>:/workspace/
# Export with FP16 precision, min batch 1, opt batch 8 and max batch 8
./tensorrt/bin/trtexec --onnx=yolov7.onnx --minShapes=images:1x3x640x640 --optShapes=images:8x3x640x640 --maxShapes=images:8x3x640x640 --fp16 --workspace=4096 --saveEngine=yolov7-fp16-1x8x8.engine --timingCacheFile=timing.cache
# Test engine
./tensorrt/bin/trtexec --loadEngine=yolov7-fp16-1x8x8.engine
# Copy engine -> host: docker cp <container-id>:/workspace/yolov7-fp16-1x8x8.engine .
```
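
Optionally, the exported ONNX can be sanity-checked from Python before (or after) handing it to trtexec. This is only a sketch; it assumes the `onnx` package installed for the export step is available, and the tensor names it expects are the ones the example client in this folder consumes:

```python
import onnx

model = onnx.load("yolov7.onnx")
onnx.checker.check_model(model)

# Input should be "images" with a dynamic batch dimension (dim_param instead of dim_value)
print(model.graph.input[0].name, model.graph.input[0].type.tensor_type.shape.dim[0])

# The --end2end export should expose the EfficientNMS outputs
print([o.name for o in model.graph.output])  # expect: num_dets, det_boxes, det_scores, det_classes
```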

Example output of the engine test on an RTX 3090:

```
[I] === Performance summary ===
[I] Throughput: 73.4985 qps
[I] Latency: min = 14.8578 ms, max = 15.8344 ms, mean = 15.07 ms, median = 15.0422 ms, percentile(99%) = 15.7443 ms
[I] End-to-End Host Latency: min = 25.8715 ms, max = 28.4102 ms, mean = 26.672 ms, median = 26.6082 ms, percentile(99%) = 27.8314 ms
[I] Enqueue Time: min = 0.793701 ms, max = 1.47144 ms, mean = 1.2008 ms, median = 1.28644 ms, percentile(99%) = 1.38965 ms
[I] H2D Latency: min = 1.50073 ms, max = 1.52454 ms, mean = 1.51225 ms, median = 1.51404 ms, percentile(99%) = 1.51941 ms
[I] GPU Compute Time: min = 13.3386 ms, max = 14.3186 ms, mean = 13.5448 ms, median = 13.5178 ms, percentile(99%) = 14.2151 ms
[I] D2H Latency: min = 0.00878906 ms, max = 0.0172729 ms, mean = 0.0128844 ms, median = 0.0125732 ms, percentile(99%) = 0.0166016 ms
[I] Total Host Walltime: 3.04768 s
[I] Total GPU Compute Time: 3.03404 s
[I] Explanations of the performance metrics are printed in the verbose logs.
```
Note: 73.5 qps x batch 8 = 588 fps @ ~15 ms latency.

## Model Repository

See [Triton Model Repository Documentation](https://github.com/triton-inference-server/server/blob/main/docs/model_repository.md#model-repository) for more info.

```bash
# Create folder structure
mkdir -p triton-deploy/models/yolov7/1/
touch triton-deploy/models/yolov7/config.pbtxt
# Place model
mv yolov7-fp16-1x8x8.engine triton-deploy/models/yolov7/1/model.plan
```
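
The same layout can also be created from Python, for example inside a packaging script. A minimal sketch using only the standard library, equivalent to the shell commands above:

```python
import shutil
from pathlib import Path

repo = Path("triton-deploy/models/yolov7")
(repo / "1").mkdir(parents=True, exist_ok=True)   # version 1 of the model
(repo / "config.pbtxt").touch()                   # filled in below
shutil.move("yolov7-fp16-1x8x8.engine", str(repo / "1" / "model.plan"))
```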

## Model Configuration

See [Triton Model Configuration Documentation](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#model-configuration) for more info.

Minimal configuration for `triton-deploy/models/yolov7/config.pbtxt`:

```
name: "yolov7"
platform: "tensorrt_plan"
max_batch_size: 8
dynamic_batching { }
```

Example repository:

```bash
$ tree triton-deploy/
triton-deploy/
└── models
    └── yolov7
        ├── 1
        │   └── model.plan
        └── config.pbtxt

3 directories, 2 files
```

## Start Triton Inference Server

```
docker run --gpus all --rm --ipc=host --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v$(pwd)/triton-deploy/models:/models nvcr.io/nvidia/tritonserver:22.06-py3 tritonserver --model-repository=/models --strict-model-config=false --log-verbose 1
```

In the log you should see:

```
+--------+---------+--------+
| Model  | Version | Status |
+--------+---------+--------+
| yolov7 | 1       | READY  |
+--------+---------+--------+
```
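
Once the model reports READY, the deployment can also be verified from Python, using the same `tritonclient` gRPC calls the example client in this folder uses (gRPC port 8001 as mapped above):

```python
import tritonclient.grpc as grpcclient

triton_client = grpcclient.InferenceServerClient(url="localhost:8001")
assert triton_client.is_server_live()
assert triton_client.is_server_ready()
assert triton_client.is_model_ready("yolov7")

# Should print max_batch_size: 8 and the dynamic_batching block from config.pbtxt
print(triton_client.get_model_config("yolov7"))
```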

## Performance with Model Analyzer

See [Triton Model Analyzer Documentation](https://github.com/triton-inference-server/server/blob/main/docs/model_analyzer.md#model-analyzer) for more info.

Performance numbers @ RTX 3090 + AMD Ryzen 9 5950X.

Example test for 16 concurrent clients using shared memory, each sending batch size 1 requests:

```bash
docker run -it --ipc=host --net=host nvcr.io/nvidia/tritonserver:22.06-py3-sdk /bin/bash

./install/bin/perf_analyzer -m yolov7 -u 127.0.0.1:8001 -i grpc --shared-memory system --concurrency-range 16

# Result (truncated)
Concurrency: 16, throughput: 590.119 infer/sec, latency 27080 usec
```

Throughput for 16 clients with batch size 1 requests matches a single thread running the engine locally at batch size 8 (~588 fps above), thanks to Triton's [Dynamic Batching Strategy](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#dynamic-batcher). The result without dynamic batching (disabled in the model configuration) is considerably worse:

```bash
# Result (truncated)
Concurrency: 16, throughput: 335.587 infer/sec, latency 47616 usec
```
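
perf_analyzer is the right tool for systematic measurements, but the dynamic-batching effect can also be reproduced with a rough ad-hoc sketch (not part of this repo) that fires batch-1 requests from 16 threads; the exact numbers below are assumptions of what to expect, not measurements:

```python
import threading
import time

import numpy as np
import tritonclient.grpc as grpcclient

def worker(n_requests=100):
    # One gRPC client per thread, each sending batch-1 requests
    client = grpcclient.InferenceServerClient(url="localhost:8001")
    inp = grpcclient.InferInput("images", [1, 3, 640, 640], "FP32")
    inp.set_data_from_numpy(np.ones((1, 3, 640, 640), dtype=np.float32))
    for _ in range(n_requests):
        client.infer(model_name="yolov7", inputs=[inp])

threads = [threading.Thread(target=worker) for _ in range(16)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# With dynamic batching enabled this should approach the batch-8 trtexec throughput
print(f"{16 * 100 / elapsed:.1f} infer/sec")
```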

## How to run the model in your code

An example client can be found in client.py. It can run on dummy input, images and videos.

```bash
pip3 install tritonclient[all] opencv-python
python3 client.py image data/dog.jpg
```

![exemplary output result](data/dog_result.jpg)

```
$ python3 client.py --help
usage: client.py [-h] [-m MODEL] [--width WIDTH] [--height HEIGHT] [-u URL] [-o OUT] [-f FPS] [-i] [-v] [-t CLIENT_TIMEOUT] [-s] [-r ROOT_CERTIFICATES] [-p PRIVATE_KEY] [-x CERTIFICATE_CHAIN] {dummy,image,video} [input]

positional arguments:
  {dummy,image,video}   Run mode. 'dummy' will send an empty buffer to the server to test if inference works. 'image' will process an image. 'video' will process a video.
  input                 Input file to load from in image or video mode

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Inference model name, default yolov7
  --width WIDTH         Inference model input width, default 640
  --height HEIGHT       Inference model input height, default 640
  -u URL, --url URL     Inference server URL, default localhost:8001
  -o OUT, --out OUT     Write output into file instead of displaying it
  -f FPS, --fps FPS     Video output fps, default 24.0 FPS
  -i, --model-info      Print model status, configuration and statistics
  -v, --verbose         Enable verbose client output
  -t CLIENT_TIMEOUT, --client-timeout CLIENT_TIMEOUT
                        Client timeout in seconds, default no timeout
  -s, --ssl             Enable SSL encrypted channel to the server
  -r ROOT_CERTIFICATES, --root-certificates ROOT_CERTIFICATES
                        File holding PEM-encoded root certificates, default none
  -p PRIVATE_KEY, --private-key PRIVATE_KEY
                        File holding PEM-encoded private key, default is none
  -x CERTIFICATE_CHAIN, --certificate-chain CERTIFICATE_CHAIN
                        File holding PEM-encoded certificate chain, default is none
```
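
To embed the call directly in your own code, here is a minimal sketch distilled from client.py, reusing `preprocess` and `postprocess` from processing.py in this folder (paths and model name follow the defaults above):

```python
import cv2
import numpy as np
import tritonclient.grpc as grpcclient

from labels import COCOLabels
from processing import preprocess, postprocess

triton_client = grpcclient.InferenceServerClient(url="localhost:8001")

image = cv2.imread("data/dog.jpg")
# Letterbox, BGR->RGB, CHW, FP32 in [0, 1], then add the batch dimension
batch = np.expand_dims(preprocess(image, [640, 640]), axis=0)

inp = grpcclient.InferInput("images", [1, 3, 640, 640], "FP32")
inp.set_data_from_numpy(batch)
results = triton_client.infer(model_name="yolov7", inputs=[inp])

detections = postprocess(results.as_numpy("num_dets"),
                         results.as_numpy("det_boxes"),
                         results.as_numpy("det_scores"),
                         results.as_numpy("det_classes"),
                         image.shape[1], image.shape[0], [640, 640])
for det in detections:
    print(COCOLabels(det.classID).name, det.confidence, det.box())
```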
deploy/triton-inference-server/boundingbox.py ADDED
@@ -0,0 +1,33 @@
class BoundingBox:
    def __init__(self, classID, confidence, x1, x2, y1, y2, image_width, image_height):
        self.classID = classID
        self.confidence = confidence
        self.x1 = x1
        self.x2 = x2
        self.y1 = y1
        self.y2 = y2
        self.u1 = x1 / image_width
        self.u2 = x2 / image_width
        self.v1 = y1 / image_height
        self.v2 = y2 / image_height

    def box(self):
        return (self.x1, self.y1, self.x2, self.y2)

    def width(self):
        return self.x2 - self.x1

    def height(self):
        return self.y2 - self.y1

    def center_absolute(self):
        return (0.5 * (self.x1 + self.x2), 0.5 * (self.y1 + self.y2))

    def center_normalized(self):
        return (0.5 * (self.u1 + self.u2), 0.5 * (self.v1 + self.v2))

    def size_absolute(self):
        return (self.x2 - self.x1, self.y2 - self.y1)

    def size_normalized(self):
        return (self.u2 - self.u1, self.v2 - self.v1)
deploy/triton-inference-server/client.py ADDED
@@ -0,0 +1,334 @@
#!/usr/bin/env python

import argparse
import numpy as np
import sys
import cv2

import tritonclient.grpc as grpcclient
from tritonclient.utils import InferenceServerException

from processing import preprocess, postprocess
from render import render_box, render_filled_box, get_text_size, render_text, RAND_COLORS
from labels import COCOLabels

INPUT_NAMES = ["images"]
OUTPUT_NAMES = ["num_dets", "det_boxes", "det_scores", "det_classes"]

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('mode',
                        choices=['dummy', 'image', 'video'],
                        default='dummy',
                        help='Run mode. \'dummy\' will send an empty buffer to the server to test if inference works. \'image\' will process an image. \'video\' will process a video.')
    parser.add_argument('input',
                        type=str,
                        nargs='?',
                        help='Input file to load from in image or video mode')
    parser.add_argument('-m',
                        '--model',
                        type=str,
                        required=False,
                        default='yolov7',
                        help='Inference model name, default yolov7')
    parser.add_argument('--width',
                        type=int,
                        required=False,
                        default=640,
                        help='Inference model input width, default 640')
    parser.add_argument('--height',
                        type=int,
                        required=False,
                        default=640,
                        help='Inference model input height, default 640')
    parser.add_argument('-u',
                        '--url',
                        type=str,
                        required=False,
                        default='localhost:8001',
                        help='Inference server URL, default localhost:8001')
    parser.add_argument('-o',
                        '--out',
                        type=str,
                        required=False,
                        default='',
                        help='Write output into file instead of displaying it')
    parser.add_argument('-f',
                        '--fps',
                        type=float,
                        required=False,
                        default=24.0,
                        help='Video output fps, default 24.0 FPS')
    parser.add_argument('-i',
                        '--model-info',
                        action="store_true",
                        required=False,
                        default=False,
                        help='Print model status, configuration and statistics')
    parser.add_argument('-v',
                        '--verbose',
                        action="store_true",
                        required=False,
                        default=False,
                        help='Enable verbose client output')
    parser.add_argument('-t',
                        '--client-timeout',
                        type=float,
                        required=False,
                        default=None,
                        help='Client timeout in seconds, default no timeout')
    parser.add_argument('-s',
                        '--ssl',
                        action="store_true",
                        required=False,
                        default=False,
                        help='Enable SSL encrypted channel to the server')
    parser.add_argument('-r',
                        '--root-certificates',
                        type=str,
                        required=False,
                        default=None,
                        help='File holding PEM-encoded root certificates, default none')
    parser.add_argument('-p',
                        '--private-key',
                        type=str,
                        required=False,
                        default=None,
                        help='File holding PEM-encoded private key, default is none')
    parser.add_argument('-x',
                        '--certificate-chain',
                        type=str,
                        required=False,
                        default=None,
                        help='File holding PEM-encoded certificate chain, default is none')
+
105
+ FLAGS = parser.parse_args()
106
+
107
+ # Create server context
108
+ try:
109
+ triton_client = grpcclient.InferenceServerClient(
110
+ url=FLAGS.url,
111
+ verbose=FLAGS.verbose,
112
+ ssl=FLAGS.ssl,
113
+ root_certificates=FLAGS.root_certificates,
114
+ private_key=FLAGS.private_key,
115
+ certificate_chain=FLAGS.certificate_chain)
116
+ except Exception as e:
117
+ print("context creation failed: " + str(e))
118
+ sys.exit()
119
+
120
+ # Health check
121
+ if not triton_client.is_server_live():
122
+ print("FAILED : is_server_live")
123
+ sys.exit(1)
124
+
125
+ if not triton_client.is_server_ready():
126
+ print("FAILED : is_server_ready")
127
+ sys.exit(1)
128
+
129
+ if not triton_client.is_model_ready(FLAGS.model):
130
+ print("FAILED : is_model_ready")
131
+ sys.exit(1)
132
+
133
+ if FLAGS.model_info:
134
+ # Model metadata
135
+ try:
136
+ metadata = triton_client.get_model_metadata(FLAGS.model)
137
+ print(metadata)
138
+ except InferenceServerException as ex:
139
+ if "Request for unknown model" not in ex.message():
140
+ print("FAILED : get_model_metadata")
141
+ print("Got: {}".format(ex.message()))
142
+ sys.exit(1)
143
+ else:
144
+ print("FAILED : get_model_metadata")
145
+ sys.exit(1)
146
+
147
+ # Model configuration
148
+ try:
149
+ config = triton_client.get_model_config(FLAGS.model)
150
+ if not (config.config.name == FLAGS.model):
151
+ print("FAILED: get_model_config")
152
+ sys.exit(1)
153
+ print(config)
154
+ except InferenceServerException as ex:
155
+ print("FAILED : get_model_config")
156
+ print("Got: {}".format(ex.message()))
157
+ sys.exit(1)
158
+
159
+ # DUMMY MODE
160
+ if FLAGS.mode == 'dummy':
161
+ print("Running in 'dummy' mode")
162
+ print("Creating emtpy buffer filled with ones...")
163
+ inputs = []
164
+ outputs = []
165
+ inputs.append(grpcclient.InferInput(INPUT_NAMES[0], [1, 3, FLAGS.width, FLAGS.height], "FP32"))
166
+ inputs[0].set_data_from_numpy(np.ones(shape=(1, 3, FLAGS.width, FLAGS.height), dtype=np.float32))
167
+ outputs.append(grpcclient.InferRequestedOutput(OUTPUT_NAMES[0]))
168
+ outputs.append(grpcclient.InferRequestedOutput(OUTPUT_NAMES[1]))
169
+ outputs.append(grpcclient.InferRequestedOutput(OUTPUT_NAMES[2]))
170
+ outputs.append(grpcclient.InferRequestedOutput(OUTPUT_NAMES[3]))
171
+
172
+ print("Invoking inference...")
173
+ results = triton_client.infer(model_name=FLAGS.model,
174
+ inputs=inputs,
175
+ outputs=outputs,
176
+ client_timeout=FLAGS.client_timeout)
177
+ if FLAGS.model_info:
178
+ statistics = triton_client.get_inference_statistics(model_name=FLAGS.model)
179
+ if len(statistics.model_stats) != 1:
180
+ print("FAILED: get_inference_statistics")
181
+ sys.exit(1)
182
+ print(statistics)
183
+ print("Done")
184
+
185
+ for output in OUTPUT_NAMES:
186
+ result = results.as_numpy(output)
187
+ print(f"Received result buffer \"{output}\" of size {result.shape}")
188
+ print(f"Naive buffer sum: {np.sum(result)}")
189
+
190
+ # IMAGE MODE
191
+ if FLAGS.mode == 'image':
192
+ print("Running in 'image' mode")
193
+ if not FLAGS.input:
194
+ print("FAILED: no input image")
195
+ sys.exit(1)
196
+
197
+ inputs = []
198
+ outputs = []
199
+ inputs.append(grpcclient.InferInput(INPUT_NAMES[0], [1, 3, FLAGS.width, FLAGS.height], "FP32"))
200
+ outputs.append(grpcclient.InferRequestedOutput(OUTPUT_NAMES[0]))
201
+ outputs.append(grpcclient.InferRequestedOutput(OUTPUT_NAMES[1]))
202
+ outputs.append(grpcclient.InferRequestedOutput(OUTPUT_NAMES[2]))
203
+ outputs.append(grpcclient.InferRequestedOutput(OUTPUT_NAMES[3]))
204
+
205
+ print("Creating buffer from image file...")
206
+ input_image = cv2.imread(str(FLAGS.input))
207
+ if input_image is None:
208
+ print(f"FAILED: could not load input image {str(FLAGS.input)}")
209
+ sys.exit(1)
210
+ input_image_buffer = preprocess(input_image, [FLAGS.width, FLAGS.height])
211
+ input_image_buffer = np.expand_dims(input_image_buffer, axis=0)
212
+
213
+ inputs[0].set_data_from_numpy(input_image_buffer)
214
+
215
+ print("Invoking inference...")
216
+ results = triton_client.infer(model_name=FLAGS.model,
217
+ inputs=inputs,
218
+ outputs=outputs,
219
+ client_timeout=FLAGS.client_timeout)
220
+ if FLAGS.model_info:
221
+ statistics = triton_client.get_inference_statistics(model_name=FLAGS.model)
222
+ if len(statistics.model_stats) != 1:
223
+ print("FAILED: get_inference_statistics")
224
+ sys.exit(1)
225
+ print(statistics)
226
+ print("Done")
227
+
228
+ for output in OUTPUT_NAMES:
229
+ result = results.as_numpy(output)
230
+ print(f"Received result buffer \"{output}\" of size {result.shape}")
231
+ print(f"Naive buffer sum: {np.sum(result)}")
232
+
233
+ num_dets = results.as_numpy(OUTPUT_NAMES[0])
234
+ det_boxes = results.as_numpy(OUTPUT_NAMES[1])
235
+ det_scores = results.as_numpy(OUTPUT_NAMES[2])
236
+ det_classes = results.as_numpy(OUTPUT_NAMES[3])
237
+ detected_objects = postprocess(num_dets, det_boxes, det_scores, det_classes, input_image.shape[1], input_image.shape[0], [FLAGS.width, FLAGS.height])
238
+ print(f"Detected objects: {len(detected_objects)}")
239
+
240
+ for box in detected_objects:
241
+ print(f"{COCOLabels(box.classID).name}: {box.confidence}")
242
+ input_image = render_box(input_image, box.box(), color=tuple(RAND_COLORS[box.classID % 64].tolist()))
243
+ size = get_text_size(input_image, f"{COCOLabels(box.classID).name}: {box.confidence:.2f}", normalised_scaling=0.6)
244
+ input_image = render_filled_box(input_image, (box.x1 - 3, box.y1 - 3, box.x1 + size[0], box.y1 + size[1]), color=(220, 220, 220))
245
+ input_image = render_text(input_image, f"{COCOLabels(box.classID).name}: {box.confidence:.2f}", (box.x1, box.y1), color=(30, 30, 30), normalised_scaling=0.5)
246
+
247
+ if FLAGS.out:
248
+ cv2.imwrite(FLAGS.out, input_image)
249
+ print(f"Saved result to {FLAGS.out}")
250
+ else:
251
+ cv2.imshow('image', input_image)
252
+ cv2.waitKey(0)
253
+ cv2.destroyAllWindows()
254
+
255
+ # VIDEO MODE
256
+ if FLAGS.mode == 'video':
257
+ print("Running in 'video' mode")
258
+ if not FLAGS.input:
259
+ print("FAILED: no input video")
260
+ sys.exit(1)
261
+
262
+ inputs = []
263
+ outputs = []
264
+ inputs.append(grpcclient.InferInput(INPUT_NAMES[0], [1, 3, FLAGS.width, FLAGS.height], "FP32"))
265
+ outputs.append(grpcclient.InferRequestedOutput(OUTPUT_NAMES[0]))
266
+ outputs.append(grpcclient.InferRequestedOutput(OUTPUT_NAMES[1]))
267
+ outputs.append(grpcclient.InferRequestedOutput(OUTPUT_NAMES[2]))
268
+ outputs.append(grpcclient.InferRequestedOutput(OUTPUT_NAMES[3]))
269
+
270
+ print("Opening input video stream...")
271
+ cap = cv2.VideoCapture(FLAGS.input)
272
+ if not cap.isOpened():
273
+ print(f"FAILED: cannot open video {FLAGS.input}")
274
+ sys.exit(1)
275
+
276
+ counter = 0
277
+ out = None
278
+ print("Invoking inference...")
279
+ while True:
280
+ ret, frame = cap.read()
281
+ if not ret:
282
+ print("failed to fetch next frame")
283
+ break
284
+
285
+ if counter == 0 and FLAGS.out:
286
+ print("Opening output video stream...")
287
+ fourcc = cv2.VideoWriter_fourcc('M', 'P', '4', 'V')
288
+ out = cv2.VideoWriter(FLAGS.out, fourcc, FLAGS.fps, (frame.shape[1], frame.shape[0]))
289
+
290
+ input_image_buffer = preprocess(frame, [FLAGS.width, FLAGS.height])
291
+ input_image_buffer = np.expand_dims(input_image_buffer, axis=0)
292
+
293
+ inputs[0].set_data_from_numpy(input_image_buffer)
294
+
295
+ results = triton_client.infer(model_name=FLAGS.model,
296
+ inputs=inputs,
297
+ outputs=outputs,
298
+ client_timeout=FLAGS.client_timeout)
299
+
300
+ num_dets = results.as_numpy("num_dets")
301
+ det_boxes = results.as_numpy("det_boxes")
302
+ det_scores = results.as_numpy("det_scores")
303
+ det_classes = results.as_numpy("det_classes")
304
+ detected_objects = postprocess(num_dets, det_boxes, det_scores, det_classes, frame.shape[1], frame.shape[0], [FLAGS.width, FLAGS.height])
305
+ print(f"Frame {counter}: {len(detected_objects)} objects")
306
+ counter += 1
307
+
308
+ for box in detected_objects:
309
+ print(f"{COCOLabels(box.classID).name}: {box.confidence}")
310
+ frame = render_box(frame, box.box(), color=tuple(RAND_COLORS[box.classID % 64].tolist()))
311
+ size = get_text_size(frame, f"{COCOLabels(box.classID).name}: {box.confidence:.2f}", normalised_scaling=0.6)
312
+ frame = render_filled_box(frame, (box.x1 - 3, box.y1 - 3, box.x1 + size[0], box.y1 + size[1]), color=(220, 220, 220))
313
+ frame = render_text(frame, f"{COCOLabels(box.classID).name}: {box.confidence:.2f}", (box.x1, box.y1), color=(30, 30, 30), normalised_scaling=0.5)
314
+
315
+ if FLAGS.out:
316
+ out.write(frame)
317
+ else:
318
+ cv2.imshow('image', frame)
319
+ if cv2.waitKey(1) == ord('q'):
320
+ break
321
+
322
+ if FLAGS.model_info:
323
+ statistics = triton_client.get_inference_statistics(model_name=FLAGS.model)
324
+ if len(statistics.model_stats) != 1:
325
+ print("FAILED: get_inference_statistics")
326
+ sys.exit(1)
327
+ print(statistics)
328
+ print("Done")
329
+
330
+ cap.release()
331
+ if FLAGS.out:
332
+ out.release()
333
+ else:
334
+ cv2.destroyAllWindows()
deploy/triton-inference-server/labels.py ADDED
@@ -0,0 +1,83 @@
from enum import Enum

class COCOLabels(Enum):
    PERSON = 0
    BICYCLE = 1
    CAR = 2
    MOTORBIKE = 3
    AEROPLANE = 4
    BUS = 5
    TRAIN = 6
    TRUCK = 7
    BOAT = 8
    TRAFFIC_LIGHT = 9
    FIRE_HYDRANT = 10
    STOP_SIGN = 11
    PARKING_METER = 12
    BENCH = 13
    BIRD = 14
    CAT = 15
    DOG = 16
    HORSE = 17
    SHEEP = 18
    COW = 19
    ELEPHANT = 20
    BEAR = 21
    ZEBRA = 22
    GIRAFFE = 23
    BACKPACK = 24
    UMBRELLA = 25
    HANDBAG = 26
    TIE = 27
    SUITCASE = 28
    FRISBEE = 29
    SKIS = 30
    SNOWBOARD = 31
    SPORTS_BALL = 32
    KITE = 33
    BASEBALL_BAT = 34
    BASEBALL_GLOVE = 35
    SKATEBOARD = 36
    SURFBOARD = 37
    TENNIS_RACKET = 38
    BOTTLE = 39
    WINE_GLASS = 40
    CUP = 41
    FORK = 42
    KNIFE = 43
    SPOON = 44
    BOWL = 45
    BANANA = 46
    APPLE = 47
    SANDWICH = 48
    ORANGE = 49
    BROCCOLI = 50
    CARROT = 51
    HOT_DOG = 52
    PIZZA = 53
    DONUT = 54
    CAKE = 55
    CHAIR = 56
    SOFA = 57
    POTTEDPLANT = 58
    BED = 59
    DININGTABLE = 60
    TOILET = 61
    TVMONITOR = 62
    LAPTOP = 63
    MOUSE = 64
    REMOTE = 65
    KEYBOARD = 66
    CELL_PHONE = 67
    MICROWAVE = 68
    OVEN = 69
    TOASTER = 70
    SINK = 71
    REFRIGERATOR = 72
    BOOK = 73
    CLOCK = 74
    VASE = 75
    SCISSORS = 76
    TEDDY_BEAR = 77
    HAIR_DRIER = 78
    TOOTHBRUSH = 79
deploy/triton-inference-server/processing.py ADDED
@@ -0,0 +1,51 @@
from boundingbox import BoundingBox

import cv2
import numpy as np

def preprocess(img, input_shape, letter_box=True):
    if letter_box:
        img_h, img_w, _ = img.shape
        new_h, new_w = input_shape[0], input_shape[1]
        offset_h, offset_w = 0, 0
        if (new_w / img_w) <= (new_h / img_h):
            new_h = int(img_h * new_w / img_w)
            offset_h = (input_shape[0] - new_h) // 2
        else:
            new_w = int(img_w * new_h / img_h)
            offset_w = (input_shape[1] - new_w) // 2
        resized = cv2.resize(img, (new_w, new_h))
        img = np.full((input_shape[0], input_shape[1], 3), 127, dtype=np.uint8)
        img[offset_h:(offset_h + new_h), offset_w:(offset_w + new_w), :] = resized
    else:
        img = cv2.resize(img, (input_shape[1], input_shape[0]))

    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = img.transpose((2, 0, 1)).astype(np.float32)
    img /= 255.0
    return img

def postprocess(num_dets, det_boxes, det_scores, det_classes, img_w, img_h, input_shape, letter_box=True):
    boxes = det_boxes[0, :num_dets[0][0]] / np.array([input_shape[0], input_shape[1], input_shape[0], input_shape[1]], dtype=np.float32)
    scores = det_scores[0, :num_dets[0][0]]
    classes = det_classes[0, :num_dets[0][0]].astype(int)

    old_h, old_w = img_h, img_w
    offset_h, offset_w = 0, 0
    if letter_box:
        if (img_w / input_shape[1]) >= (img_h / input_shape[0]):
            old_h = int(input_shape[0] * img_w / input_shape[1])
            offset_h = (old_h - img_h) // 2
        else:
            old_w = int(input_shape[1] * img_h / input_shape[0])
            offset_w = (old_w - img_w) // 2

    boxes = boxes * np.array([old_w, old_h, old_w, old_h], dtype=np.float32)
    if letter_box:
        boxes -= np.array([offset_w, offset_h, offset_w, offset_h], dtype=np.float32)
    boxes = boxes.astype(int)

    detected_objects = []
    for box, score, label in zip(boxes, scores, classes):
        detected_objects.append(BoundingBox(label, score, box[0], box[2], box[1], box[3], img_w, img_h))
    return detected_objects
deploy/triton-inference-server/render.py ADDED
@@ -0,0 +1,110 @@
import numpy as np

import cv2

from math import sqrt

_LINE_THICKNESS_SCALING = 500.0

np.random.seed(0)
RAND_COLORS = np.random.randint(50, 255, (64, 3), "int")  # used for class visualization
RAND_COLORS[0] = [220, 220, 220]

def render_box(img, box, color=(200, 200, 200)):
    """
    Render a box. Calculates scaling and thickness automatically.
    :param img: image to render into
    :param box: (x1, y1, x2, y2) - box coordinates
    :param color: (b, g, r) - box color
    :return: updated image
    """
    x1, y1, x2, y2 = box
    thickness = int(
        round(
            (img.shape[0] * img.shape[1])
            / (_LINE_THICKNESS_SCALING * _LINE_THICKNESS_SCALING)
        )
    )
    thickness = max(1, thickness)
    img = cv2.rectangle(
        img,
        (int(x1), int(y1)),
        (int(x2), int(y2)),
        color,
        thickness=thickness
    )
    return img

def render_filled_box(img, box, color=(200, 200, 200)):
    """
    Render a filled box. Calculates scaling and thickness automatically.
    :param img: image to render into
    :param box: (x1, y1, x2, y2) - box coordinates
    :param color: (b, g, r) - box color
    :return: updated image
    """
    x1, y1, x2, y2 = box
    img = cv2.rectangle(
        img,
        (int(x1), int(y1)),
        (int(x2), int(y2)),
        color,
        thickness=cv2.FILLED
    )
    return img

_TEXT_THICKNESS_SCALING = 700.0
_TEXT_SCALING = 520.0


def get_text_size(img, text, normalised_scaling=1.0):
    """
    Get calculated text size (as box width and height)
    :param img: image reference, used to determine appropriate text scaling
    :param text: text to display
    :param normalised_scaling: additional normalised scaling. Default 1.0.
    :return: (width, height) - width and height of text box
    """
    thickness = int(
        round(
            (img.shape[0] * img.shape[1])
            / (_TEXT_THICKNESS_SCALING * _TEXT_THICKNESS_SCALING)
        )
        * normalised_scaling
    )
    thickness = max(1, thickness)
    scaling = img.shape[0] / _TEXT_SCALING * normalised_scaling
    return cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, scaling, thickness)[0]


def render_text(img, text, pos, color=(200, 200, 200), normalised_scaling=1.0):
    """
    Render a text into the image. Calculates scaling and thickness automatically.
    :param img: image to render into
    :param text: text to display
    :param pos: (x, y) - upper left coordinates of render position
    :param color: (b, g, r) - text color
    :param normalised_scaling: additional normalised scaling. Default 1.0.
    :return: updated image
    """
    x, y = pos
    thickness = int(
        round(
            (img.shape[0] * img.shape[1])
            / (_TEXT_THICKNESS_SCALING * _TEXT_THICKNESS_SCALING)
        )
        * normalised_scaling
    )
    thickness = max(1, thickness)
    scaling = img.shape[0] / _TEXT_SCALING * normalised_scaling
    size = get_text_size(img, text, normalised_scaling)
    cv2.putText(
        img,
        text,
        (int(x), int(y + size[1])),
        cv2.FONT_HERSHEY_SIMPLEX,
        scaling,
        color,
        thickness=thickness,
    )
    return img