RoundtTble committed on
Commit bdd3916 • 1 Parent(s): 89d26a5
Files changed (1)
  1. README.md +42 -13
README.md CHANGED
@@ -1,3 +1,33 @@
+ # dinov2_vitl14_trt_a4000_fp16
+
+
+ ## Triton
+
+ ```
+ make triton
+ ```
+
+ ## Build TensorRT Model
+
+ ```
+ make model
+ ```
+
+
+ ```
+ make trt
+ ```
+
+ ```
+ tree model_repository
+ ```
+ ```
+ model_repository/
+ └── dinov2_vitl14
+     ├── 1
+     │   └── model.plan
+     └── config.pbtxt
+ ```


## Perf
@@ -7,7 +37,6 @@ make perf
```

```
- make perf
docker run --gpus all --rm -it --net host nvcr.io/nvidia/tritonserver:23.04-py3-sdk perf_analyzer -m dinov2_vitl14 --percentile=95 -i grpc -u 0.0.0.0:6001 --concurrency-range 16:16 --shape input:3,560,560

=================================
@@ -40,19 +69,19 @@ NOTE: CUDA Forward Compatibility mode ENABLED.

Request concurrency: 16
  Client:
-   Request count: 1124
-   Throughput: 62.4339 infer/sec
-   p50 latency: 257390 usec
-   p90 latency: 287307 usec
-   p95 latency: 295432 usec
-   p99 latency: 305031 usec
-   Avg gRPC time: 254273 usec ((un)marshal request/response 801 usec + response wait 253472 usec)
+   Request count: 4009
+   Throughput: 222.66 infer/sec
+   p50 latency: 70762 usec
+   p90 latency: 83940 usec
+   p95 latency: 90235 usec
+   p99 latency: 102226 usec
+   Avg gRPC time: 71655 usec ((un)marshal request/response 741 usec + response wait 70914 usec)
  Server:
-   Inference count: 1124
-   Execution count: 202
-   Successful request count: 1124
-   Avg request latency: 248791 usec (overhead 9381 usec + queue 68460 usec + compute input 39 usec + compute infer 94051 usec + compute output 76859 usec)
+   Inference count: 4009
+   Execution count: 728
+   Successful request count: 4009
+   Avg request latency: 66080 usec (overhead 8949 usec + queue 16114 usec + compute input 1163 usec + compute infer 24751 usec + compute output 15103 usec)

Inferences/Second vs. Client p95 Batch Latency
- Concurrency: 16, throughput: 62.4339 infer/sec, latency 295432 usec
+ Concurrency: 16, throughput: 222.66 infer/sec, latency 90235 usec
```
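The tree above places `config.pbtxt` next to the serialized engine, but the file's contents are not part of this commit. As a rough sketch only: the model name, the input name, and the input dims below come from this README's directory layout and `perf_analyzer` flags, while `max_batch_size`, the data types, the output binding, and the `dynamic_batching` block are illustrative assumptions that should be checked against the real file:

```
name: "dinov2_vitl14"
platform: "tensorrt_plan"
max_batch_size: 16          # assumption; the real value is not shown in this commit
input [
  {
    name: "input"           # matches the --shape input:3,560,560 flag used in this README
    data_type: TYPE_FP32    # assumption; could be TYPE_FP16 for this engine
    dims: [ 3, 560, 560 ]
  }
]
output [
  {
    name: "output"          # hypothetical name; check the exported engine's bindings
    data_type: TYPE_FP32    # assumption
    dims: [ 1024 ]          # assumption: ViT-L/14 embedding width
  }
]
dynamic_batching { }        # plausible, since the perf runs show many requests batched per execution
```

Triton reads this file from `model_repository/dinov2_vitl14/config.pbtxt` when the server starts.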
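Reading the two reports side by side: the new `a4000_fp16` engine serves roughly 3.6× the throughput of the run it replaces at under a third of the p95 latency, and the ratio of inference count to execution count shows Triton batching about 5.5 requests per execution in both runs. A small sketch deriving those figures, using only the numbers quoted above:

```python
# Figures copied from the two perf_analyzer reports above.
old = {"throughput": 62.4339, "p95_usec": 295432, "inferences": 1124, "executions": 202}
new = {"throughput": 222.66, "p95_usec": 90235, "inferences": 4009, "executions": 728}

speedup = new["throughput"] / old["throughput"]        # ~3.57x more inferences/sec
latency_cut = 1 - new["p95_usec"] / old["p95_usec"]    # ~69% lower p95 latency

# Inference count / execution count = average dynamic-batch size per model execution.
avg_batch_old = old["inferences"] / old["executions"]  # ~5.6
avg_batch_new = new["inferences"] / new["executions"]  # ~5.5

print(f"throughput speedup: {speedup:.2f}x")
print(f"p95 latency reduction: {latency_cut:.0%}")
print(f"avg batch size: {avg_batch_old:.1f} -> {avg_batch_new:.1f}")
```

So the per-execution batch size barely changed; the gain comes from each execution finishing much faster.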