RoundtTble committed
Commit c360541
1 Parent(s): 355e44f

Add README

Files changed (1)
  1. README.md +138 -0
README.md ADDED
@@ -0,0 +1,138 @@
# dinov2_vitl14_onnx

## Run Triton

```
make triton
```
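The `triton` target itself is not part of this commit, so the exact invocation may differ; as a rough sketch (the image tag, ports, and `models/` layout below are assumptions inferred from the server log that follows), it amounts to launching the NGC Triton container against a local model repository:

```
# Hypothetical equivalent of `make triton`; assumes a repository layout of
# models/dinov2_vitl14/config.pbtxt and models/dinov2_vitl14/1/model.onnx.
docker run --gpus all --rm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v $(pwd)/models:/models \
  nvcr.io/nvidia/tritonserver:23.04-py3 \
  tritonserver --model-repository=/models
```

On a successful start the server prints a banner and model table similar to the log below, ending with the gRPC, HTTP, and metrics endpoints listening on ports 8001, 8000, and 8002.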
```
=============================
== Triton Inference Server ==
=============================

NVIDIA Release 23.04 (build 58408265)
Triton Server Version 2.33.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: CUDA Forward Compatibility mode ENABLED.
Using CUDA 12.1 driver version 530.30.02 with kernel driver version 525.125.06.
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

I0715 04:13:59.173070 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f1a70000000' with size 268435456
I0715 04:13:59.173293 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0715 04:13:59.175108 1 model_lifecycle.cc:459] loading: dinov2_vitl14:1
I0715 04:13:59.177471 1 onnxruntime.cc:2504] TRITONBACKEND_Initialize: onnxruntime
I0715 04:13:59.177510 1 onnxruntime.cc:2514] Triton TRITONBACKEND API version: 1.12
I0715 04:13:59.177518 1 onnxruntime.cc:2520] 'onnxruntime' TRITONBACKEND API version: 1.12
I0715 04:13:59.177525 1 onnxruntime.cc:2550] backend configuration:
{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}
I0715 04:13:59.233419 1 onnxruntime.cc:2608] TRITONBACKEND_ModelInitialize: dinov2_vitl14 (version 1)
I0715 04:13:59.233847 1 onnxruntime.cc:666] skipping model configuration auto-complete for 'dinov2_vitl14': inputs and outputs already specified
I0715 04:13:59.234233 1 onnxruntime.cc:2651] TRITONBACKEND_ModelInstanceInitialize: dinov2_vitl14_0 (GPU device 0)
2023-07-15 04:13:59.546824126 [W:onnxruntime:, session_state.cc:1136 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-07-15 04:13:59.546847104 [W:onnxruntime:, session_state.cc:1138 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
I0715 04:14:00.851748 1 model_lifecycle.cc:694] successfully loaded 'dinov2_vitl14' version 1
I0715 04:14:00.851859 1 server.cc:583]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0715 04:14:00.851944 1 server.cc:610]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0715 04:14:00.852005 1 server.cc:653]
+---------------+---------+--------+
| Model         | Version | Status |
+---------------+---------+--------+
| dinov2_vitl14 | 1       | READY  |
+---------------+---------+--------+

I0715 04:14:00.872645 1 metrics.cc:808] Collecting metrics for GPU 0: NVIDIA RTX A4000
I0715 04:14:00.873026 1 metrics.cc:701] Collecting CPU metrics
I0715 04:14:00.873315 1 tritonserver.cc:2387]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.33.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /models |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0715 04:14:00.875498 1 grpc_server.cc:2450] Started GRPCInferenceService at 0.0.0.0:8001
I0715 04:14:00.875964 1 http_server.cc:3555] Started HTTPService at 0.0.0.0:8000
I0715 04:14:00.917871 1 http_server.cc:185] Started Metrics Service at 0.0.0.0:8002
```
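Once the log reports `dinov2_vitl14` as READY, the endpoints can be sanity-checked from another shell over Triton's HTTP (KServe v2) API; the commands below are a suggested check, not part of the Makefile:

```
# Server readiness and model metadata (input/output names, dtypes, shapes)
curl -v localhost:8000/v2/health/ready
curl -s localhost:8000/v2/models/dinov2_vitl14 | python3 -m json.tool
```

The metadata response lists the model's input and output tensor names and shapes, which is what the `--shape input:...` argument in the next section refers to.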
## Perf Analyzer

```
docker run --gpus all --rm -it --net host nvcr.io/nvidia/tritonserver:23.04-py3-sdk perf_analyzer -m dinov2_vitl14 --percentile=95 -i grpc -u 0.0.0.0:8001 --concurrency-range 16:16 --shape input:3,560,560

=================================
== Triton Inference Server SDK ==
=================================

NVIDIA Release 23.04 (build 58408269)

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: CUDA Forward Compatibility mode ENABLED.
Using CUDA 12.1 driver version 530.30.02 with kernel driver version 525.125.06.
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

*** Measurement Settings ***
  Batch size: 1
  Service Kind: Triton
  Using "time_windows" mode for stabilization
  Measurement window: 5000 msec
  Latency limit: 0 msec
  Concurrency limit: 16 concurrent requests
  Using synchronous calls for inference
  Stabilizing using p95 latency

Request concurrency: 16
  Client:
    Request count: 881
    Throughput: 48.927 infer/sec
    p50 latency: 324015 usec
    p90 latency: 330275 usec
    p95 latency: 331952 usec
    p99 latency: 336638 usec
    Avg gRPC time: 323066 usec ((un)marshal request/response 953 usec + response wait 322113 usec)
  Server:
    Inference count: 881
    Execution count: 111
    Successful request count: 881
    Avg request latency: 313673 usec (overhead 7065 usec + queue 151785 usec + compute input 7582 usec + compute infer 143162 usec + compute output 4077 usec)

Inferences/Second vs. Client p95 Batch Latency
Concurrency: 16, throughput: 48.927 infer/sec, latency 331952 usec
```
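At a fixed concurrency of 16 the server sustains roughly 49 inferences/sec at a p95 latency of about 332 ms, and the inference count (881) versus the execution count (111) suggests requests are being batched server-side at roughly 8 per execution on average. To see how throughput and latency trade off under lighter load, the same command can sweep a range of concurrencies instead of pinning it at 16 (a hypothetical variation of the command above; output not shown):

```
# Sweep request concurrency from 1 to 16 in steps of 4
docker run --gpus all --rm -it --net host nvcr.io/nvidia/tritonserver:23.04-py3-sdk \
  perf_analyzer -m dinov2_vitl14 --percentile=95 -i grpc -u 0.0.0.0:8001 \
  --concurrency-range 1:16:4 --shape input:3,560,560
```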