Update README.md

README.md CHANGED:

@@ -63,7 +63,7 @@ H100, A100 80GB, A100 40GB

 ## Steps to run inference:

-We demonstrate inference using NVIDIA NeMo Framework, which allows
+We demonstrate inference using NVIDIA NeMo Framework, which allows hassle-free model deployment based on [NVIDIA TRT-LLM](https://github.com/NVIDIA/TensorRT-LLM), a highly optimized inference solution focusing on high throughput and low latency.

 Prerequisite: you will need a machine with at least 4x 40GB or 2x 80GB NVIDIA GPUs, and 300GB of free disk space.
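The hardware prerequisite above (4x 40GB or 2x 80GB GPUs, plus 300GB of free disk) can be sanity-checked before starting a run. The sketch below is not part of the repository; the function names are illustrative, and the `nvidia-smi` query assumes standard NVIDIA driver tooling is on the PATH.

```python
import shutil
import subprocess

REQUIRED_FREE_GB = 300  # disk-space prerequisite from the README


def free_disk_gb(path="/"):
    """Return free disk space at `path` in GiB."""
    return shutil.disk_usage(path).free / 2**30


def gpu_memory_gb():
    """Return per-GPU total memory in GiB, or [] if nvidia-smi is unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],  # prints one MiB value per GPU
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return [int(line) / 1024 for line in out.split() if line]


def meets_prerequisites(gpus, free_gb):
    """Check for 4x ~40GB GPUs or 2x ~80GB GPUs, plus 300GB free disk.

    Thresholds are slightly below the nominal sizes because nvidia-smi
    reports usable memory (e.g. ~79.2 GiB on an 80GB A100).
    """
    enough_gpus = (
        len([g for g in gpus if g >= 38]) >= 4
        or len([g for g in gpus if g >= 78]) >= 2
    )
    return enough_gpus and free_gb >= REQUIRED_FREE_GB


if __name__ == "__main__":
    print(meets_prerequisites(gpu_memory_gb(), free_disk_gb()))
```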