Update README.md

README.md CHANGED:

@@ -63,7 +63,7 @@ H100, A100 80GB, A100 40GB

 ## Steps to run inference:

-We demonstrate inference using NVIDIA NeMo Framework, which allows
+We demonstrate inference using NVIDIA NeMo Framework, which allows hassle-free model deployment based on [NVIDIA TRT-LLM](https://github.com/NVIDIA/TensorRT-LLM), a highly optimized inference solution focusing on high throughput and low latency.

 Prerequisite: you will need a machine with at least 4x 40GB or 2x 80GB NVIDIA GPUs, and 300GB of free disk space.
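The hardware prerequisite above (4x 40GB or 2x 80GB GPUs, plus 300GB of free disk) can be sanity-checked before starting a run. The sketch below is not part of the repository; the function names are illustrative, and the `nvidia-smi` query assumes standard NVIDIA driver tooling is on the PATH.

```python
import shutil
import subprocess

REQUIRED_FREE_GB = 300  # disk-space prerequisite from the README


def free_disk_gb(path="/"):
    """Return free disk space at `path` in GiB."""
    return shutil.disk_usage(path).free / 2**30


def gpu_memory_gb():
    """Return per-GPU total memory in GiB, or [] if nvidia-smi is unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],  # prints one MiB value per GPU
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return [int(line) / 1024 for line in out.split() if line]


def meets_prerequisites(gpus, free_gb):
    """Check for 4x ~40GB GPUs or 2x ~80GB GPUs, plus 300GB free disk.

    Thresholds are slightly below the nominal sizes because nvidia-smi
    reports usable memory (e.g. ~79.2 GiB on an 80GB A100).
    """
    enough_gpus = (
        len([g for g in gpus if g >= 38]) >= 4
        or len([g for g in gpus if g >= 78]) >= 2
    )
    return enough_gpus and free_gb >= REQUIRED_FREE_GB


if __name__ == "__main__":
    print(meets_prerequisites(gpu_memory_gb(), free_disk_gb()))
```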