Update README.md
README.md
CHANGED
@@ -284,16 +284,16 @@ print(f"Peak Memory Usage: {mem:.02f} GB")
 Note the result of latency (benchmark_latency) is in seconds, and serving (benchmark_serving) is in number of requests per second.
 
 ## Setup
-Need to install vllm nightly to get some recent changes
-```Shell
-pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
-```
-
 Get vllm source code:
 ```Shell
 git clone git@github.com:vllm-project/vllm.git
 ```
 
+Install vllm
+```
+VLLM_USE_PRECOMPILED=1 pip install --editable .
+```
+
 Run the benchmarks under `vllm` root folder:
 
 ## benchmark_latency
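
For anyone scripting the revised Setup section, the two steps (clone, then editable install with `VLLM_USE_PRECOMPILED=1`) could be driven from Python roughly as below. This is only a sketch: the helper names `setup_commands` and `run_setup`, the `dry_run` switch, and the explicit `vllm` target directory are assumptions for illustration and are not part of the README or of vllm itself.

```python
import os
import subprocess

# Repository URL taken from the README's Setup section.
REPO_URL = "git@github.com:vllm-project/vllm.git"


def setup_commands(repo_dir="vllm"):
    """Return (argv, cwd, extra_env) tuples mirroring the README's setup steps.

    Hypothetical helper: repo_dir is an assumed name for the clone target.
    """
    return [
        # Step 1: get the vllm source code.
        (["git", "clone", REPO_URL, repo_dir], None, {}),
        # Step 2: editable install from the repo root, with the
        # VLLM_USE_PRECOMPILED=1 setting shown in the README.
        (["pip", "install", "--editable", "."], repo_dir,
         {"VLLM_USE_PRECOMPILED": "1"}),
    ]


def run_setup(dry_run=True):
    """Print the commands, or execute them in order when dry_run is False."""
    for argv, cwd, extra_env in setup_commands():
        if dry_run:
            prefix = " ".join(f"{k}={v}" for k, v in extra_env.items())
            print((prefix + " " if prefix else "") + " ".join(argv))
            continue
        # check=True stops at the first failing step.
        subprocess.run(argv, cwd=cwd, env={**os.environ, **extra_env}, check=True)


if __name__ == "__main__":
    run_setup(dry_run=True)
```

With `dry_run=True` nothing is cloned or installed; the commands are only printed so they can be compared against the README before running them for real.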