Tested serving this model via vLLM using an Nvidia T4 (16GB VRAM).

Tested with the command below:

```
python -m vllm.entrypoints.openai.api_server --model astronomer-io/Llama-3-8B-Instruct-GPTQ-8-Bit --max-model-len 8192 --dtype float16
```

For the non-stop token generation bug, make sure to send requests with `"stop_token_ids": [128001, 128009]` to the vLLM endpoint.

Example:
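A minimal sketch of such a request body in Python (the prompt, `max_tokens`, and endpoint URL in the comment are placeholders; only `stop_token_ids` and the model name come from this README):

```python
import json

# Illustrative payload for vLLM's OpenAI-compatible /v1/completions endpoint.
# stop_token_ids carries the Llama 3 end tokens (<|end_of_text|> = 128001,
# <|eot_id|> = 128009) so generation terminates correctly.
payload = {
    "model": "astronomer-io/Llama-3-8B-Instruct-GPTQ-8-Bit",
    "prompt": "What is a GPTQ-quantized model?",  # placeholder prompt
    "max_tokens": 256,                            # placeholder limit
    "stop_token_ids": [128001, 128009],
}

# POST this JSON body to the running server (e.g. http://localhost:8000/v1/completions)
# with the header "Content-Type: application/json".
print(json.dumps(payload, indent=2))
```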