davidxmle committed
Commit 67deaf4
1 Parent(s): 42ee933

Update README.md

Files changed (1):
  1. README.md (+1 / -1)
README.md CHANGED
@@ -82,7 +82,7 @@ Tested serving this model via vLLM using an Nvidia T4 (16GB VRAM).
 
 Tested with the below command
 ```
-python -m vllm.entrypoints.openai.api_server --model Llama-3-8B-Instruct-GPTQ-8-Bit --port 8123 --max-model-len 8192 --dtype float16
+python -m vllm.entrypoints.openai.api_server --model astronomer-io/Llama-3-8B-Instruct-GPTQ-8-Bit --max-model-len 8192 --dtype float16
 ```
 For the non-stop token generation bug, make sure to send requests with `stop_token_ids":[128001, 128009]` to vLLM endpoint
 Example:
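The README lines in this diff describe passing `stop_token_ids` to vLLM's OpenAI-compatible endpoint so Llama 3 stops at its end tokens (128001 is `<|end_of_text|>`, 128009 is `<|eot_id|>`). The commit itself does not include a request example; the sketch below is a hypothetical illustration of such a request payload, assuming the server is run with the new command from the diff and reachable at a local URL of your choosing.

```python
import json

# Hypothetical request body for vLLM's OpenAI-compatible chat endpoint
# (e.g. POST to http://localhost:8000/v1/chat/completions). The key point
# from the README: include the Llama 3 stop token ids as an extra
# sampling parameter so generation does not run past the turn end.
payload = {
    "model": "astronomer-io/Llama-3-8B-Instruct-GPTQ-8-Bit",
    "messages": [{"role": "user", "content": "Hello!"}],
    # Workaround for the non-stop generation bug mentioned above:
    # 128001 = <|end_of_text|>, 128009 = <|eot_id|>
    "stop_token_ids": [128001, 128009],
}

body = json.dumps(payload)
```

Any HTTP client can then POST `body` with a `Content-Type: application/json` header; the essential part is that `stop_token_ids` rides along with the standard OpenAI-style fields.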