Update README.md
README.md CHANGED

@@ -90,9 +90,9 @@ python -m vllm.entrypoints.openai.api_server --model astronomer-io/Llama-3-8B-In
 ```
 For the non-stop token generation bug, make sure to send requests with `stop_token_ids":[128001, 128009]` to vLLM endpoint
 Example:
-```
+```json
 {
-    "model": "Llama-3-8B-Instruct-GPTQ-8-Bit",
+    "model": "astronomer-io/Llama-3-8B-Instruct-GPTQ-8-Bit",
     "messages": [
         {"role": "system", "content": "You are a helpful assistant."},
         {"role": "user", "content": "Who created Llama 3?"}
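For context on the change above, the sketch below shows one way the complete request could be sent to the vLLM endpoint started by the `python -m vllm.entrypoints.openai.api_server ...` command shown in the hunk header. It is a minimal illustration, not part of the diff: it assumes the server is listening on `http://localhost:8000` and accepts `stop_token_ids` directly in the JSON request body, as the README instructs.

```python
# Minimal sketch (not from the commit): send the README's example chat request to a
# locally running vLLM OpenAI-compatible server, including the stop_token_ids
# workaround for the non-stop token generation bug. The host/port and the placement
# of "stop_token_ids" in the request body are assumptions for illustration.
import requests

payload = {
    "model": "astronomer-io/Llama-3-8B-Instruct-GPTQ-8-Bit",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who created Llama 3?"},
    ],
    # Stop on both Llama 3 end tokens: <|end_of_text|> (128001) and <|eot_id|> (128009).
    "stop_token_ids": [128001, 128009],
}

response = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```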