Update README.md
README.md

```
python -m vllm.entrypoints.openai.api_server --model=$model_path \
    --gpu-memory-utilization 0.8 \
    --max-model-len 8192 --chat-template llama2-chat-template.jinja \
    --tensor-parallel-size 1 --served-model-name chatbot
```
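
Before sending chat requests, you can check that the server is up and see which model names it serves. A minimal sketch using the same OpenAI Python client as the example below; it assumes the server listens on port 7777 as that example expects (vLLM defaults to port 8000, so add `--port 7777` to the launch command if the port is not set elsewhere):

```
from openai import OpenAI

# Point the client at the local vLLM server; the key just needs to be non-empty.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:7777/v1")

# Each served model id should match --served-model-name ("chatbot" above).
for model in client.models.list():
    print(model.id)
```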

```
from openai import OpenAI

# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:7777/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# Sampling parameters; vLLM-specific options such as top_k and
# repetition_penalty are passed through extra_body below.
call_args = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "max_tokens": 2048,  # maximum output length
    "presence_penalty": 1.0,
    "frequency_penalty": 0.0,
    "repetition_penalty": 1.0,
    "stop": ["</s>"],
}

# The model name must match --served-model-name ("chatbot" above).
chat_response = client.chat.completions.create(
    model="chatbot",
    messages=[
        {"role": "user", "content": "你好"},  # "Hello"
    ],
    extra_body=call_args,
)
print("Chat response:", chat_response)
```
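
The response object above contains the full completion payload. To print only the assistant's reply, or to stream tokens as they arrive, a sketch along these lines should work (it reuses the `client` and `call_args` defined above; `stream=True` is standard in both the OpenAI client and vLLM's server):

```
# Print only the assistant's reply text.
print(chat_response.choices[0].message.content)

# Streaming variant: print tokens as the server generates them.
stream = client.chat.completions.create(
    model="chatbot",
    messages=[{"role": "user", "content": "你好"}],  # "Hello"
    extra_body=call_args,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```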