bofenghuang committed on
Commit 7397478
1 Parent(s): ca0c8d6

Update README.md

Files changed (1)
  1. README.md +42 -2
README.md CHANGED
@@ -93,8 +93,7 @@ def chat(
             top_k=top_k,
             repetition_penalty=repetition_penalty,
             max_new_tokens=max_new_tokens,
-            eos_token_id=tokenizer.eos_token_id,
-            pad_token_id=tokenizer.pad_token_id,
+            pad_token_id=tokenizer.eos_token_id,
             **kwargs,
         ),
         streamer=streamer,
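For context on the hunk above: Vigostral is based on Mistral 7B, and Mistral-derived tokenizers usually ship without a dedicated padding token, so `tokenizer.pad_token_id` is `None`, while `eos_token_id` already defaults to the tokenizer's value inside `generate()`. Reusing the EOS token as the padding token is the standard workaround. A minimal sketch of the idea, assuming this repo's tokenizer (not part of the commit):

```python
# Illustrative only: why the generation config reuses EOS for padding.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bofenghuang/vigostral-7b-chat")

print(tokenizer.pad_token_id)  # typically None: no dedicated pad token
print(tokenizer.eos_token_id)  # defined, so it can stand in as the pad token

# Passing pad_token_id=tokenizer.eos_token_id, as the hunk above does,
# avoids the None value and the associated warning during generation.
```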
@@ -144,6 +143,47 @@ You can also use the Google Colab Notebook provided below.
 
 <a href="https://colab.research.google.com/github/bofenghuang/vigogne/blob/main/notebooks/infer_chat.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
 
+### Inference using the unquantized model with vLLM
+
+Set up an OpenAI-compatible server with the following command:
+
+```bash
+# Install vLLM
+# This may take 5-10 minutes.
+# pip install vllm
+
+# Start server for Vigostral-Chat models
+python -m vllm.entrypoints.openai.api_server --model bofenghuang/vigostral-7b-chat
+
+# List models
+# curl http://localhost:8000/v1/models
+```
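As a quick sanity check from Python rather than curl, the models endpoint can be queried directly; a minimal sketch, assuming the default address used by the command above:

```python
# Illustrative only: confirm the vLLM server is up and serving the model.
import requests

response = requests.get("http://localhost:8000/v1/models")
response.raise_for_status()
print([model["id"] for model in response.json()["data"]])
# Should include 'bofenghuang/vigostral-7b-chat'
```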
+
+Query the model using the `openai` Python package:
+
+```python
+import openai
+
+# Modify OpenAI's API key and API base to use vLLM's API server.
+openai.api_key = "EMPTY"
+openai.api_base = "http://localhost:8000/v1"
+
+# Use the first model served
+models = openai.Model.list()
+model = models["data"][0]["id"]
+
+# Chat completion API
+chat_completion = openai.ChatCompletion.create(
+    model=model,
+    messages=[
+        {"role": "user", "content": "Parle-moi de toi-même."},
+    ],
+    max_tokens=1024,
+    temperature=0.7,
+)
+print("Chat completion results:", chat_completion)
+```
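Note that the snippet above relies on the module-level API of the `openai` package, which was removed in `openai>=1.0`. With a recent client, a roughly equivalent call (a sketch, not part of the commit) would be:

```python
# Illustrative only: the same request with the openai>=1.0 client interface.
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

chat_completion = client.chat.completions.create(
    model="bofenghuang/vigostral-7b-chat",
    messages=[{"role": "user", "content": "Parle-moi de toi-même."}],
    max_tokens=1024,
    temperature=0.7,
)
print(chat_completion.choices[0].message.content)
```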
+
 ## Limitations
 
 Vigogne is still under development, and there are many limitations that have to be addressed. Please note that it is possible that the model generates harmful or biased content, incorrect information or generally unhelpful answers.