bofenghuang committed on
Commit 7397478
1 Parent(s): ca0c8d6

Update README.md

Files changed (1)
  1. README.md +42 -2
README.md CHANGED
@@ -93,8 +93,7 @@ def chat(
             top_k=top_k,
             repetition_penalty=repetition_penalty,
             max_new_tokens=max_new_tokens,
-            eos_token_id=tokenizer.eos_token_id,
-            pad_token_id=tokenizer.pad_token_id,
+            pad_token_id=tokenizer.eos_token_id,
             **kwargs,
         ),
         streamer=streamer,
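For context on the hunk above: Vigostral is based on Mistral 7B, and Mistral-derived tokenizers usually ship without a dedicated padding token, so `tokenizer.pad_token_id` is `None`, while `eos_token_id` already defaults to the tokenizer's value inside `generate()`. Reusing the EOS token as the padding token is the standard workaround. A minimal sketch of the idea, assuming this repo's tokenizer (not part of the commit):

```python
# Illustrative only: why the generation config reuses EOS for padding.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bofenghuang/vigostral-7b-chat")

print(tokenizer.pad_token_id)  # typically None: no dedicated pad token
print(tokenizer.eos_token_id)  # defined, so it can stand in as the pad token

# Passing pad_token_id=tokenizer.eos_token_id, as the hunk above does,
# avoids the None value and the associated warning during generation.
```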
@@ -144,6 +143,47 @@ You can also use the Google Colab Notebook provided below.
 
 <a href="https://colab.research.google.com/github/bofenghuang/vigogne/blob/main/notebooks/infer_chat.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
 
+### Inference using the unquantized model with vLLM
+
+Set up an OpenAI-compatible server with the following command:
+
+```bash
+# Install vLLM
+# This may take 5-10 minutes.
+# pip install vllm
+
+# Start server for Vigostral-Chat models
+python -m vllm.entrypoints.openai.api_server --model bofenghuang/vigostral-7b-chat
+
+# List models
+# curl http://localhost:8000/v1/models
+```
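As a quick sanity check from Python rather than curl, the models endpoint can be queried directly; a minimal sketch, assuming the default address used by the command above:

```python
# Illustrative only: confirm the vLLM server is up and serving the model.
import requests

response = requests.get("http://localhost:8000/v1/models")
response.raise_for_status()
print([model["id"] for model in response.json()["data"]])
# Should include 'bofenghuang/vigostral-7b-chat'
```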
+
+Query the model using the `openai` Python package:
+
+```python
+import openai
+
+# Modify OpenAI's API key and API base to use vLLM's API server.
+openai.api_key = "EMPTY"
+openai.api_base = "http://localhost:8000/v1"
+
+# Use the first model served
+models = openai.Model.list()
+model = models["data"][0]["id"]
+
+# Chat completion API
+chat_completion = openai.ChatCompletion.create(
+    model=model,
+    messages=[
+        {"role": "user", "content": "Parle-moi de toi-même."},
+    ],
+    max_tokens=1024,
+    temperature=0.7,
+)
+print("Chat completion results:", chat_completion)
+```
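Note that the snippet above relies on the module-level API of the `openai` package, which was removed in `openai>=1.0`. With a recent client, a roughly equivalent call (a sketch, not part of the commit) would be:

```python
# Illustrative only: the same request with the openai>=1.0 client interface.
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

chat_completion = client.chat.completions.create(
    model="bofenghuang/vigostral-7b-chat",
    messages=[{"role": "user", "content": "Parle-moi de toi-même."}],
    max_tokens=1024,
    temperature=0.7,
)
print(chat_completion.choices[0].message.content)
```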
+
 ## Limitations
 
 Vigogne is still under development, and there are many limitations that have to be addressed. Please note that it is possible that the model generates harmful or biased content, incorrect information or generally unhelpful answers.