|
--- |
|
license: apache-2.0 |
|
--- |
|
|
|
## qwen-chat-7B-ggml |
|
|
|
This repo contains GGML format model files for qwen-chat-7B. |
|
|
|
### Example code |
|
|
|
#### Install packages |
|
```bash
# Quote the requirement so the shell does not treat ">" as a redirection.
pip install "xinference[ggml]>=0.4.3"
pip install qwen-cpp
```
|
If you want to run with GPU acceleration, refer to [installation](https://github.com/xorbitsai/inference#installation). |
|
|
|
#### Start a local instance of Xinference |
|
```bash
xinference -p 9997
```
|
|
|
#### Launch and inference |
|
```python
from xinference.client import Client

client = Client("http://localhost:9997")

# Launch the model; the weights are downloaded on first use.
model_uid = client.launch_model(
    model_name="qwen-chat",
    model_format="ggmlv3",
    model_size_in_billions=7,
    quantization="q4_0",
)
model = client.get_model(model_uid)

chat_history = []
prompt = "最大的动物是什么?"  # "What is the largest animal?"
response = model.chat(
    prompt,
    chat_history=chat_history,
    generate_config={"max_tokens": 1024},
)
print(response)
```
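The chat call returns an OpenAI-style completion dict (this shape is an assumption based on Xinference's OpenAI-compatible API, not something stated in this card). A small sketch with a hypothetical `extract_reply` helper shows one way to pull out the assistant's text and extend `chat_history` for a follow-up turn:

```python
# Sketch only: the response structure below is an assumed OpenAI-style
# shape, and extract_reply is a hypothetical helper, not part of the
# Xinference client API.

def extract_reply(prompt, response, chat_history):
    """Return the assistant's text and record both turns in chat_history."""
    reply = response["choices"][0]["message"]["content"]
    chat_history.append({"role": "user", "content": prompt})
    chat_history.append({"role": "assistant", "content": reply})
    return reply

# Demonstration with a mocked response instead of a live server:
mock_response = {
    "choices": [{"message": {"role": "assistant", "content": "蓝鲸 (the blue whale)."}}]
}
history = []
print(extract_reply("最大的动物是什么?", mock_response, history))
print(len(history))  # two entries: the user turn and the assistant turn
```

Passing the grown `history` back as `chat_history` in the next `model.chat` call is how multi-turn conversations are carried forward.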
|
|
|
### More information |
|
|
|
[Xinference](https://github.com/xorbitsai/inference) lets you replace OpenAI GPT with another LLM in your app

by changing a single line of code. Xinference gives you the freedom to use any LLM you need.

With Xinference, you are empowered to run inference with any open-source language models,

speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
|
|
|
<i><a href="https://join.slack.com/t/xorbitsio/shared_invite/zt-1z3zsm9ep-87yI9YZ_B79HLB2ccTq4WA">👉 Join our Slack community!</a></i> |