feihu.hf committed
Commit • d30e8dd
1 Parent(s): 5baac52
update README.md

README.md CHANGED
@@ -79,7 +79,13 @@ To handle extensive inputs exceeding 32,768 tokens, we utilize [YARN](https://ar
 
 For deployment, we recommend using vLLM. You can enable the long-context capabilities by following these steps:
 
-1. **Install vLLM**:
+1. **Install vLLM**: You can install vLLM by running the following command.
+
+    ```bash
+    pip install "vllm>=0.4.3"
+    ```
+
+    Or you can install vLLM from [source](https://github.com/vllm-project/vllm/).
 
 2. **Configure Model Settings**: After downloading the model weights, modify the `config.json` file by including the below snippet:
     ```json
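The hunk is truncated before the contents of the `json` snippet. For reference, the YARN `rope_scaling` entry that Qwen-style model cards add to `config.json` typically takes the shape below; the field values shown (the scaling `factor` and the 32,768-token base length) are assumptions for illustration, not taken from this commit:

```json
{
    "rope_scaling": {
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
        "type": "yarn"
    }
}
```

This block is merged into the model's existing `config.json` alongside its other fields; vLLM reads it at load time to enable YARN-based context extension.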