|
--- |
|
license: apache-2.0 |
|
tags: |
|
- QAnything |
|
- RAG |
|
--- |
|
A bilingual instruction-tuned model based on [Qwen-7B](https://huggingface.co/Qwen/Qwen-7B), fine-tuned to serve as the LLM backend for [QAnything](https://github.com/netease-youdao/QAnything).
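
As a convenience, the model weights can also be fetched with `huggingface-cli` instead of the `git clone` used in the steps below. This is a minimal sketch assuming `huggingface_hub` is installed; the target directory is only an example and should match QAnything's `custom_models` directory.

```bash
# Optional alternative to "git lfs install" + "git clone" (requires: pip install -U "huggingface_hub[cli]").
# The --local-dir path is an example; adjust it to your QAnything checkout.
huggingface-cli download netease-youdao/Qwen-7B-QAnything \
  --local-dir /path/to/QAnything/assets/custom_models/Qwen-7B-QAnything
```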
|
|
|
1. Run Qwen-7B-QAnything using the FastChat API with the Hugging Face Transformers runtime backend
|
|
|
```bash |
|
## Step 1. Prepare the QAnything project and download local Embedding/Rerank models. |
|
|
|
git clone https://github.com/netease-youdao/QAnything.git |
|
cd /path/to/QAnything && mkdir -p tmp && cd tmp |
|
git lfs install |
|
git clone https://huggingface.co/netease-youdao/QAnything |
|
unzip QAnything/models.zip |
|
cd - && mv tmp/models . |
|
|
|
## Step 2. Download the public LLM model (e.g., Qwen-7B-QAnything) and save to "/path/to/QAnything/assets/custom_models" |
|
cd /path/to/QAnything/assets/custom_models |
|
git clone https://huggingface.co/netease-youdao/Qwen-7B-QAnything |
|
|
|
## Step 3. Execute the service startup command.
## Here "-b hf" selects the Hugging Face Transformers backend, which by default loads the model in 8-bit but runs bf16 inference to save VRAM.
|
cd /path/to/QAnything |
|
bash ./run.sh -c local -i 0 -b hf -m Qwen-7B-QAnything -t qwen-7b-qanything |
|
``` |
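
Once the service is up, a quick sanity check is to query the LLM through the FastChat OpenAI-compatible API. This is a hypothetical example: the port (8000), whether it is reachable from outside the container, and the exact model name registered with FastChat all depend on your deployment, so adjust them as needed.

```bash
# Hypothetical sanity check against the FastChat OpenAI-compatible endpoint.
# Host, port, and model name are assumptions; verify them in your QAnything logs/config.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen-7B-QAnything",
        "messages": [{"role": "user", "content": "What is QAnything?"}]
      }'
```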
|
|
|
2. Run Qwen-7B-QAnything using the FastChat API with the vLLM runtime backend
|
|
|
```bash |
|
|
|
## Step 1. Prepare the QAnything project and download local Embedding/Rerank models. |
|
|
|
git clone https://github.com/netease-youdao/QAnything.git |
|
cd /path/to/QAnything && mkdir -p tmp && cd tmp |
|
git lfs install |
|
git clone https://huggingface.co/netease-youdao/QAnything |
|
unzip QAnything/models.zip |
|
cd - && mv tmp/models . |
|
|
|
## Step 2. Download the public LLM model (e.g., Qwen-7B-QAnything) and save to "/path/to/QAnything/assets/custom_models" |
|
cd /path/to/QAnything/assets/custom_models |
|
git clone https://huggingface.co/netease-youdao/Qwen-7B-QAnything |
|
|
|
## Step 3. Execute the service startup command.
## Here "-b vllm" selects the vLLM backend, which runs bf16 inference by default.
## Note: adjust gpu_memory_utilization according to the model size to avoid running out of memory
## (the default for a 7B model is 0.81; here it is set to 0.85 via "-r 0.85").
|
cd /path/to/QAnything |
|
bash ./run.sh -c local -i 0 -b vllm -m Qwen-7B-QAnything -t qwen-7b-qanything -p 1 -r 0.85 |
|
|
|
``` |
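
If the vLLM backend runs out of GPU memory (for example, when the GPU is shared with other processes), the same command can be rerun with a lower memory fraction; the value below is only illustrative.

```bash
# Sketch only: lower gpu_memory_utilization via "-r" if vLLM reports out-of-memory errors.
bash ./run.sh -c local -i 0 -b vllm -m Qwen-7B-QAnything -t qwen-7b-qanything -p 1 -r 0.70
```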
|
|
|
--- |
|
|
|
|
License Agreement |
|
This project is open-sourced under the Tongyi Qianwen Research License Agreement. The complete license agreement is available at https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20RESEARCH%20LICENSE%20AGREEMENT.
|
|
|
When using this project, please make sure your use complies with the terms and conditions of the license agreement.
|
--- |