AI4Bread
/

XiXi_Qwen_base_14b

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

sperfu commited on Jul 1

Commit

ba90ee4

•

1 Parent(s): 3c16da3

Update README.md

Files changed (1) hide show

README.md +46 -1

README.md CHANGED Viewed

@@ -196,7 +196,7 @@ lmdeploy serve api_server ./workspace \
 	--tp 1
 ```
-In the above parameters, `server_name` and `server_port` indicate the service address and port, respectively. The `tp` parameter, as mentioned earlier, stands for Tensor Parallelism. The remaining parameter, instance_num, represents the number of instances and can be understood as the batch size. After execution, it will appear as shown below.
 After this, users can start the Web Service as described in [TurboMind Service as the Backend](#--turbomind-service-as-the-backend).
@@ -382,6 +382,51 @@ curl http://localhost:8000/v1/chat/completions \
 更多信息请查看 [vLLM 文档](https://docs.vllm.ai/en/latest/index.html)
 ## 网页服务启动方式1:

 	--tp 1
 ```
+In the above parameters, `server_name` and `server_port` indicate the service address and port, respectively. The `tp` parameter, as mentioned earlier, stands for Tensor Parallelism.
 After this, users can start the Web Service as described in [TurboMind Service as the Backend](#--turbomind-service-as-the-backend).
 更多信息请查看 [vLLM 文档](https://docs.vllm.ai/en/latest/index.html)
+## 使用本地训练模型
+### 第一步：转换为 lmdeploy TurboMind 格式
+这里，我们将使用预训练的模型文件，并在用户的根目录下执行转换，如下所示。
+```bash
+# 将模型转换为 TurboMind (FastTransformer 格式)
+lmdeploy convert internlm2-chat-7b /root/autodl-tmp/agri_intern/XiXiLM --tokenizer-path ./GouMang/tokenizer.json
+```
+执行完毕后，当前目录下将生成一个 workspace 文件夹。
+这个文件夹包含 TurboMind 和 Triton “模型推理”所需的文件，如下所示：
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/658a3c4cbbb04840e3ce7e2c/CqdwhshIL8xxjog_WD_St.png)
+### 第二步：本地聊天
+```bash
+lmdeploy chat turbomind ./workspace
+```
+### 第三步（可选）：TurboMind 推理 + API 服务
+在前一部分中，我们尝试通过命令行直接启动客户端。现在，我们将尝试使用 lmdeploy 进行服务部署。
+“模型推理/服务”目前提供两种服务部署方式：TurboMind 和 TritonServer。在这种情况下，服务器可以是 TurboMind 或 TritonServer，而 API 服务器可以提供外部 API 服务。我们推荐使用 TurboMind。
+首先，使用以下命令启动服务：
+```bash
+# ApiServer+Turbomind   api_server => AsyncEngine => TurboMind
+lmdeploy serve api_server ./workspace \
+	--server-name 0.0.0.0 \
+	--server-port 23333 \
+	--tp 1
+```
+在上述参数中，server_name 和 server_port 分别表示服务地址和端口。tp 参数如前所述代表 Tensor 并行性。
+之后，用户可以按照[TurboMind Service as the Backend](#--turbomind-service-as-the-backend) 中描述的启动 Web 服务。
 ## 网页服务启动方式1: