Update README.md
README.md
CHANGED
@@ -39,7 +39,7 @@ LMDeploy supports the following NVIDIA GPU for W4A16 inference:
 Before proceeding with the quantization and inference, please ensure that lmdeploy is installed.
 
 ```shell
-pip install lmdeploy>=0.
+pip install lmdeploy>=0.6.4
 ```
 
 This article comprises the following sections:
@@ -74,7 +74,7 @@ For more information about the pipeline parameters, please refer to [here](https
 LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of service startup:
 
 ```shell
-lmdeploy serve api_server OpenGVLab/InternVL-Chat-V1-5-AWQ --
+lmdeploy serve api_server OpenGVLab/InternVL-Chat-V1-5-AWQ --server-port 23333 --model-format awq
 ```
 
 To use the OpenAI-style interface, you need to install OpenAI:
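For reference, below is a minimal client sketch against the server started above. It is illustrative and not part of the original README: it assumes the server is reachable locally on the port set by `--server-port` above, that the OpenAI-compatible routes are mounted under `/v1`, and that no API key is enforced (a placeholder key is passed).

```python
from openai import OpenAI

# Assumption: the api_server launched above listens on localhost:23333 and
# exposes OpenAI-compatible routes under /v1; the API key is a placeholder
# since none was configured at server startup.
client = OpenAI(api_key="YOUR_API_KEY", base_url="http://0.0.0.0:23333/v1")

# Ask the server which model it is serving instead of hard-coding the name.
model_name = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "Hello! Which model are you?"}],
    temperature=0.8,
)
print(response.choices[0].message.content)
```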