unsubscribe committed
Commit 406a7cf
1 Parent(s): 003fce4
Files changed (1)
  1. README.md +4 -3
README.md CHANGED
@@ -33,11 +33,11 @@ You can download the pre-quantized 4-bit weight models from LMDeploy's [model zo
 
 Alternatively, you can quantize 16-bit weights to 4-bit weights following the ["4-bit Weight Quantization"](#4-bit-weight-quantization) section, and then perform inference as per the below instructions.
 
-Take the 4-bit Llama-2-7B model from the model zoo as an example:
+Take the 4-bit Llama-2-13B model from the model zoo as an example:
 
 ```shell
 git-lfs install
-git clone https://huggingface.co/lmdeploy/llama2-chat-7b-w4
+git clone https://huggingface.co/lmdeploy/llama2-chat-13b-w4
 ```
 
 As demonstrated in the command below, first convert the model's layout using `turbomind.deploy`, and then you can interact with the AI assistant in the terminal
@@ -47,7 +47,7 @@ As demonstrated in the command below, first convert the model's layout using `tu
 ## Convert the model's layout and store it in the default path, ./workspace.
 python3 -m lmdeploy.serve.turbomind.deploy \
     --model-name llama2 \
-    --model-path ./llama2-chat-7b-w4 \
+    --model-path ./llama2-chat-13b-w4 \
     --model-format awq \
     --group-size 128
 
@@ -104,6 +104,7 @@ LMDeploy employs AWQ algorithm for model weight quantization.
 
 ```shell
 python3 -m lmdeploy.lite.apis.auto_awq \
+    --model $HF_MODEL \
     --w_bits 4 \ # Bit number for weight quantization
     --w_sym False \ # Whether to use symmetric quantization for weights
     --w_group_size 128 \ # Group size for weight quantization statistics
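
For reference, the inference walkthrough after this commit reads end to end as below. The first two steps are exactly what the updated README shows; the final chat command does not appear in this diff and is an assumption based on LMDeploy's CLI of this period:

```shell
# Fetch the pre-quantized 4-bit Llama-2-13B weights referenced by this commit.
git-lfs install
git clone https://huggingface.co/lmdeploy/llama2-chat-13b-w4

# Convert the model's layout and store it in the default path, ./workspace.
python3 -m lmdeploy.serve.turbomind.deploy \
    --model-name llama2 \
    --model-path ./llama2-chat-13b-w4 \
    --model-format awq \
    --group-size 128

# Assumed chat entry point (not part of this diff): interact with the
# converted model in the terminal.
python3 -m lmdeploy.turbomind.chat ./workspace
```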
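
The third hunk adds a `--model` argument to the quantization command. Note that in real shell a `#` comment cannot follow a line-continuation backslash, so a runnable version moves the flag explanations out of the command; `$HF_MODEL` is a placeholder the README leaves to the user, and the path below is purely illustrative. The hunk also ends mid-command (the trailing `\` after `--w_group_size 128`), so the README's full invocation may carry further arguments not shown here:

```shell
# Illustrative placeholder: path to the 16-bit Hugging Face checkpoint to quantize.
export HF_MODEL=./llama-2-13b-chat-hf

python3 -m lmdeploy.lite.apis.auto_awq \
    --model $HF_MODEL \
    --w_bits 4 \
    --w_sym False \
    --w_group_size 128
# --w_bits: bit number for weight quantization
# --w_sym: whether to use symmetric quantization for weights
# --w_group_size: group size for weight quantization statistics
```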