haoyang-amd committed
Commit 243be4a · verified · 1 Parent(s): a5c42b0

Update README.md

Files changed (1):
  1. README.md +11 -9

README.md CHANGED
@@ -1,10 +1,12 @@
---
- base_model: Qwen2-7B
license: other
license_name: llama2
license_link: https://github.com/meta-llama/llama-models/blob/main/models/llama2/LICENSE
---
- # Qwen2-7B-Weight-INT4-Per-Group-AWQ-Bfloat16
- ## Introduction
This model was created by applying [Quark](https://quark.docs.amd.com/latest/index.html) with calibration samples from the Pile dataset.
- ## Quantization Strategy
@@ -15,7 +17,7 @@ license_link: https://github.com/meta-llama/llama-models/blob/main/models/llama2
1. [Download and install Quark](https://quark.docs.amd.com/latest/install.html)
2. Run the quantization script in the example folder using the following command line:
```sh
- export MODEL_DIR=[local model checkpoint folder] or Qwen/Qwen2-7B
# single GPU
python3 quantize_quark.py --model_dir $MODEL_DIR \
--data_type bfloat16 \
@@ -24,7 +26,7 @@ license_link: https://github.com/meta-llama/llama-models/blob/main/models/llama2
--quant_algo awq \
--dataset pileval_for_awq_benchmark \
--seq_len 512 \
- --output_dir Qwen2-7B-W_Int4-Per_Group-AWQ-BFloat16 \
--model_export quark_safetensors
# cpu
python3 quantize_quark.py --model_dir $MODEL_DIR \
@@ -34,7 +36,7 @@ license_link: https://github.com/meta-llama/llama-models/blob/main/models/llama2
--quant_algo awq \
--dataset pileval_for_awq_benchmark \
--seq_len 512 \
- --output_dir Qwen2-7B-W_Int4-Per_Group-AWQ-BFloat16 \
--model_export quark_safetensors \
--device cpu
```
@@ -48,17 +50,17 @@ The quantization evaluation results are conducted in pseudo-quantization mode, w
<tr>
<td><strong>Benchmark</strong>
</td>
- <td><strong>Qwen2-7B (Bfloat16)</strong>
</td>
- <td><strong>Qwen2-7B-Weight-INT4-Per-Group-AWQ-Bfloat16 (this model)</strong>
</td>
</tr>
<tr>
<td>Perplexity-wikitext2
</td>
- <td>7.9935
</td>
- <td>8.1179
</td>
</tr>
</table>
 
---
+ base_model:
+ - microsoft/Phi-3-mini-4k-instruct
license: other
license_name: llama2
license_link: https://github.com/meta-llama/llama-models/blob/main/models/llama2/LICENSE
---
+
+ # Phi-3-mini-4k-instruct-Weight-INT4-Per-Group-AWQ-Bfloat16
- ## Introduction
This model was created by applying [Quark](https://quark.docs.amd.com/latest/index.html) with calibration samples from the Pile dataset.
- ## Quantization Strategy
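For intuition on the weight-only INT4 per-group scheme named in the title, the sketch below shows the pseudo-quantization (quantize-then-dequantize) view of a single weight matrix. It is an illustration only: the group size of 128 and the asymmetric uint4 mapping are assumptions rather than a statement of this card's exact settings, and the AWQ scaling applied before quantization is not shown.

```python
import torch

def pseudo_quantize_int4_per_group(w: torch.Tensor, group_size: int = 128) -> torch.Tensor:
    """Quantize-then-dequantize a weight matrix to INT4 with one scale/zero-point per group.

    Illustration only: group_size=128 and an asymmetric uint4 mapping are assumptions,
    and the AWQ scaling that precedes this step is omitted.
    """
    out_features, in_features = w.shape
    wg = w.reshape(out_features, in_features // group_size, group_size)
    w_min = wg.amin(dim=-1, keepdim=True)
    w_max = wg.amax(dim=-1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-8) / 15.0            # 16 levels for 4 bits (codes 0..15)
    zero = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(wg / scale) + zero, 0, 15)    # integer codes, one scale/zero per group
    return ((q - zero) * scale).reshape(out_features, in_features)  # dequantized weights

# Reconstruction error on a random weight matrix (in_features must be divisible by group_size)
w = torch.randn(1024, 1024)
print("mean abs error:", (w - pseudo_quantize_int4_per_group(w)).abs().mean().item())
```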
 
1. [Download and install Quark](https://quark.docs.amd.com/latest/install.html)
2. Run the quantization script in the example folder using the following command line:
```sh
+ export MODEL_DIR=[local model checkpoint folder] or microsoft/Phi-3-mini-4k-instruct
# single GPU
python3 quantize_quark.py --model_dir $MODEL_DIR \
--data_type bfloat16 \
 
--quant_algo awq \
--dataset pileval_for_awq_benchmark \
--seq_len 512 \
+ --output_dir Phi-3-mini-4k-instruct-W_Int4-Per_Group-AWQ-BFloat16 \
--model_export quark_safetensors
# cpu
python3 quantize_quark.py --model_dir $MODEL_DIR \
 
--quant_algo awq \
--dataset pileval_for_awq_benchmark \
--seq_len 512 \
+ --output_dir Phi-3-mini-4k-instruct-W_Int4-Per_Group-AWQ-BFloat16 \
--model_export quark_safetensors \
--device cpu
```
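The `--quant_algo awq` flag selects activation-aware weight quantization: per-input-channel scales are searched on calibration activations so that the most salient weight channels lose less precision when the scaled weights are quantized. The sketch below is a simplified rendering of that idea and does not reproduce Quark's implementation; the grid search, scale normalization, and error metric are assumptions, and it reuses the `pseudo_quantize_int4_per_group` helper sketched under "Quantization Strategy" above.

```python
import torch

# Assumes pseudo_quantize_int4_per_group from the earlier sketch is in scope.

def awq_scale_search(w: torch.Tensor, x_calib: torch.Tensor,
                     n_grid: int = 20, group_size: int = 128) -> torch.Tensor:
    """Pick a per-input-channel scale s so that quantizing (w * s) while feeding (x / s)
    best preserves the layer output on calibration activations (illustrative only)."""
    act_mag = x_calib.abs().mean(dim=0)                 # per-channel activation magnitude
    ref = x_calib @ w.t()                               # full-precision layer output
    best_err, best_scale = float("inf"), None
    for i in range(n_grid):
        alpha = i / n_grid                              # how strongly to favor salient channels
        s = act_mag.clamp(min=1e-4) ** alpha
        s = s / (s.max() * s.min()).sqrt()              # keep the scales centered around 1
        w_q = pseudo_quantize_int4_per_group(w * s, group_size)
        err = ((x_calib / s) @ w_q.t() - ref).pow(2).mean().item()
        if err < best_err:
            best_err, best_scale = err, s
    return best_scale
```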
 
<tr>
<td><strong>Benchmark</strong>
</td>
+ <td><strong>Phi-3-mini-4k-instruct (Bfloat16)</strong>
</td>
+ <td><strong>Phi-3-mini-4k-instruct-Weight-INT4-Per-Group-AWQ-Bfloat16 (this model)</strong>
</td>
</tr>
<tr>
<td>Perplexity-wikitext2
</td>
+ <td>6.0164
</td>
+ <td>6.5575
</td>
</tr>
</table>
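For reference, the wikitext2 perplexity numbers above are the kind of figure produced by a standard sliding-window evaluation such as the sketch below. This is not the Quark evaluation harness: the window length, stride, and direct `transformers` loading of the bfloat16 baseline are assumptions, and the quantized model would be scored in pseudo-quantization mode rather than loaded this way.

```python
# Illustrative wikitext2 perplexity measurement for the bfloat16 baseline.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # baseline; the quantized checkpoint is evaluated separately
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
model.eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
input_ids = tokenizer(text, return_tensors="pt").input_ids

seq_len, stride = 2048, 2048  # assumed window; the card's evaluation settings may differ
nlls, n_tokens = [], 0
for start in range(0, input_ids.size(1) - 1, stride):
    end = min(start + seq_len, input_ids.size(1))
    ids = input_ids[:, start:end].to(model.device)
    with torch.no_grad():
        # labels == input_ids makes the model return the mean next-token NLL for this window
        loss = model(ids, labels=ids).loss
    nlls.append(loss * (ids.size(1) - 1))
    n_tokens += ids.size(1) - 1
    if end == input_ids.size(1):
        break

print("wikitext2 perplexity:", torch.exp(torch.stack(nlls).sum() / n_tokens).item())
```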