add deployment description

#5
Files changed (1)
  1. README.md +7 -1
README.md CHANGED
@@ -2,6 +2,7 @@
2
  license: llama3.1
3
  ---
4
  # Meta-Llama-3.1-8B-Instruct-FP8-KV
 
5
  This model was created by applying [Quark](https://quark.docs.amd.com/latest/index.html) with calibration samples from the Pile dataset.
6
  - ## Quantization Strategy
7
  - ***Quantized Layers***: All linear layers excluding "lm_head"
@@ -32,9 +33,12 @@ python3 quantize_quark.py \
32
  --multi_gpu \
33
  --model_export quark_safetensors
34
  ```
 
 
 
35
  ## Evaluation
36
  Quark currently uses perplexity (PPL) as the evaluation metric for accuracy loss before and after quantization. The specific PPL algorithm can be found in quantize_quark.py.
37
-
38
 
39
  #### Evaluation scores
40
  <table>
@@ -57,6 +61,8 @@ Quark currently uses perplexity(PPL) as the evaluation metric for accuracy loss
57
 
58
  </table>
59
 
 
 
60
  #### License
61
  Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All Rights Reserved.
62
 
 
2
  license: llama3.1
3
  ---
4
  # Meta-Llama-3.1-8B-Instruct-FP8-KV
5
+ - ## Introduction
6
  This model was created by applying [Quark](https://quark.docs.amd.com/latest/index.html) with calibration samples from the Pile dataset.
7
  - ## Quantization Strategy
8
  - ***Quantized Layers***: All linear layers excluding "lm_head"
 
33
  --multi_gpu \
34
  --model_export quark_safetensors
35
  ```
36
+ ## Deployment
37
+ Quark has its own export format, which allows FP8-quantized models to be deployed efficiently with the vLLM backend (vLLM-compatible).
38
+
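As an illustration of the deployment path described above, here is a minimal sketch of loading an exported checkpoint with vLLM's offline `LLM` API. It is not taken from the PR. The `quantization="quark"` value, the `kv_cache_dtype="fp8"` setting, and the placeholder model path are assumptions; whether they are accepted depends on the installed vLLM version and on how the checkpoint was exported.

```python
def build_engine_kwargs(model_path: str) -> dict:
    """Collect the keyword arguments passed to vllm.LLM.

    Assumptions (not stated in the PR): the installed vLLM recognizes
    quantization="quark", and the FP8 KV cache is enabled explicitly.
    """
    return {
        "model": model_path,
        "quantization": "quark",   # assumed; depends on the vLLM version
        "kv_cache_dtype": "fp8",   # assumed; matches the FP8-KV model name
    }


def generate(model_path: str, prompt: str, max_tokens: int = 64) -> str:
    """Run one greedy completion against the quantized model."""
    # Imported lazily so the helper above works without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**build_engine_kwargs(model_path))
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text


# Example call (hypothetical path to the exported checkpoint):
#   print(generate("path/to/exported/model", "What is FP8 quantization?"))
```

Keeping the engine arguments in one helper makes it easy to adjust them if a given vLLM release detects the quantization method from the checkpoint config instead.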
39
  ## Evaluation
40
  Quark currently uses perplexity (PPL) as the evaluation metric for accuracy loss before and after quantization. The specific PPL algorithm can be found in quantize_quark.py.
41
+ The evaluation results are obtained in pseudo-quantization mode, so they may differ slightly from actual quantized inference accuracy. These results are provided for reference only.
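For readers unfamiliar with the metric, the sketch below shows the standard definition of perplexity: the exponential of the mean negative log-likelihood per token. It is not the code from quantize_quark.py, which may differ in details such as windowing or dataset handling, and the log-probabilities are made-up values for illustration only.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) over the given tokens.

    token_log_probs: natural-log probabilities the model assigned to
    each reference token.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical per-token probabilities before and after quantization.
baseline = perplexity([math.log(0.5), math.log(0.25), math.log(0.5)])
quantized = perplexity([math.log(0.5), math.log(0.2), math.log(0.5)])

# A higher PPL after quantization indicates some accuracy loss.
print(f"baseline PPL={baseline:.3f}, quantized PPL={quantized:.3f}")
```

Lower perplexity is better, so the gap between the two numbers is what the score table is meant to convey.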
42
 
43
  #### Evaluation scores
44
  <table>
 
61
 
62
  </table>
63
 
64
+
65
+
66
  #### License
67
  Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All Rights Reserved.
68