bowenbaoamd committed
Commit 1ac52b5
1 Parent(s): a2b8b36

Upload README.md with huggingface_hub

Files changed (1): README.md (+4 -2)
README.md CHANGED
@@ -25,7 +25,8 @@ python3 quantize_quark.py \
     --kv_cache_dtype fp8 \
     --num_calib_data 128 \
     --model_export quark_safetensors \
-    --no_weight_matrix_merge
+    --no_weight_matrix_merge \
+    --custom_mode fp8
 # If model size is too large for single GPU, please use multi GPU instead.
 python3 quantize_quark.py \
     --model_dir $MODEL_DIR \
@@ -35,7 +36,8 @@ python3 quantize_quark.py \
     --num_calib_data 128 \
     --model_export quark_safetensors \
     --no_weight_matrix_merge \
-    --multi_gpu
+    --multi_gpu \
+    --custom_mode fp8
 ```
 ## Deployment
 Quark has its own export format and allows FP8 quantized models to be efficiently deployed using the vLLM backend(vLLM-compatible).
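The commands in the diff quantize the model (and its KV cache) to FP8, i.e. the 8-bit E4M3 floating-point format. As a rough, self-contained illustration of what E4M3 round-to-nearest does to a single value — a sketch, not Quark's actual implementation; the function name and clamping details are illustrative — consider:

```python
import math

def quantize_fp8_e4m3(x: float) -> float:
    """Round x to the nearest representable FP8 E4M3 value (illustrative sketch).

    E4M3: 4 exponent bits, 3 mantissa bits, largest finite value 448.
    """
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), 448.0)        # clamp to E4M3's max finite magnitude
    exp = math.floor(math.log2(mag))
    exp = max(exp, -6)              # below the min normal exponent -> subnormal range
    step = 2.0 ** (exp - 3)         # 3 mantissa bits -> 8 steps per binade
    return sign * round(mag / step) * step
```

In practice a quantizer also stores a per-tensor (or per-channel) scale so that the original values are mapped into E4M3's representable range before rounding, and the scale is kept alongside the FP8 data for dequantization at inference time.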