sh2orc committed on
Commit fe80b54 · verified · 1 Parent(s): 74b63bf

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -37,16 +37,16 @@ For reference, the Dense model can be used after compression with FP8 Dynamic.
 export NCCL_P2P_DISABLE=1
 ```
 
-In GPU 2 units,
+In GPU 2 units, with KV Cache 90%, Max token 32768
 
 ```
 vllm serve BCCard/kanana-1.5-8b-instruct-2505-FP8-Dynamic \
   --tensor-parallel-size 2 \
   --gpu-memory-utilization 0.9 \
-  --max-model-len 8192 \
-  --enforce-eager \
+  --max-model-len 32768 \
+  --enforce-eager \
   --api-key bccard \
-  --served-model-name kanana-1.5-8b-instruct
+  --served-model-name kanana-1.5-8b-instruct
 ```
 
 ## 3. Quantization Code Walk‑Through (Shared Knowledges)
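For reference, once the server started by the updated command is running, it exposes vLLM's OpenAI-compatible API. The request below is a minimal sketch that assumes vLLM's default bind address and port (localhost:8000); the `bccard` API key and the `kanana-1.5-8b-instruct` model name are taken from the `--api-key` and `--served-model-name` flags in the diff above.

```
# Minimal smoke test against the OpenAI-compatible endpoint served by vLLM.
# Assumes the default host/port; adjust if --host or --port were passed to vllm serve.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bccard" \
  -d '{
        "model": "kanana-1.5-8b-instruct",
        "messages": [{"role": "user", "content": "Hello, please introduce yourself."}],
        "max_tokens": 256
      }'
```

The `Authorization` header must match the value given to `--api-key`, and the `model` field must match `--served-model-name`, otherwise the server rejects the request.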