Update README.md
Browse files
README.md
CHANGED
|
@@ -37,16 +37,16 @@ For reference, the Dense model can be used after compression with FP8 Dynamic.
|
|
| 37 |
export NCCL_P2P_DISABLE=1
|
| 38 |
```
|
| 39 |
|
| 40 |
-
In GPU 2 units,
|
| 41 |
|
| 42 |
```
|
| 43 |
vllm serve BCCard/kanana-1.5-8b-instruct-2505-FP8-Dynamic \
|
| 44 |
--tensor-parallel-size 2 \
|
| 45 |
--gpu-memory-utilization 0.9 \
|
| 46 |
-
--max-model-len
|
| 47 |
-
--enforce-eager \
|
| 48 |
--api-key bccard \
|
| 49 |
-
--served-model-name kanana-1.5-8b-instruct
|
| 50 |
```
|
| 51 |
|
| 52 |
## 3. Quantization Code Walk‑Through (Shared Knowledges)
|
|
|
|
| 37 |
export NCCL_P2P_DISABLE=1
|
| 38 |
```
|
| 39 |
|
| 40 |
+
In GPU 2 units, with KV Cache 90%, Max token 32768
|
| 41 |
|
| 42 |
```
|
| 43 |
vllm serve BCCard/kanana-1.5-8b-instruct-2505-FP8-Dynamic \
|
| 44 |
--tensor-parallel-size 2 \
|
| 45 |
--gpu-memory-utilization 0.9 \
|
| 46 |
+
--max-model-len 32768 \
|
| 47 |
+
--enforce-eager \
|
| 48 |
--api-key bccard \
|
| 49 |
+
--served-model-name kanana-1.5-8b-instruct
|
| 50 |
```
|
| 51 |
|
| 52 |
## 3. Quantization Code Walk‑Through (Shared Knowledges)
|