leafspark committed
Commit 3734c9a
Parent(s): 30dc34b

readme: update quant info; minor changes to hyperlinks and terms

Files changed (1)
README.md +11 -13
README.md CHANGED
@@ -18,9 +18,9 @@ language:
 
 Quantized from [https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat)
 
-Using llama.cpp b3026 for quantization
+Using llama.cpp [b3026](https://github.com/ggerganov/llama.cpp/releases/tag/b3026) for quantization. Given the rapid pace of llama.cpp releases, this will likely change over time.
 
-# Warning: This will not work unless you set metadata KV overrides, nor will it in LM Studio/similar wrapper apps!
+# Warning: This will not work unless you set metadata KV overrides, nor will it in LM Studio/similar wrapper apps (except supported ones, see below)!
 
 # How to use:
 
@@ -38,11 +38,11 @@ Using llama.cpp b3026 for quantization
 
 **Running in llama.cpp:**
 
-To start in command line interactive mode (text completion):
+To start in command line chat mode (chat completion):
 ```
 main -m DeepSeek-V2-Chat.{quant}.gguf -c {context length} --color -i
 ```
-To use llama.cpp OpenAI compatible server:
+To use llama.cpp's OpenAI compatible server:
 ```
 server \
 -m DeepSeek-V2-Chat.{quant}.gguf \
@@ -78,11 +78,11 @@ quantize \
 # Quants:
 ```
 - bf16 [size: 439gb]
-- q8_0 [estimated size: 233.27gb]
+- q8_0 (uploading) [size: 233.27gb]
 - q4_k_m [size: 132gb]
 - q2_k [size: 80gb]
 - iq2_xxs [size: 61.5gb]
-- iq3_xs (uploading) [size: 89.6gb]
+- iq3_xs [size: 89.6gb]
 - iq1_m (uploading) [size: 27.3gb]
 - q3_k_m (uploading) [size: 92.6gb]
 ```
@@ -94,18 +94,14 @@ Note: Use iMatrix quants only if you can fully offload to GPU, otherwise speed w
 - q5_k_m
 - q5_k_s
 - q6_k
-- iq4_nl
 - iq4_xs
 - iq2_xs
 - iq2_s
 - iq2_m
-- iq3_xxs
 - iq1_s (note: for fun only, this quant is likely useless)
 ```
 
-Note: the model files do not have some DeepSeek v2 specific parameters, will look into adding them
-
-Please use commit `039896407afd40e54321d47c5063c46a52da3e01`, otherwise use these metadata KV overrides:
+Use these metadata KV overrides (pass them with `--override-kv`, which can be specified multiple times):
 ```
 deepseek2.attention.q_lora_rank=int:1536
 deepseek2.attention.kv_lora_rank=int:512
@@ -116,10 +112,12 @@ deepseek2.leading_dense_block_count=int:1
 deepseek2.rope.scaling.yarn_log_multiplier=float:0.0707
 ```
 
+The Q8_0 quant contains these parameters, along with future ones, so as long as you're running a supported build of llama.cpp, no `--override-kv` parameters are required.
+
 A precompiled AVX2 version is available at `llama.cpp-039896407afd40e54321d47c5063c46a52da3e01.zip` in the root of this repo.
 
 # License:
-- DeepSeek license for model weights
+- DeepSeek license for model weights, which can be found in the `LICENSE` file in the root of this repo
 - MIT license for any repo code
 
 
  # Performance:
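The `--override-kv` step in the diff above is easy to get wrong by hand, since the flag must be repeated once per key. A small sketch that assembles the full `main` invocation from the override list; only the keys visible in this diff are included (the README's full list is truncated here), and the model file name and context size are placeholders, not values from the README:

```python
# KV override keys copied from the README's override block. The diff shows
# only part of the list, so treat this as incomplete.
OVERRIDES = [
    "deepseek2.attention.q_lora_rank=int:1536",
    "deepseek2.attention.kv_lora_rank=int:512",
    "deepseek2.leading_dense_block_count=int:1",
    "deepseek2.rope.scaling.yarn_log_multiplier=float:0.0707",
]

def build_main_cmd(model_path: str, ctx: int = 4096) -> list[str]:
    """Build an argv list for llama.cpp's `main`, repeating --override-kv per key."""
    cmd = ["main", "-m", model_path, "-c", str(ctx), "--color", "-i"]
    for kv in OVERRIDES:
        cmd += ["--override-kv", kv]
    return cmd

# Print a copy-pasteable command line (placeholder file name).
print(" ".join(build_main_cmd("DeepSeek-V2-Chat.Q2_K.gguf")))
```

Keeping the overrides in one list makes it harder to silently drop a key when a new quant or llama.cpp build changes the requirements.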
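Once the `server` binary from the README is running, it speaks the OpenAI chat-completions protocol. A minimal client sketch; the `localhost:8080` host/port and the `/v1/chat/completions` path are llama.cpp defaults assumed here, not stated in the README:

```python
import json
import urllib.request

def build_chat_payload(prompt: str, max_tokens: int = 256) -> dict:
    """OpenAI-style chat-completion request body accepted by llama.cpp's server."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }

def send_chat(prompt: str,
              url: str = "http://localhost:8080/v1/chat/completions") -> dict:
    """POST one request to a running llama.cpp server and return the parsed reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because the endpoint is OpenAI-compatible, any OpenAI client library pointed at the server's base URL should also work.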
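As a sanity check on the sizes in the quant list: assuming the figures are GiB and DeepSeek-V2's total of roughly 236B parameters (both are assumptions, neither is stated in the README), the implied bits per weight come out close to the nominal quant widths:

```python
GIB = 2 ** 30
N_PARAMS = 236e9  # approximate DeepSeek-V2 total parameter count (assumption)

def bits_per_weight(size_gib: float) -> float:
    """Implied storage bits per parameter for a given file size."""
    return size_gib * GIB * 8 / N_PARAMS

# bf16 should land near 16 bits/weight if the assumptions hold.
for name, size in [("bf16", 439), ("q8_0", 233.27), ("q4_k_m", 132), ("q2_k", 80)]:
    print(f"{name}: {bits_per_weight(size):.2f} bpw")
```

If a listed size implied a wildly different bits-per-weight, that would suggest a typo in the size or a wrong unit.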