leafspark committed on
Commit
4e6f75c
1 Parent(s): 951c875

readme: edit formatting & add banner

Files changed (1)
README.md +36 -32
README.md CHANGED
@@ -16,13 +16,15 @@ language:
 
 # DeepSeek-V2-Chat-GGUF
 
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6604e5b21eb292d6df393365/j_LWkNdegeMjQXuAOFZ1N.jpeg)
+
 Quantized from [https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat)
 
 Using llama.cpp [b3026](https://github.com/ggerganov/llama.cpp/releases/tag/b3026) for quantization. Given the rapid release of llama.cpp builds, this will likely change over time.
 
-# Warning: This will not work unless you set metadata KV overrides, nor will it in LM Studio/similar wrapper apps (except supported ones, see below)!
+**If you are using an older quant, please set the metadata KV overrides below.**
 
-# How to use:
+# Usage:
 
 **Downloading the bf16:**
 
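For context, fetching the bf16 shards from Hugging Face could look like the sketch below. This is a hedged illustration only: the repo id, include pattern, and local directory are assumptions, since the README's actual download commands fall outside this hunk.

```
# Hypothetical sketch -- repo id and paths are assumptions,
# not the README's actual download instructions.
huggingface-cli download leafspark/DeepSeek-V2-Chat-GGUF \
  --include "*bf16*" \
  --local-dir ./DeepSeek-V2-Chat-GGUF
```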
@@ -75,33 +77,35 @@ quantize \
 (--imatrix [file])
 ```
 
-# Quants:
-```
-- bf16 [size: 439gb]
-- q8_0 (uploading) [size: 233.27gb]
-- q4_k_m [size: 132gb]
-- q2_k [size: 80gb]
-- iq2_xxs [size: 61.5gb]
-- iq3_xs [size: 89.6gb]
-- iq1_m (uploading) [size: 27.3gb]
-- q3_k_m (uploading) [size: 92.6gb]
-```
-
-Note: Use iMatrix quants only if you can fully offload to GPU, otherwise speed will be affected a lot.
+Note: Use iMatrix quants only if you can fully offload to GPU; otherwise speed will be affected negatively.
 
-# Planned Quants (weighted/imatrix):
-```
-- q5_k_m
-- q5_k_s
-- q6_k
-- iq4_xs
-- iq2_xs
-- iq2_s
-- iq2_m
-- iq1_s (note: for fun only, this quant is likely useless)
-```
+# Quants:
 
-Use these metadata KV overrides (pass them using `--override-kv`, can be specified multiple times):
+| Quant   | Status    | Size      | Description                                | KV Metadata | Weighted | Notes                              |
+|---------|-----------|-----------|--------------------------------------------|-------------|----------|------------------------------------|
+| BF16    | Available | 439 GB    | Lossless :)                                | Old         | No       | Q8_0 is sufficient for most cases  |
+| Q8_0    | Uploading | 233.27 GB | High quality, *recommended*                | Updated     | Yes      |                                    |
+| Q4_K_M  | Available | 132 GB    | Medium quality, *recommended*              | Old         | No       |                                    |
+| Q3_K_M  | Uploading | 92.6 GB   | Medium-low quality                         | Updated     | Yes      |                                    |
+| IQ3_XS  | Available | 89.6 GB   | Better than Q3_K_M                         | Old         | Yes      |                                    |
+| Q2_K    | Available | 80.0 GB   | Low quality, **not recommended**           | Old         | No       |                                    |
+| IQ2_XXS | Available | 61.5 GB   | Lower quality, **not recommended**         | Old         | Yes      |                                    |
+| IQ1_M   | Uploading | 27.3 GB   | Extremely low quality, **not recommended** | Old         | Yes      | Testing purposes; use IQ2 at least |
+
+
+# Planned Quants (weighted/iMatrix):
+
+| Planned Quant | Notes |
+|---------------|-------|
+| Q5_K_M        |       |
+| Q5_K_S        |       |
+| Q6_K          |       |
+| IQ4_XS        |       |
+| IQ2_XS        |       |
+| IQ2_S         |       |
+| IQ2_M         |       |
+
+Metadata KV overrides (pass them using `--override-kv`; can be specified multiple times):
 ```
 deepseek2.attention.q_lora_rank=int:1536
 deepseek2.attention.kv_lora_rank=int:512
@@ -112,7 +116,7 @@ deepseek2.leading_dense_block_count=int:1
 deepseek2.rope.scaling.yarn_log_multiplier=float:0.0707
 ```
 
-The Q8_0 quant contains these parameters, along with future ones, so as long as you're running a supported build of llama.cpp no `--override-kv` parameters are required.
+The `Q8_0` quant contains these parameters (and will contain any added in the future), so as long as you're running a supported build of llama.cpp, no `--override-kv` flags are required.
 
 A precompiled AVX2 version is available at `llama.cpp-039896407afd40e54321d47c5063c46a52da3e01.zip` in the root of this repo.
 
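The override list above gives each key in `key=type:value` form. As a concrete sketch of passing them at load time (the `./main` binary name and model filename are assumptions; any keys from the README not visible in this diff would be added the same way):

```
# Hedged sketch: binary and model filename are assumptions.
# Repeat --override-kv once per key from the README's list.
./main -m DeepSeek-V2-Chat.Q4_K_M.gguf -p "Hello" \
  --override-kv deepseek2.attention.q_lora_rank=int:1536 \
  --override-kv deepseek2.attention.kv_lora_rank=int:512 \
  --override-kv deepseek2.leading_dense_block_count=int:1 \
  --override-kv deepseek2.rope.scaling.yarn_log_multiplier=float:0.0707
```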
@@ -121,13 +125,13 @@ A precompiled AVX2 version is available at `llama.cpp-039896407afd40e54321d47c50
 - MIT license for any repo code
 
 # Performance:
-~1.5t/s with Ryzen 3 3700x (96gb 3200mhz) [Q2_K]
+*~1.5 t/s* with a Ryzen 7 3700X (96 GB RAM at 3200 MHz) `[Q2_K]`
 
 # iMatrix:
-Find imatrix.dat in the root of this repo, made with a Q2_K quant (see here for info: [https://github.com/ggerganov/llama.cpp/issues/5153#issuecomment-1913185693](https://github.com/ggerganov/llama.cpp/issues/5153#issuecomment-1913185693))
+Find `imatrix.dat` in the root of this repo, made with a `Q2_K` quant (see here for info: [https://github.com/ggerganov/llama.cpp/issues/5153#issuecomment-1913185693](https://github.com/ggerganov/llama.cpp/issues/5153#issuecomment-1913185693))
 
-Using groups_merged.txt, find it here: [https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)
+Made using `groups_merged.txt` as calibration data; find it here: [https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)
 
 # Censorship:
 
-This model is quite censored, finetuning on toxic DPO might help.
+This model is a bit censored; finetuning on toxic DPO might help.
 
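For reference, the weighted/iMatrix pieces above fit together roughly as follows. This is a hedged sketch using llama.cpp's `imatrix` and `quantize` tools; the filenames and the `IQ3_XS` target are assumptions, not the repo's exact invocations:

```
# Hedged sketch -- filenames and quant target are assumptions.
# 1) Build the importance matrix from the calibration text
#    (the README notes imatrix.dat was made with a Q2_K quant):
./imatrix -m DeepSeek-V2-Chat.Q2_K.gguf -f groups_merged.txt -o imatrix.dat
# 2) Pass it to quantize when producing a weighted quant:
./quantize --imatrix imatrix.dat DeepSeek-V2-Chat.bf16.gguf \
  DeepSeek-V2-Chat.IQ3_XS.gguf IQ3_XS
```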