leafspark committed
Commit f83b696
Parent(s): 418c797

Update README.md

Files changed (1):
  1. README.md (+40, -4)
README.md CHANGED
@@ -19,7 +19,7 @@ Quantized from [https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat)
 
 Using llama.cpp fork: [https://github.com/fairydreaming/llama.cpp/tree/deepseek-v2](https://github.com/fairydreaming/llama.cpp/tree/deepseek-v2)
 
-# Warning: This will not work unless you compile llama.cpp from the repo provided!
+# Warning: This will not work unless you compile llama.cpp from the repo provided (and set metadata KV overrides)!
 
 # How to use:
 
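For the compile step this warning refers to, a minimal build sketch, assuming a stock `make` build of the fork (the branch name comes from the URL above; the pinned commit is the one recommended later in this README):

```bash
# Clone the DeepSeek-V2 branch of the fork and build the CPU binaries.
git clone --branch deepseek-v2 https://github.com/fairydreaming/llama.cpp
cd llama.cpp
git checkout 039896407afd40e54321d47c5063c46a52da3e01  # pinned commit recommended below
make -j
```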
@@ -29,19 +29,55 @@ Using llama.cpp fork: [https://github.com/fairydreaming/llama.cpp/tree/deepseek-v2](https://github.com/fairydreaming/llama.cpp/tree/deepseek-v2)
 - Merged GGUF should appear
 
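The merge step above can be done with llama.cpp's `gguf-split` tool; a sketch, using placeholder shard names rather than the actual file names in this repo:

```bash
# Merge split GGUF shards into a single file: pass the first shard and
# the desired output path (file names here are illustrative).
./gguf-split --merge DeepSeek-V2-Chat.q4_k_m-00001-of-00008.gguf DeepSeek-V2-Chat.q4_k_m.gguf
```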
 # Quants:
+```
 - bf16 [size: 439gb]
 - q8_0 (after q2_k) [estimated size: 233.27gb]
 - q4_k_m [size: 132gb]
 - q2_k (uploading) [size: 80gb]
-- q3_k_s (generating) [estimated size: 96.05gb]
+- q3_k_s (generating, using importance matrix) [estimated size: 96.05gb]
+```
+
+# Planned Quants (using importance matrix):
+```
+- q5_k_m
+- q5_k_s
+- q3_k_m
+- q6_k
+- iq4_nl
+- iq4_xs
+- iq2_xxs
+- iq2_xs
+- iq2_s
+- iq2_m
+- iq1_s
+- iq1_m
+```
 
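As a sketch of how an importance-matrix quant like the ones listed above is produced with llama.cpp's `quantize` tool (file names are placeholders; `imatrix.dat` is the file described in the iMatrix section below):

```bash
# Requantize the bf16 GGUF to a low-bit type, guided by the importance matrix.
./quantize --imatrix imatrix.dat DeepSeek-V2-Chat.bf16.gguf DeepSeek-V2-Chat.IQ2_XS.gguf IQ2_XS
```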
 Note: the model files do not have some DeepSeek v2 specific parameters; will look into adding them
 
-Please use commit 039896407afd40e54321d47c5063c46a52da3e01, otherwise use these metadata KV overrides:
+Please use commit `039896407afd40e54321d47c5063c46a52da3e01`, otherwise use these metadata KV overrides:
 ```
 deepseek2.attention.q_lora_rank=int:1536
 deepseek2.attention.kv_lora_rank=int:512
 deepseek2.expert_shared_count=int:2
 deepseek2.expert_feed_forward_length=int:1536
 deepseek2.leading_dense_block_count=int:1
-```
+```
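These overrides are passed at load time with llama.cpp's `--override-kv` flag, one per key; a sketch (model path and prompt are placeholders):

```bash
# Each override uses the key=type:value syntax shown in the block above.
./main -m DeepSeek-V2-Chat.q4_k_m.gguf -p "Hello" \
  --override-kv deepseek2.attention.q_lora_rank=int:1536 \
  --override-kv deepseek2.attention.kv_lora_rank=int:512 \
  --override-kv deepseek2.expert_shared_count=int:2 \
  --override-kv deepseek2.expert_feed_forward_length=int:1536 \
  --override-kv deepseek2.leading_dense_block_count=int:1
```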
+
+A precompiled AVX2 version is available at `llama.cpp-039896407afd40e54321d47c5063c46a52da3e01.zip` in the root of this repo.
+
+# License:
+- DeepSeek license for model weights
+- MIT license for any repo code
+
+# Performance:
+~1.5 t/s with a Ryzen 7 3700X (96 GB @ 3200 MHz) [Q2_K]
+
+# iMatrix:
+Find imatrix.dat in the root of this repo, made with a Q2_K quant (see here for info: [https://github.com/ggerganov/llama.cpp/issues/5153#issuecomment-1913185693](https://github.com/ggerganov/llama.cpp/issues/5153#issuecomment-1913185693))
+
+Using groups_merged.txt, find it here: [https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)
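An `imatrix.dat` like this is generated with llama.cpp's `imatrix` tool; the exact invocation used for this repo is not given in the commit, so this is a sketch using the Q2_K quant and the calibration file named above:

```bash
# Compute the importance matrix over the calibration text with the Q2_K quant.
./imatrix -m DeepSeek-V2-Chat.q2_k.gguf -f groups_merged.txt -o imatrix.dat
```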
+
+# Censorship:
+
+This model is quite censored; finetuning on toxic DPO might help.