leafspark committed on
Commit 3ef8d77
Parent: 04c1e99

readme: update quants

Files changed (1): README.md (+6 −4)
````diff
@@ -82,10 +82,13 @@ quantize \
 - q8_0 (later, please use q4_k_m for now) [estimated size: 233.27gb]
 - q4_k_m [size: 132gb]
 - q2_k [size: 80gb]
-- iq2_xxs (generating, using importance matrix)
-- q3_k_s (generating, using importance matrix) [estimated size: 96.05gb]
+- iq2_xxs [size: 61.5gb]
+- iq3_xs (uploading) [size: 89.6gb]
+- iq1_m [size: 27.3gb]
 ```
 
+Note: Use iMatrix quants only if you can fully offload to GPU, otherwise speed will be affected a lot.
+
 # Planned Quants (using importance matrix):
 ```
 - q5_k_m
@@ -97,8 +100,7 @@ quantize \
 - iq2_xs
 - iq2_s
 - iq2_m
-- iq1_s
-- iq1_m
+- iq1_s (note: for fun only, this quant is likely useless)
 ```
 
 Note: the model files do not have some DeepSeek v2 specific parameters, will look into adding them
````
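For context, importance-matrix ("iMatrix") quants such as iq2_xxs are typically produced in two steps with llama.cpp's tooling. A minimal sketch, assuming a llama.cpp build; the model and calibration file names here are hypothetical, not taken from this repo:

```shell
# Sketch only -- model paths and the calibration file are illustrative assumptions.
# Recent llama.cpp builds name these tools llama-imatrix and llama-quantize;
# older builds shipped them as ./imatrix and ./quantize.

# 1. Compute an importance matrix by running the fp16 model over calibration text.
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize with that matrix (here to IQ2_XXS, one of the types listed above).
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-iq2_xxs.gguf IQ2_XXS
```

The importance matrix weights the quantization error by how much each tensor actually contributes on real inputs, which is what makes the very low-bit IQ types usable at all.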