---
license: apache-2.0
---

This repository contains alternative Mistral-7B-Instruct-v0.2 (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) quantized models in GGUF format for use with `llama.cpp`. The models are fully compatible with the official `llama.cpp` release and can be used out of the box.
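For example, loading one of these files via the `llama-cpp-python` bindings looks like the sketch below. This is an assumption on my part, not part of this repository: the bindings simply wrap `llama.cpp`, and the official `llama.cpp` command-line tools work just as well. The model file name is one of the files from the table further down.

```python
from llama_cpp import Llama

# Load one of the GGUF files from this repository; n_ctx=512 matches the
# context length used for the perplexity runs reported below.
llm = Llama(model_path="mistral-instruct-7b-q4k-small.gguf", n_ctx=512)

# Mistral-instruct expects the [INST] ... [/INST] prompt format.
out = llm("[INST] Explain GGUF in one sentence. [/INST]", max_tokens=64)
print(out["choices"][0]["text"])
```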

I'm careful to say "alternative" rather than "better" or "improved", as I have not put any effort into evaluating performance differences in actual usage. Perplexity is lower compared to the "official" `llama.cpp` quantization, but perplexity is not necessarily a good measure of real-world performance. Nevertheless, perplexity does measure quantization error, so below is a table comparing the perplexities of these quantized models to the current `llama.cpp` quantization approach on WikiText for a context length of 512 tokens. The "Quantization Error" columns in the table are defined as `(PPL(quantized model) - PPL(fp16))/PPL(fp16)`.

| Quantization | Model file | PPL (llama.cpp) | Quantization Error (llama.cpp) | PPL (new quants) | Quantization Error (new quants) |
|--:|--:|--:|--:|--:|--:|
| Q3_K_S | mistral-instruct-7b-q3k-small.gguf | 6.9959 | 4.27% | 6.8920 | 2.72% |
| Q3_K_M | mistral-instruct-7b-q3k-medium.gguf | 6.8892 | 2.68% | 6.8089 | 1.48% |
| Q4_K_S | mistral-instruct-7b-q4k-small.gguf | 6.7649 | 0.82% | 6.7351 | 0.38% |
| Q5_K_S | mistral-instruct-7b-q5k-small.gguf | 6.7197 | 0.15% | 6.7186 | 0.13% |
| Q4_0 | mistral-instruct-7b-q40.gguf | 6.7728 | 0.94% | 6.7191 | 0.14% |