BramVanroy committed · Commit 71efc4c · Parent(s): fc00a49

Update README.md

README.md CHANGED
@@ -6,18 +6,19 @@ tags:
 - gguf
 ---
 
-This repository contains quantized versions of [BramVanroy/fietje-2b-chat](https://huggingface.co/BramVanroy/fietje-2b-chat)
-
-- `-q8_0` (3.0GB): minimal quality loss, smaller
-- `-q5_k_m` (2.0GB): users have reported considerable quality loss in the chat `q5_k_m` version so you may want to avoid it
-
-```
-ollama run bramvanroy/fietje-2b-chat:f16
-ollama run bramvanroy/fietje-2b-chat:q8_0
-ollama run bramvanroy/fietje-2b-chat:q5_k_m
-```
+This repository contains quantized versions of [BramVanroy/fietje-2b-chat](https://huggingface.co/BramVanroy/fietje-2b-chat).
+
+Available quantization types and their expected performance difference compared to the base `f16` model, as measured by llama.cpp (higher perplexity = worse):
+
+```
+Q3_K_M : 3.07G, +0.2496 ppl @ LLaMA-v1-7B
+Q4_K_M : 3.80G, +0.0532 ppl @ LLaMA-v1-7B
+Q5_K_M : 4.45G, +0.0122 ppl @ LLaMA-v1-7B
+Q6_K   : 5.15G, +0.0008 ppl @ LLaMA-v1-7B
+Q8_0   : 6.70G, +0.0004 ppl @ LLaMA-v1-7B
+F16    : 13.00G @ 7B
+```
+
+Also available on [ollama](https://ollama.com/bramvanroy/fietje-2b-chat).
+
+Quants were made with release [`b2777`](https://github.com/ggerganov/llama.cpp/releases/tag/b2777) of llama.cpp.
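
A minimal sketch of running one of these quants locally with the same llama.cpp release. The repository id and `.gguf` file name below are assumptions based on the naming in the table; substitute the actual file listed in this repository:

```
# Fetch one quant from the Hub (repo id and file name are assumed; check the repo's file list)
huggingface-cli download BramVanroy/fietje-2b-chat-gguf fietje-2b-chat-Q5_K_M.gguf --local-dir .

# Run it with the llama.cpp CLI; at release b2777 the binary was still called `main`
./main -m fietje-2b-chat-Q5_K_M.gguf -p "Schrijf een kort verhaal over een fietje." -n 128
```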
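The ollama tags were spelled out in the previous revision of this README (removed in this commit); those invocations still show how each build is pulled:

```
ollama run bramvanroy/fietje-2b-chat:f16
ollama run bramvanroy/fietje-2b-chat:q8_0
ollama run bramvanroy/fietje-2b-chat:q5_k_m
```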
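As a sketch of how quants like these are produced with the pinned release (Fietje is Phi-2-based, so the HF-to-GGUF converter applies; paths are placeholders and the exact flags at b2777 may differ):

```
# Build llama.cpp at the pinned release
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout b2777
make

# Convert the HF checkpoint to an f16 GGUF (input path is a placeholder)
python convert-hf-to-gguf.py /path/to/fietje-2b-chat --outtype f16 --outfile fietje-2b-chat-f16.gguf

# Quantize the f16 file to one of the types listed above
./quantize fietje-2b-chat-f16.gguf fietje-2b-chat-Q5_K_M.gguf Q5_K_M
```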