eachadea committed
Commit
4c0bf74
1 Parent(s): 376d070

Update README.md

Files changed (1):
  1. README.md +33 -11
README.md CHANGED
@@ -3,18 +3,40 @@ license: apache-2.0
 inference: true
 ---
 
-**NOTE: This GGML conversion is primarily for use with llama.cpp.**
-- A warning for q4_2 and q4_3: These are WIP. Do not expect any kind of backwards compatibility until they are finalized.
-- 13B can be found here: https://huggingface.co/eachadea/ggml-vicuna-13b-1.1
-- **Choosing the right model:**
-  - `ggml-vicuna-7b-1.1-q4_0` - Fast, lacks in accuracy.
-  - `ggml-vicuna-7b-1.1-q4_1` - More accurate, lacks in speed.
-
-  - `ggml-vicuna-7b-1.1-q4_2` - Pretty much a better `q4_0`. Similarly fast, but more accurate.
-  - `ggml-vicuna-7b-1.1-q4_3` - Pretty much a better `q4_1`. More accurate, still slower.
-
-  - `ggml-vicuna-7b-1.0-uncensored` - Available in `q4_2` and `q4_3`, is an uncensored/unfiltered variant of the model. It is based on the previous release and still uses the `### Human:` syntax. Avoid unless you need it.
 
+
+### Links
+- [13B version of this model](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1)
+- [Set up with gpt4all-chat (one-click setup, available in the in-app download menu)](https://gpt4all.io/index.html)
+- [Set up with llama.cpp](https://github.com/ggerganov/llama.cpp)
+- [Set up with oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md)
+
+### Info
+- Main files are based on the v1.1 release
+  - See changelog below
+  - Use prompt template: `HUMAN: <prompt> ASSISTANT: <response>`
+- Uncensored files are based on the v0 release
+  - Use prompt template: `### User: <prompt> ### Assistant: <response>`
 - PR #896 was used for q4_0. Everything else is latest as of upload time.
+
+### Quantization
+Several quantization methods are supported. They differ in the resulting model disk size and inference speed.
+
+Model | F16 | Q4_0 | Q4_1 | Q4_2 | Q4_3 | Q5_0 | Q5_1 | Q8_0
+-- | -- | -- | -- | -- | -- | -- | -- | --
+7B (ppl) | 5.9565 | 6.2103 | 6.1286 | 6.1698 | 6.0617 | 6.0139 | 5.9934 | 5.9571
+7B (size) | 13.0G | 4.0G | 4.8G | 4.0G | 4.8G | 4.4G | 4.8G | 7.1G
+7B (ms/tok @ 4 threads) | 128 | 56 | 61 | 84 | 91 | 91 | 95 | 75
+7B (ms/tok @ 8 threads) | 128 | 47 | 55 | 48 | 53 | 53 | 59 | 75
+7B (bpw) | 16.0 | 5.0 | 6.0 | 5.0 | 6.0 | 5.5 | 6.0 | 9.0
+-- | -- | -- | -- | -- | -- | -- | -- | --
+13B (ppl) | 5.2455 | 5.3748 | 5.3471 | 5.3433 | 5.3234 | 5.2768 | 5.2582 | 5.2458
+13B (size) | 25.0G | 7.6G | 9.1G | 7.6G | 9.1G | 8.4G | 9.1G | 14G
+13B (ms/tok @ 4 threads) | 239 | 104 | 113 | 160 | 175 | 176 | 185 | 141
+13B (ms/tok @ 8 threads) | 240 | 85 | 99 | 97 | 114 | 108 | 117 | 147
+13B (bpw) | 16.0 | 5.0 | 6.0 | 5.0 | 6.0 | 5.5 | 6.0 | 9.0
+
+q5_1 and q5_0 are the latest and most performant implementations. The former is slightly more accurate, at the cost of a little speed. Most users should use one of the two.
+If you encounter any kind of compatibility issues, you might want to try the older q4_x formats.
 
 ---
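
As a rough sanity check on the quantization table above, the size row follows from the bpw (bits per weight) row: file size ≈ parameter count × bpw / 8 bytes. Below is a minimal sketch of that arithmetic; the parameter counts (~6.7B for the 7B model, ~13.0B for the 13B model) are assumptions, not figures from the card.

```python
# Rough sanity check: file size ~= parameter count * bits-per-weight / 8.
# The parameter counts are assumptions (~6.7B / ~13.0B), not from the card.
PARAMS = {"7B": 6.7e9, "13B": 13.0e9}
BPW = {"F16": 16.0, "Q4_0": 5.0, "Q4_1": 6.0, "Q5_0": 5.5, "Q5_1": 6.0, "Q8_0": 9.0}

def approx_size_gib(n_params: float, bpw: float) -> float:
    """Approximate file size in GiB, ignoring per-tensor metadata overhead."""
    return n_params * bpw / 8 / 2**30

for model, n_params in PARAMS.items():
    for fmt, bits in BPW.items():
        print(f"{model} {fmt}: ~{approx_size_gib(n_params, bits):.1f} GiB")
```

The estimates land within a few percent of the size row (e.g. ~3.9 GiB vs. the listed 4.0G for 7B Q4_0). The bpw values are higher than the nominal 4 or 5 bits because each block of weights also stores its quantization scale.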
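
To make the prompt templates in the Info section concrete, here is a minimal sketch of querying one of the v1.1 files from Python. It assumes the llama-cpp-python bindings and a local q5_1 file; neither the bindings nor the exact filename come from this card, so treat both as placeholders.

```python
# Minimal sketch, assuming the llama-cpp-python bindings; the card itself only
# links llama.cpp, gpt4all-chat, and text-generation-webui.
from llama_cpp import Llama

# Hypothetical local path to the recommended q5_1 quantization.
llm = Llama(model_path="./ggml-vicuna-7b-1.1-q5_1.bin", n_ctx=2048)

# v1.1 prompt template from the Info section: HUMAN: <prompt> ASSISTANT: <response>
prompt = "HUMAN: Explain in one sentence what GGML quantization does. ASSISTANT:"

out = llm(prompt, max_tokens=128, stop=["HUMAN:"])
print(out["choices"][0]["text"].strip())
```

For the uncensored v0 files, swap in the `### User: <prompt> ### Assistant: <response>` template instead.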