eachadea/ggml-vicuna-13b-1.1

Document Question Answering

eachadea committed on Apr 27, 2023

Commit 1da5ba4

1 Parent(s): 2db543d

Update README.md

Files changed (1)

README.md +34 -11

@@ -1,20 +1,42 @@
 ---
 license: apache-2.0
-inference: false
 ---
-**NOTE: This GGML conversion is primarily for use with llama.cpp.**
 - PR #896 was used for q4_0. Everything else is latest as of upload time.
-- A warning for q4_2 and q4_3: These are WIP. Do not expect any kind of backwards compatibility until they are finalized.
-- 7B can be found here: https://huggingface.co/eachadea/ggml-vicuna-7b-1.1
-- **Choosing the right model:**
-  - `ggml-vicuna-13b-1.1-q4_0` - Fast, lacks in accuracy.
-  - `ggml-vicuna-13b-1.1-q4_1` - More accurate, lacks in speed.
-  - `ggml-vicuna-13b-1.1-q4_2` - Pretty much a better `q4_0`. Similarly fast, but more accurate.
-  - `ggml-vicuna-13b-1.1-q4_3` - Pretty much a better `q4_1`. More accurate, still pretty slow.
-  - `ggml-vicuna-13b-1.0-uncensored` - Available in `q4_2` and `q4_3`, is an uncensored/unfiltered variant of the model. It is based on the previous release and still uses the `### Human:` syntax. Avoid unless you need it.
 ---
@@ -50,6 +72,7 @@ The primary intended users of the model are researchers and hobbyists in natural
 ## Training dataset
 70K conversations collected from ShareGPT.com.
 ## Evaluation dataset
 A preliminary evaluation of the model quality is conducted by creating a set of 80 diverse questions and utilizing GPT-4 to judge the model outputs. See https://vicuna.lmsys.org/ for more details.

 ---
 license: apache-2.0
+inference: true
 ---
+### Links
+- [7B version of this model](https://huggingface.co/eachadea/ggml-vicuna-7b-1.1)
+- [Set up with gpt4all-chat (one-click setup, available in the in-app download menu)](https://gpt4all.io/index.html)
+- [Set up with llama.cpp](https://github.com/ggerganov/llama.cpp)
+- [Set up with oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md)
+### Info
+- Main files are based on v1.1 release
+  - See changelog below
+  - Use prompt template: ```HUMAN: <prompt> ASSISTANT: <response>```
+- Uncensored files are based on v0 release
+  - Use prompt template: ```### User: <prompt> ### Assistant: <response>```
 - PR #896 was used for q4_0. Everything else is latest as of upload time.
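
As a rough illustration of the two prompt templates listed under Info, the sketch below fills in the `<prompt>` slot and leaves the `<response>` part for the model to generate. The template strings are copied from the README; the helper name `build_prompt`, the `uncensored` flag, and the single-space separators are assumptions made for this example, not part of llama.cpp or any other tool.

```python
# Templates copied from the README's Info section. The response part is
# left empty so the model continues after the assistant marker.
V1_1_TEMPLATE = "HUMAN: {prompt} ASSISTANT:"
UNCENSORED_TEMPLATE = "### User: {prompt} ### Assistant:"


def build_prompt(user_text: str, uncensored: bool = False) -> str:
    """Hypothetical helper: wrap user_text in the template for the chosen files.

    uncensored=False -> main v1.1 files
    uncensored=True  -> uncensored files based on the v0 release
    """
    template = UNCENSORED_TEMPLATE if uncensored else V1_1_TEMPLATE
    return template.format(prompt=user_text.strip())


if __name__ == "__main__":
    print(build_prompt("What is GGML?"))
    print(build_prompt("What is GGML?", uncensored=True))
```

The resulting string is what would be passed as the prompt to whichever front end is in use; the exact whitespace or newlines between turns may need adjusting per tool.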
+### Quantization
+Several quantization methods are supported. They differ in the resulting model disk size and inference speed.
+Model | F16 | Q4_0 | Q4_1 | Q4_2 | Q4_3 | Q5_0 | Q5_1 | Q8_0
+-- | -- | -- | -- | -- | -- | -- | -- | --
+7B (ppl) | 5.9565 | 6.2103 | 6.1286 | 6.1698 | 6.0617 | 6.0139 | 5.9934 | 5.9571
+7B (size) | 13.0G | 4.0G | 4.8G | 4.0G | 4.8G | 4.4G | 4.8G | 7.1G
+7B (ms/tok @ 4th) | 128 | 56 | 61 | 84 | 91 | 91 | 95 | 75
+7B (ms/tok @ 8th) | 128 | 47 | 55 | 48 | 53 | 53 | 59 | 75
+7B (bpw) | 16.0 | 5.0 | 6.0 | 5.0 | 6.0 | 5.5 | 6.0 | 9.0
+-- | -- | -- | -- | -- | -- | -- | -- | --
+13B (ppl) | 5.2455 | 5.3748 | 5.3471 | 5.3433 | 5.3234 | 5.2768 | 5.2582 | 5.2458
+13B (size) | 25.0G | 7.6G | 9.1G | 7.6G | 9.1G | 8.4G | 9.1G | 14G
+13B (ms/tok @ 4th) | 239 | 104 | 113 | 160 | 175 | 176 | 185 | 141
+13B (ms/tok @ 8th) | 240 | 85 | 99 | 97 | 114 | 108 | 117 | 147
+13B (bpw) | 16.0 | 5.0 | 6.0 | 5.0 | 6.0 | 5.5 | 6.0 | 9.0
+`q5_1` and `q5_0` are the newest and best-performing formats. `q5_1` is slightly more accurate at the cost of some speed. Most users should use one of the two.
+If you run into compatibility issues, try the older `q4_x` files.
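
To make the trade-off in the table concrete, here is a small sketch that uses only the 13B rows above (perplexity, file size, and ms/token at 8 threads). It computes how much each format raises perplexity over F16 and picks the most accurate format that fits a given disk budget. The function names and the "lowest perplexity under a size cap" rule are illustrative assumptions, not a recommendation from the model author.

```python
# 13B figures copied from the quantization table:
# name -> (perplexity, size in G, ms/token at 8 threads)
F16_PPL = 5.2455
QUANTS_13B = {
    "Q4_0": (5.3748, 7.6, 85),
    "Q4_1": (5.3471, 9.1, 99),
    "Q4_2": (5.3433, 7.6, 97),
    "Q4_3": (5.3234, 9.1, 114),
    "Q5_0": (5.2768, 8.4, 108),
    "Q5_1": (5.2582, 9.1, 117),
    "Q8_0": (5.2458, 14.0, 147),
}


def ppl_increase_pct(ppl: float) -> float:
    """Perplexity increase relative to F16, in percent (lower is better)."""
    return (ppl - F16_PPL) / F16_PPL * 100.0


def best_under(max_size_g: float):
    """Return the lowest-perplexity format whose file fits in max_size_g, or None."""
    fitting = {name: v for name, v in QUANTS_13B.items() if v[1] <= max_size_g}
    if not fitting:
        return None
    return min(fitting, key=lambda name: fitting[name][0])


if __name__ == "__main__":
    for name, (ppl, size_g, ms_tok) in QUANTS_13B.items():
        print(f"{name}: +{ppl_increase_pct(ppl):.2f}% ppl, {size_g}G, {ms_tok} ms/tok")
    print("Best under 9.1G:", best_under(9.1))
```

On these numbers, `q5_1` is the most accurate format at or below 9.1G, which matches the advice above that most users should choose `q5_1` or `q5_0`.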
 ---
 ## Training dataset
 70K conversations collected from ShareGPT.com.
+(48k for the uncensored variant. 22k worth of garbage removed – see https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered)
 ## Evaluation dataset
 A preliminary evaluation of the model quality is conducted by creating a set of 80 diverse questions and utilizing GPT-4 to judge the model outputs. See https://vicuna.lmsys.org/ for more details.