eachadea committed
Commit
376d070
1 Parent(s): b6f8c1d

Update README.md

Files changed (1)
  1. README.md +14 -13
README.md CHANGED
@@ -3,19 +3,20 @@ license: apache-2.0
 inference: true
 ---
 
-**NOTE: This GGML conversion is primarily for use with llama.cpp.**
-- 7B parameters
-- 4-bit quantized
-- Based on version 1.1
-- Used best available quantization for each format
-- **Choosing between q4_0, q4_1, and q4_2:**
-  - 4_0 is the fastest. The quality is the poorest.
-  - 4_1 is slower. The quality is noticeably better.
-  - 4_2 generally offers the best speed-to-quality ratio. The drawback is that the format is WIP.
-
-- 13B version of this can be found here: https://huggingface.co/eachadea/ggml-vicuna-13b-1.1
-<br>
-<br>
+**NOTE: This GGML conversion is primarily for use with llama.cpp.**
+- PR #896 was used for q4_0. Everything else is latest as of upload time.
+- A warning for q4_2 and q4_3: these formats are WIP. Do not expect any kind of backwards compatibility until they are finalized.
+- The 13B version can be found here: https://huggingface.co/eachadea/ggml-vicuna-13b-1.1
+- **Choosing the right model:**
+  - `ggml-vicuna-7b-1.1-q4_0` - Fast, but lacks accuracy.
+  - `ggml-vicuna-7b-1.1-q4_1` - More accurate, but slower.
+
+  - `ggml-vicuna-7b-1.1-q4_2` - Essentially a better `q4_0`: similarly fast, but more accurate.
+  - `ggml-vicuna-7b-1.1-q4_3` - Essentially a better `q4_1`: more accurate, but still slower.
+
+  - `ggml-vicuna-7b-1.0-uncensored` - Available in `q4_2` and `q4_3`; an uncensored/unfiltered variant of the model. It is based on the previous release and still uses the `### Human:` syntax. Avoid it unless you specifically need it.
+
+---
 
 # Vicuna Model Card
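
The file-naming scheme in the list above lends itself to scripting when picking a download. A minimal sketch, under stated assumptions: the `pick_quant` helper is hypothetical (not part of this repo), the `.bin` file names mirror the list above, and the commented-out invocation assumes a `main` binary built from llama.cpp with its standard `-m`, `-p`, and `-n` flags.

```shell
#!/bin/sh
# Hypothetical helper: map a priority (speed vs. accuracy) to a quantization
# suffix, following the guidance in the model list above.
pick_quant() {
  case "$1" in
    speed)    echo "q4_2" ;;  # essentially a better q4_0: similarly fast, more accurate
    accuracy) echo "q4_3" ;;  # essentially a better q4_1: more accurate, still slower
    *)        echo "q4_0" ;;  # conservative default if the WIP formats are a concern
  esac
}

MODEL="ggml-vicuna-7b-1.1-$(pick_quant speed).bin"
echo "$MODEL"  # → ggml-vicuna-7b-1.1-q4_2.bin

# Typical llama.cpp invocation (assumes ./main built from llama.cpp):
# ./main -m "$MODEL" -p "Tell me about Vicuna." -n 128
```

Note that q4_2 and q4_3 are WIP formats (see the warning above), so files in those formats may need re-downloading after breaking changes in llama.cpp.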