Commit f2c2e2d (parent: bb35084), committed by eachadea

Update README.md

Files changed (1): README.md (+4, -4)
README.md CHANGED
@@ -10,10 +10,10 @@ inference: false
  - Based on version 1.1
  - Used PR "More accurate Q4_0 and Q4_1 quantizations #896" (should be closer in quality to unquantized)
  - For q4_2, "Q4_2 ARM #1046" was used. Will update regularly if new changes are made.
- - Choosing between q4_0 and q4_1, the logic of higher number = better does not apply. If you are confused, stick with q4_0.
- - If you have performance to spare, it might be worth getting the q4_1. It's ~20% slower and requires 1GB more RAM, but has a ~5% lower perplexity, which is good for generation quality. You're not gonna notice it though.
- - If you have *lots* of performance to spare, [TheBloke's conversion](https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g-GGML) is maybe ~7% better in perplexity but ~50% slower and requires 2GB more RAM.
-
+ - **Choosing between q4_0, q4_1, and q4_2:**
+ - 4_0 is the fastest. The quality is the poorest.
+ - 4_1 is a lot slower. The quality is noticeably better.
+ - 4_2 is almost as fast as 4_0 and about as good as 4_1 **on Apple Silicon**. On Intel/AMD it's hardly better or faster than 4_1.

  - 7B version of this can be found here: https://huggingface.co/eachadea/ggml-vicuna-7b-1.1
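For readers comparing the q4_0, q4_1, and q4_2 files discussed in the diff above, here is a minimal sketch of loading one of them with the llama-cpp-python bindings. The model file name, prompt, and generation parameters below are illustrative assumptions, not values from this repo or commit:

```python
# Minimal sketch: running a GGML quantization of this model via
# llama-cpp-python (pip install llama-cpp-python). The file name is an
# assumption -- substitute whichever quantization you chose
# (q4_0 for speed, q4_1/q4_2 for quality, per the README notes above).
from llama_cpp import Llama

llm = Llama(model_path="./ggml-vicuna-13b-1.1-q4_0.bin")

# Vicuna 1.1 uses a plain USER/ASSISTANT prompt style; max_tokens and
# stop are illustrative defaults, not repo recommendations.
output = llm(
    "USER: Explain quantization in one sentence. ASSISTANT:",
    max_tokens=64,
    stop=["USER:"],
)
print(output["choices"][0]["text"])
```

Swapping in the q4_1 or q4_2 file is just a matter of changing `model_path`; the speed/RAM/perplexity trade-offs described in the diff are the only functional difference.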