Update README.md
README.md CHANGED
@@ -8,7 +8,9 @@ inference: false
 - 4-bit quantized
 - Based on version 1.1
 - Used PR "More accurate Q4_0 and Q4_1 quantizations #896" (should be closer in quality to unquantized)
 - Choosing between q4_0 and q4_1, the logic of higher number = better does not apply. If you are confused, stick with q4_0.
+- If you have performance to spare, it might be worth getting the q4_1. It's ~20% slower and requires 1GB more RAM, but has ~5% lower perplexity, which is good for generation quality. You're not going to notice it much, though.
+- If you have *lots* of performance to spare, [TheBloke's conversion](https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g-GGML) is maybe ~7% better in perplexity but ~50% slower and requires 2GB more RAM.

 - 7B version of this can be found here: https://huggingface.co/eachadea/ggml-vicuna-7b-1.1
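For context, here is a minimal sketch of how one of these ggml files might be run locally via the llama-cpp-python bindings. The local filename and the prompt format are assumptions for illustration, not taken from this repo; substitute whichever q4_0 or q4_1 file you actually downloaded.

```python
# Minimal sketch: load a ggml Vicuna file with llama-cpp-python.
# The filename below is hypothetical -- point it at your downloaded file.
from llama_cpp import Llama

# q4_0: smallest and fastest; q4_1: ~1GB more RAM and ~20% slower,
# but ~5% lower perplexity (per the tradeoffs described above).
llm = Llama(model_path="./ggml-vicuna-13b-1.1-q4_0.bin")

# Vicuna 1.1 is generally prompted in a USER:/ASSISTANT: style
# (an assumption here, not stated in this README).
out = llm(
    "USER: Briefly explain what 4-bit quantization does. ASSISTANT:",
    max_tokens=128,
)
print(out["choices"][0]["text"])
```

Swapping in the q4_1 file requires no code changes; only the `model_path` differs.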