eachadea committed
Commit e47bbbb
1 parent: 54d3b22

Update README.md

Files changed (1): README.md (+10 -5)

README.md CHANGED
@@ -1,11 +1,19 @@
  ---
- license: apache-2.0
  inference: true
  ---
 
+ ### NOTE:
+ The PR [#1405](https://github.com/ggerganov/llama.cpp/pull/1405) brought breaking changes - none of the old models work with the latest build of llama.cpp.
+
+ Pre-PR #1405 files have been marked as old but remain accessible for those who need them.
+
+ Additionally, `q4_3` and `q4_2` have been completely axed in favor of their 5-bit counterparts (`q5_1` and `q5_0`, respectively).
+
+ New files run inference up to 10% faster without any quality reduction.
+
 
  ### Links
- - [13B version of this model](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1)
+ - [7B version of this model](https://huggingface.co/eachadea/ggml-vicuna-7b-1.1)
  - [Set up with gpt4all-chat (one-click setup, available in the in-app download menu)](https://gpt4all.io/index.html)
  - [Set up with llama.cpp](https://github.com/ggerganov/llama.cpp)
  - [Set up with oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md)
@@ -38,9 +46,6 @@ Model | F16 | Q4_0 | Q4_1 | Q4_2 | Q4_3 | Q5_0 | Q5_1 | Q8_0
  q5_1 or q5_0 are the latest and most performant implementations. The former is slightly more accurate at the cost of a bit of performance. Most users should use one of the two.
  If you encounter any kind of compatibility issues, you might want to try the older q4_x.
 
- **NOTE: q4_3 is EOL - avoid using.**
-
-
  ---
 
  # Vicuna Model Card
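Since the old `q4_2`/`q4_3` files no longer load after PR #1405, affected users typically re-quantize from the original f16 weights. A minimal sketch of that workflow with llama.cpp's `quantize` and `main` tools, assuming a post-#1405 build; the model filenames here are illustrative, not the actual files in this repo:

```shell
# Re-create a 5-bit file from the f16 weights (filenames are examples).
# q5_1 is slightly more accurate than q5_0 at a small speed cost.
./quantize ggml-vic7b-f16.bin ggml-vic7b-q5_1.bin q5_1

# Quick smoke test of the freshly quantized file.
./main -m ggml-vic7b-q5_1.bin -p "Hello," -n 16
```

If `quantize` rejects the type name on an older build, the numeric ftype code can be passed instead; check `./quantize --help` for the codes your build supports.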