Update README.md
README.md CHANGED

@@ -3,9 +3,11 @@ inference: true
 ---
 
 ### NOTE:
-The PR [#1405](https://github.com/ggerganov/llama.cpp/pull/1405) brought breaking changes - none of the old models work with the latest build of llama.cpp
-
-
+The PR [#1405](https://github.com/ggerganov/llama.cpp/pull/1405) brought breaking changes - none of the old models work with the latest build of llama.cpp.
+
+Pre-PR #1405 files have been marked as old but remain accessible for those who need them.
+
+Additionally, `q4_3` and `q4_2` have been completely axed in favor of their 5-bit counterparts (`q5_1` and `q5_0`, respectively).
 
 
 ### Links
@@ -42,8 +44,6 @@ Model | F16 | Q4_0 | Q4_1 | Q4_2 | Q4_3 | Q5_0 | Q5_1 | Q8_0
 q5_1 or q5_0 are the latest and most performant implementations. The former is slightly more accurate at the cost of a bit of speed. Most users should use one of the two.
 If you encounter any kind of compatibility issues, you might want to try the older q4_x quantizations.
 
-**NOTE: q4_3 and q4_2 are EOL - avoid using.**
-
 ---
 
 # Vicuna Model Card
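The breaking change in [#1405](https://github.com/ggerganov/llama.cpp/pull/1405) lives in the GGML container format itself, so an incompatible file fails at load time no matter which model it holds. One way to tell the generations apart is to read the file header: a 4-byte magic, followed by a 4-byte version in the versioned formats. Below is a minimal Python sketch assuming the standard GGML magics from the llama.cpp sources; treating ggjt v1 as pre-#1405 and v2 as post-#1405 is an assumption about the version bump that PR introduced.

```python
import struct
import sys

# GGML container magics as little-endian uint32 values (from the llama.cpp
# sources): 'ggml' is unversioned; 'ggmf' and 'ggjt' carry a version field.
MAGICS = {
    0x67676D6C: "ggml (unversioned)",
    0x67676D66: "ggmf",
    0x67676A74: "ggjt",
}

def describe(path: str) -> str:
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
        name = MAGICS.get(magic)
        if name is None:
            return f"not a GGML model file (magic 0x{magic:08x})"
        if magic == 0x67676D6C:  # the unversioned format has no version field
            return name
        (version,) = struct.unpack("<I", f.read(4))
        # Assumption: pre-#1405 builds wrote ggjt v1; post-#1405 builds write v2.
        return f"{name} v{version}"

if __name__ == "__main__":
    print(describe(sys.argv[1]))  # e.g. python ggml_header.py model.bin
```

Under that assumption, a file reporting `ggjt v1` (or an older magic) predates the change and will not load in a current build.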
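As a usage note on the q5_1/q5_0 recommendation above: the choice only affects which file you download and point your runtime at. Here is a minimal sketch using the third-party llama-cpp-python bindings, which wrap llama.cpp; the file name and the Vicuna-style prompt are illustrative, not prescribed by this model card.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical file name - substitute the q5_1 (or q5_0) file you downloaded.
llm = Llama(model_path="./ggml-vicuna-13b-q5_1.bin", n_ctx=2048)

output = llm(
    "### Human: Explain the difference between q5_0 and q5_1.\n### Assistant:",
    max_tokens=128,
    stop=["### Human:"],
)
print(output["choices"][0]["text"])
```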