Update README.md
README.md CHANGED
@@ -56,10 +56,10 @@ bin/falcon -m /path/to/Falcon-40b-Instruct.ggmlv3.q4_0.bin -t 10 -n 200 -p "writ
## Provided files
| Name | Quant method | Bits | Size | Max RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
-| Falcon-40b-Instruct.ggmlv3.q4_0.bin | q4_0 | 4 | 23.54 GB | 26.04 GB |
-| Falcon-40b-Instruct.ggmlv3.q4_1.bin | q4_1 | 4 | 26.15 GB | 28.65 GB |
-| Falcon-40b-Instruct.ggmlv3.q5_0.bin | q5_0 | 5 | 28.77 GB | 31.27 GB |
-| Falcon-40b-Instruct.ggmlv3.q5_1.bin | q5_1 | 5 | 31.38 GB | 33.88 GB |
+| Falcon-40b-Instruct.ggmlv3.q4_0.bin | q4_0 | 4 | 23.54 GB | 26.04 GB | 4-bit. |
+| Falcon-40b-Instruct.ggmlv3.q4_1.bin | q4_1 | 4 | 26.15 GB | 28.65 GB | 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. |
+| Falcon-40b-Instruct.ggmlv3.q5_0.bin | q5_0 | 5 | 28.77 GB | 31.27 GB | 5-bit. Higher accuracy, higher resource usage and slower inference. |
+| Falcon-40b-Instruct.ggmlv3.q5_1.bin | q5_1 | 5 | 31.38 GB | 33.88 GB | 5-bit. Even higher accuracy, resource usage and slower inference. |

A q8_0 file will be provided shortly. There is currently an issue preventing it from working. Once this is fixed, it will be uploaded.
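For reference, any file from the table can be dropped into the `bin/falcon` invocation shown in the hunk header above. A minimal sketch, assuming the same build and flags; the model path and the prompt text here are placeholders, not values from this commit:

```
# Sketch only: point -m at whichever quant file you downloaded from the table,
# and replace the prompt with your own text.
bin/falcon -m /path/to/Falcon-40b-Instruct.ggmlv3.q5_0.bin -t 10 -n 200 -p "Your prompt here"
```

Note that each Max RAM figure in the table is the file size plus roughly 2.5 GB, so a reasonable rule of thumb is to pick the largest quant whose RAM figure fits your available memory.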