Update README.md
README.md

---
license: other
inference: false
---

# Quantised GGMLs of alpaca-lora-65B

Quantised 4bit and 2bit GGMLs of [chansung's alpaca-lora-65B](https://huggingface.co/chansung/alpaca-lora-65b)

## Provided files

This repository contains two model files, which can be downloaded as sketched below the list:
* 4bit - 39GB - `alpaca-lora-65B.GGML.q4_0.bin`
* 2bit - 23GB - `alpaca-lora-65B.GGML.q2_0.bin`
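
For example, one way to fetch a file is `wget` against the Hugging Face `resolve` URL. The repository path below is a placeholder, not the real path - substitute the path of the repo this README lives in:
```
# Placeholder repo path; replace USER/alpaca-lora-65B-GGML with the actual repository
wget https://huggingface.co/USER/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.GGML.q4_0.bin
```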

## Creation method and requirements

### 4bit q4_0

This file was created using the new q4_0 quantisation method being trialled in [llama.cpp PR 896](https://github.com/ggerganov/llama.cpp/pull/896).

At the time of writing, this code has not yet been merged into the main [llama.cpp repo](https://github.com/ggerganov/llama.cpp), but it is likely to be merged soon.

This is still a 4bit quantisation, but it uses a new technique to achieve higher quality (lower perplexity) than the previous q4_0 method.
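
For illustration only, producing such a file from an f16 GGML with a build of the PR 896 code would look roughly like this. The input filename is an assumption, and in `llama.cpp` builds of this era the final type argument to `quantize` was a number rather than a name:
```
# Sketch: quantise an f16 GGML to the new q4_0 format (2 = q4_0 for this era's quantize tool)
./quantize alpaca-lora-65B.GGML.f16.bin alpaca-lora-65B.GGML.q4_0.bin 2
```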

You can run inference on this model using any recent version of `llama.cpp` - you do not need to use the code from PR 896 specifically.
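
A minimal sketch of such an invocation; the thread count, token count and prompt are illustrative, not recommendations:
```
# Run the 4bit file with llama.cpp's main binary
./main -m alpaca-lora-65B.GGML.q4_0.bin -t 8 -n 128 -p "Building a website can be done in 10 simple steps:"
```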

### 2bit q2_0

This file was created using an even newer and more experimental 2bit method being trialled in [llama.cpp PR 1004](https://github.com/ggerganov/llama.cpp/pull/1004).

Again, this code is not yet merged into the main `llama.cpp` repo.

And, unlike the 4bit file, to run this file you DO need to compile and run the same `llama.cpp` code that was used to create it.

To check out this code and compile this version, do the following:
```
# Clone the fork hosting the experimental q2/q3 quantisation code
git clone https://github.com/sw/llama.cpp llama-q2q3
cd llama-q2q3
# Switch to the branch behind PR 1004
git checkout q2q3
make
```
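
Once built, run the 2bit file with the `main` binary from that checkout, since a stock `llama.cpp` build will not understand the q2_0 format. A minimal sketch, with the same illustrative parameters as above:
```
# Must be the binary compiled from the q2q3 branch, not a stock llama.cpp build
./main -m alpaca-lora-65B.GGML.q2_0.bin -t 8 -n 128 -p "Building a website can be done in 10 simple steps:"
```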

# Original model card not provided

No model card was provided in [chansung's original repository](https://huggingface.co/chansung/alpaca-lora-65b).

Based on the name, I assume this is the result of fine-tuning using the original GPT 3.5 Alpaca dataset. It is unknown whether the original Stanford data was used, or the [cleaned tloen/alpaca-lora variant](https://github.com/tloen/alpaca-lora).