Update README.md
README.md

---
license: other
inference: false
---

# Quantised GGMLs of alpaca-lora-65B

Quantised 4bit and 2bit GGMLs of [chansung's alpaca-lora-65B](https://huggingface.co/chansung/alpaca-lora-65b)

## Provided files

This repository contains two model files, which can be downloaded as sketched below the list:
* 4bit - 39GB - `alpaca-lora-65B.GGML.q4_0.bin`
* 2bit - 23GB - `alpaca-lora-65B.GGML.q2_0.bin`
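
For example, one way to fetch a file is `wget` against the Hugging Face `resolve` URL. The repository path below is a placeholder, not the real path - substitute the path of the repo this README lives in:
```
# Placeholder repo path; replace USER/alpaca-lora-65B-GGML with the actual repository
wget https://huggingface.co/USER/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.GGML.q4_0.bin
```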

## Creation method and requirements

### 4bit q4_0

This file was created using the new q4_0 quantisation method being trialled in [llama.cpp PR 896](https://github.com/ggerganov/llama.cpp/pull/896).

At the time of writing, this code has not yet been merged into the main [llama.cpp repo](https://github.com/ggerganov/llama.cpp), but it is likely to be merged soon.

This is still a 4bit quantisation, but it uses a new technique to achieve higher quality (lower perplexity) than the previous q4_0 method.
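
For illustration only, producing such a file from an f16 GGML with a build of the PR 896 code would look roughly like this. The input filename is an assumption, and in `llama.cpp` builds of this era the final type argument to `quantize` was a number rather than a name:
```
# Sketch: quantise an f16 GGML to the new q4_0 format (2 = q4_0 for this era's quantize tool)
./quantize alpaca-lora-65B.GGML.f16.bin alpaca-lora-65B.GGML.q4_0.bin 2
```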

You can run inference on this model using any recent version of `llama.cpp` - you do not need to use the code from PR 896 specifically.
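
A minimal sketch of such an invocation; the thread count, token count and prompt are illustrative, not recommendations:
```
# Run the 4bit file with llama.cpp's main binary
./main -m alpaca-lora-65B.GGML.q4_0.bin -t 8 -n 128 -p "Building a website can be done in 10 simple steps:"
```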

### 2bit q2_0

This file was created using an even newer and more experimental 2bit method being trialled in [llama.cpp PR 1004](https://github.com/ggerganov/llama.cpp/pull/1004).

Again, this code is not yet merged into the main `llama.cpp` repo.

And, unlike the 4bit file, to run this file you DO need to compile and run the same `llama.cpp` code that was used to create it.

To check out this code and compile this version, do the following:
```
# Clone the fork hosting the experimental q2/q3 quantisation code
git clone https://github.com/sw/llama.cpp llama-q2q3
cd llama-q2q3
# Switch to the branch behind PR 1004
git checkout q2q3
make
```
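
Once built, run the 2bit file with the `main` binary from that checkout, since a stock `llama.cpp` build will not understand the q2_0 format. A minimal sketch, with the same illustrative parameters as above:
```
# Must be the binary compiled from the q2q3 branch, not a stock llama.cpp build
./main -m alpaca-lora-65B.GGML.q2_0.bin -t 8 -n 128 -p "Building a website can be done in 10 simple steps:"
```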

# Original model card not provided

No model card was provided in [chansung's original repository](https://huggingface.co/chansung/alpaca-lora-65b).

Based on the name, I assume this is the result of fine-tuning using the original GPT 3.5 Alpaca dataset. It is unknown whether the original Stanford data was used, or the [cleaned tloen/alpaca-lora variant](https://github.com/tloen/alpaca-lora).