TheBloke committed
Commit 748baf3
1 Parent(s): 8477d26

Update README.md

Files changed (1)
  1. README.md +51 -7
README.md CHANGED
@@ -4,21 +4,63 @@ license: other
# Koala: A Dialogue Model for Academic Research
This repo contains the weights of the Koala 7B model produced at Berkeley. It is the result of combining the diffs from https://huggingface.co/young-geng/koala with the original Llama 7B model.

- This version has then been quantized to 4bit using https://github.com/qwopqwop200/GPTQ-for-LLaMa

- These other versions are also available:
- * [Unquantized model in HF format](https://huggingface.co/TheBloke/koala-7B-HF)
- * [Unquantized model in GGML format for llama.cpp](https://huggingface.co/TheBloke/koala-7b-ggml-unquantized)

- ### WARNING: At the present time the GPTQ files uploaded here seem to be producing garbage output. It is not recommended to use them.

- I'm working on diagnosing this issue. If you manage to get the files working, please let me know!

- Quantization command was:
```
python3 llama.py /content/koala-7B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save /content/koala-7B-4bit-128g.pt
```
The Koala delta weights were originally merged using the following commands, producing [koala-7B-HF](https://huggingface.co/TheBloke/koala-7B-HF):
```
git clone https://github.com/young-geng/EasyLM
@@ -49,6 +91,8 @@ PYTHON_PATH="${PWD}:$PYTHONPATH" python \
--tokenizer_path=/content/LLaMA-7B/tokenizer.model
```

Check out the following links to learn more about the Berkeley Koala model.
* [Blog post](https://bair.berkeley.edu/blog/2023/04/03/koala/)
* [Online demo](https://koala.lmsys.org/)
 
# Koala: A Dialogue Model for Academic Research
This repo contains the weights of the Koala 7B model produced at Berkeley. It is the result of combining the diffs from https://huggingface.co/young-geng/koala with the original Llama 7B model.

+ This version has been quantized to 4-bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).

+ ## Other Koala repos

+ I have also made these other Koala repos available:
+ * [GPTQ quantized 4-bit 13B model in HF format](https://huggingface.co/TheBloke/koala-13B-GPTQ-4bit-128g)
+ * [Unquantized 13B model in HF format](https://huggingface.co/TheBloke/koala-13B-HF)
+ * [Unquantized 7B model in HF format](https://huggingface.co/TheBloke/koala-7B-HF)
+ * [Unquantized 7B model in GGML format for llama.cpp](https://huggingface.co/TheBloke/koala-7b-ggml-unquantized)

+ ## Quantization method

+ This GPTQ model was quantized using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) with the following command:
```
python3 llama.py /content/koala-7B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save /content/koala-7B-4bit-128g.pt
```

+ I created this model using the latest Triton branch of GPTQ-for-LLaMa, but I believe it can also be run with the CUDA branch.
+
+ ## Provided files
+
+ I have provided both a `pt` and a `safetensors` file. Either should work.
+
+ If both are present in the model directory for text-generation-webui, I am not sure which one it picks, so if you need one or the other specifically I'd recommend downloading just the one you need.
+
+ The `olderFormat` file was created with the aim of converting it to GGML for use with llama.cpp. At present this file does not work.
+
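If you only need one of the two files, a direct download is simplest. This is a rough sketch only: the repo id `TheBloke/koala-7B-GPTQ-4bit-128g` and the filenames shown are assumptions, so check the repo's file list for the actual names:
```
# Sketch: fetch a single quantized file instead of cloning the whole repo.
# Filenames are placeholders - use the actual names from the repo's file list.
wget https://huggingface.co/TheBloke/koala-7B-GPTQ-4bit-128g/resolve/main/koala-7B-4bit-128g.safetensors
# or the pt version:
wget https://huggingface.co/TheBloke/koala-7B-GPTQ-4bit-128g/resolve/main/koala-7B-4bit-128g.pt
```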
+ ## How to run with text-generation-webui
+
+ The model files provided will not load as-is with [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).
+
+ They require the latest version of the GPTQ code.
+
+ Here are the commands I used to clone GPTQ-for-LLaMa, clone text-generation-webui, and install GPTQ into the UI:
+ ```
+ git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
+ git clone https://github.com/oobabooga/text-generation-webui
+ mkdir -p text-generation-webui/repositories
+ ln -s $PWD/GPTQ-for-LLaMa text-generation-webui/repositories/GPTQ-for-LLaMa
+ ```
+
+ Then install this model into `text-generation-webui/models` and run text-generation-webui as follows:
+ ```
+ cd text-generation-webui
+ python server.py --model koala-7B-GPTQ-4bit-128g --wbits 4 --groupsize 128 --model_type Llama
+ ```
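To place the model files under `text-generation-webui/models`, a git-lfs clone should work. This is a minimal sketch, assuming this repo's id is `TheBloke/koala-7B-GPTQ-4bit-128g`:
```
# Sketch: fetch this repo into text-generation-webui/models with git-lfs.
# The repo id is an assumption - substitute the correct one if it differs.
cd text-generation-webui/models
git lfs install
git clone https://huggingface.co/TheBloke/koala-7B-GPTQ-4bit-128g
cd ../..
```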
+
+ The above commands assume you have installed all dependencies for GPTQ-for-LLaMa and text-generation-webui. Please see their respective repositories for further information.
+
+ If you cannot use the Triton branch for any reason, I believe it should also work to use the CUDA branch instead:
+ ```
+ git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
+ ```
+ Then link that into `text-generation-webui/repositories` as described above.
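A minimal sketch of that swap, assuming the directory layout from the commands above (the `GPTQ-for-LLaMa-cuda` directory name is just illustrative):
```
# Sketch: clone the CUDA branch alongside the Triton one and re-point the symlink.
git clone -b cuda https://github.com/qwopqwop200/GPTQ-for-LLaMa GPTQ-for-LLaMa-cuda
rm -f text-generation-webui/repositories/GPTQ-for-LLaMa
ln -s $PWD/GPTQ-for-LLaMa-cuda text-generation-webui/repositories/GPTQ-for-LLaMa
```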
+
+ ## How the Koala delta weights were merged
+
The Koala delta weights were originally merged using the following commands, producing [koala-7B-HF](https://huggingface.co/TheBloke/koala-7B-HF):
```
git clone https://github.com/young-geng/EasyLM

--tokenizer_path=/content/LLaMA-7B/tokenizer.model
```

+ ## Further info
+
Check out the following links to learn more about the Berkeley Koala model.
* [Blog post](https://bair.berkeley.edu/blog/2023/04/03/koala/)
* [Online demo](https://koala.lmsys.org/)