elinas committed
Commit 59e55ec
1 Parent(s): 9173230

Update README.md

Files changed (1): README.md (+32 -0)
README.md CHANGED

This has been converted to int4 via the GPTQ method. See the repo below for more info:

https://github.com/qwopqwop200/GPTQ-for-LLaMa

# Usage
1. Run manually through GPTQ.
2. (More setup, but a better UI) Use the [text-generation-webui](https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode). Make sure to follow the [installation steps](https://github.com/oobabooga/text-generation-webui#installation) first, before adding GPTQ support.

**Note that a recent code change in GPTQ broke functionality in general, so please follow [these instructions](https://huggingface.co/elinas/alpaca-30b-lora-int4/discussions/2#641a38d5f1ad1c1173d8f192) to fix the issue!**

Since this model is instruction-tuned, use the following format for inference for best results:
```
### Instruction:
your-prompt
### Response:
```
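
For example, a small helper along these lines (purely illustrative; the `build_prompt` name is ours, not part of this repo) produces that layout:

```python
# Minimal sketch of the Alpaca-style prompt format shown above.
def build_prompt(instruction: str) -> str:
    return f"### Instruction:\n{instruction}\n### Response:\n"

print(build_prompt("Summarize what int4 GPTQ quantization does."))
```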

If you want deterministic results, turn off sampling. In the webui, you can do this by unchecking `do_sample`.
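
Outside the webui, deterministic output corresponds to greedy decoding. A minimal sketch, assuming `model` and `tokenizer` are already loaded with 4-bit GPTQ support as described under Usage (the loading code depends on your GPTQ setup and is omitted):

```python
# Sketch: deterministic (greedy) decoding with transformers' generate().
# With do_sample=False, repeated runs produce identical output.
prompt = "### Instruction:\nName the capital of France.\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, do_sample=False, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```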

For cai-chat mode, you won't want to use instruction prompting; instead, create a character and set sampler settings. Here is an example of settings that work well for me:
```
do_sample=True
temperature=0.95
top_p=1
typical_p=1
repetition_penalty=1.1
top_k=40
num_beams=1
penalty_alpha=0
min_length=0
length_penalty=1
no_repeat_ngram_size=0
early_stopping=False
```
You can then save this as a `.txt` file in the `presets` folder.
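
If you call `generate()` directly rather than going through a webui preset, the same knobs map onto standard `transformers` generation arguments. A sketch under that assumption, reusing `model`, `tokenizer`, and `inputs` from the sketch above:

```python
# Sketch: the preset above expressed as generate() keyword arguments.
# All names below are standard transformers GenerationConfig fields.
output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.95,
    top_p=1.0,
    typical_p=1.0,
    repetition_penalty=1.1,
    top_k=40,
    num_beams=1,
    penalty_alpha=0.0,
    min_length=0,
    length_penalty=1.0,
    no_repeat_ngram_size=0,
    early_stopping=False,
    max_new_tokens=200,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```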

--
license: other
---