TheBloke committed on
Commit b779884
1 parent: 7427c59

Update for Transformers GPTQ support

README.md CHANGED
@@ -4,17 +4,20 @@ license: other
  ---
 
  <!-- header start -->
- <div style="width: 100%;">
- <img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
  </div>
  <div style="display: flex; justify-content: space-between; width: 100%;">
  <div style="display: flex; flex-direction: column; align-items: flex-start;">
- <p><a href="https://discord.gg/Jq4vkcDakD">Chat & support: my new Discord server</a></p>
  </div>
  <div style="display: flex; flex-direction: column; align-items: flex-end;">
- <p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
  </div>
  </div>
  <!-- header end -->
 
  # Kaist AI's Selfee 13B GPTQ
@@ -72,11 +75,12 @@ It was created with group_size 128 to increase inference accuracy, but without -
  * Parameters: Groupsize = 128. Act Order / desc_act = False.
 
  <!-- footer start -->
  ## Discord
 
  For further support, and discussions on these models and AI in general, join us at:
 
- [TheBloke AI's Discord server](https://discord.gg/Jq4vkcDakD)
 
  ## Thanks, and how to contribute.
 
@@ -91,12 +95,15 @@ Donors will get priority support on any and all AI/LLM/model questions and req
  * Patreon: https://patreon.com/TheBlokeAI
  * Ko-Fi: https://ko-fi.com/TheBlokeAI
 
- **Special thanks to**: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.
 
- **Patreon special mentions**: Derek Yates, Sean Connelly, Luke, Nathan LeClaire, Trenton Dambrowitz, Mano Prime, David Flickinger, vamX, Nikolai Manek, senxiiz, Khalefa Al-Ahmad, Illia Dulskyi, trip7s trip, Jonathan Leane, Talal Aujan, Artur Olbinski, Cory Kujawski, Joseph William Delisle, Pyrater, Oscar Rangel, Lone Striker, Luke Pendergrass, Eugene Pentland, Johann-Peter Hartmann.
 
  Thank you to all my generous patrons and donors!
 
  <!-- footer end -->
 
  # Original model card: Kaist AI's Selfee 13B
@@ -140,7 +147,7 @@ For other datasets, we do not need a special data collection method.
  To train our model with high-quality instruction and answer pairs, we used data augmentation via OpenAI API calls. The process involved three steps. <br>
  First, we collected various instructions from multiple fields and fed them to ChatGPT to generate answers. <br>
  Second, we gathered feedback on each generated answer by querying ChatGPT again and asking it to determine whether the initial answer required any revision. <br>
- Third, if a revision was necessary, we passed the instruction, the initial answer, and the feedback to ChatGPT to generate a revised answer and its feedback pair.
  We repeated the process until we received feedback that required no further revision or reached the maximum number of iterations. However, due to the token limit of the ChatGPT API, we had to truncate some instances that needed more than 4096 tokens during augmentation.<br>
  You can find the details, including the commands, [here](data_augmentation/README.md).<br>
  *We provide the whole dataset after collection and augmentation via Hugging Face ([code](data_collection/download_train.py)), so you can either use the code or follow our [data merging step](outputs/README.md) to replicate the training dataset. Feel free to use either!
@@ -202,17 +209,17 @@ python inference/apply_delta.py --path_raw {path_to_llama_7b} --path_tuned /ckpt
 
  Because SelFee is trained to generate iterative feedback and revisions until the response is satisfactory, it produces the whole feedback-and-revision chain in a single generation pass. The model decides on its own when to stop revising, based on its feedback: if the feedback chain ends with a sequence like `Revision is not needed.`, the model terminates generation. <br>
 
- For autonomous inference mode, run:
 
  ```
- python inference/inference.py --model-path "ckpt/selfee-7b" --model-id "selfee" --question-file "evaluation/template/question.jsonl" --answer-file "evaluation/answer/selfee_7b_autonomous.jsonl"
  ```
 
 
  <b>Revision Enforce Inference Mode</b><br>
- We observed that increasing the minimum number of required revisions leads to a corresponding increase in performance. To enforce revisions, we automatically replace sequences such as `Revision is not needed.` with `Revision is needed.` during self-feedback generation. Because SelFee is trained to generate `Revision {index}:` after the sequence `Revision is needed.`, the model then keeps revising the answer.
 
- For revision-enforce inference mode, use the `--max-num-revision` argument.
 
  ```
  python inference/inference.py --model-path "ckpt/selfee-7b" --model-id "selfee" --question-file "evaluation/template/question.jsonl" --answer-file "evaluation/answer/selfee_7b_enforce_3_revision.jsonl" --max-num-revision 3
@@ -231,7 +238,7 @@ First, you need to get your API key to get access to the GPT-4 API.
  export OPENAI_API_KEYS={personal_key}
  ```
 
- To compare the performance of a generation result (for example, located at `evaluation/answer/file_A.jsonl`) with another generation result (located at `evaluation/answer/file_B.jsonl`),
 
 
  ```
@@ -244,7 +251,7 @@ To mitigate the positional bias of GPT-4 model, we apply a bidirectional evaluat
  python evaluation/gpt4_automatic_evaluation.py -q evaluation/template/question.jsonl -a evaluation/answer/file_B.jsonl evaluation/answer/file_A.jsonl -p evaluation/template/prompt.jsonl -r evaluation/template/reviewer.jsonl -o evaluation/review/B_vs_A.jsonl
  ```
 
- ## Limitations
  Like other LLaMA-finetuned models, SelFee also makes mistakes, especially on math, reasoning, factuality, and coding tasks. Although SelFee outperforms ChatGPT in the Vicuna evaluation setting, that setting has limitations in comprehensiveness (it is limited to 80 queries), consistency, and reliability. Therefore, further research on better evaluation settings is needed. Please take these claims with a grain of salt.
 
  ## Online demo
 
  ---
 
  <!-- header start -->
+ <!-- 200823 -->
+ <div style="width: auto; margin-left: auto; margin-right: auto">
+ <img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
  </div>
  <div style="display: flex; justify-content: space-between; width: 100%;">
  <div style="display: flex; flex-direction: column; align-items: flex-start;">
+ <p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://discord.gg/theblokeai">Chat & support: TheBloke's Discord server</a></p>
  </div>
  <div style="display: flex; flex-direction: column; align-items: flex-end;">
+ <p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
  </div>
  </div>
+ <div style="text-align:center; margin-top: 0em; margin-bottom: 0em"><p style="margin-top: 0.25em; margin-bottom: 0em;">TheBloke's LLM work is generously supported by a grant from <a href="https://a16z.com">andreessen horowitz (a16z)</a></p></div>
+ <hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
  <!-- header end -->
 
  # Kaist AI's Selfee 13B GPTQ
 
  * Parameters: Groupsize = 128. Act Order / desc_act = False.
 
  <!-- footer start -->
+ <!-- 200823 -->
  ## Discord
 
  For further support, and discussions on these models and AI in general, join us at:
 
+ [TheBloke AI's Discord server](https://discord.gg/theblokeai)
 
  ## Thanks, and how to contribute.
 
 
  * Patreon: https://patreon.com/TheBlokeAI
  * Ko-Fi: https://ko-fi.com/TheBlokeAI
 
+ **Special thanks to**: Aemon Algiz.
+
+ **Patreon special mentions**: Sam, theTransient, Jonathan Leane, Steven Wood, webtim, Johann-Peter Hartmann, Geoffrey Montalvo, Gabriel Tamborski, Willem Michiel, John Villwock, Derek Yates, Mesiah Bishop, Eugene Pentland, Pieter, Chadd, Stephen Murray, Daniel P. Andersen, terasurfer, Brandon Frisco, Thomas Belote, Sid, Nathan LeClaire, Magnesian, Alps Aficionado, Stanislav Ovsiannikov, Alex, Joseph William Delisle, Nikolai Manek, Michael Davis, Junyu Yang, K, J, Spencer Kim, Stefan Sabev, Olusegun Samson, transmissions 11, Michael Levine, Cory Kujawski, Rainer Wilmers, zynix, Kalila, Luke @flexchar, Ajan Kanaga, Mandus, vamX, Ai Maven, Mano Prime, Matthew Berman, subjectnull, Vitor Caleffi, Clay Pascal, biorpg, alfie_i, 阿明, Jeffrey Morgan, ya boyyy, Raymond Fosdick, knownsqashed, Olakabola, Leonard Tan, ReadyPlayerEmma, Enrico Ros, Dave, Talal Aujan, Illia Dulskyi, Sean Connelly, senxiiz, Artur Olbinski, Elle, Raven Klaugh, Fen Risland, Deep Realms, Imad Khwaja, Fred von Graf, Will Dee, usrbinkat, SuperWojo, Alexandros Triantafyllidis, Swaroop Kallakuri, Dan Guido, John Detwiler, Pedro Madruga, Iucharbius, Viktor Bowallius, Asp the Wyvern, Edmond Seymore, Trenton Dambrowitz, Space Cruiser, Spiking Neurons AB, Pyrater, LangChain4j, Tony Hughes, Kacper Wikieł, Rishabh Srivastava, David Ziegler, Luke Pendergrass, Andrey, Gabriel Puliatti, Lone Striker, Sebastain Graf, Pierre Kircher, Randy H, NimbleBox.ai, Vadim, danny, Deo Leter
 
 
  Thank you to all my generous patrons and donors!
 
+ And thank you again to a16z for their generous grant.
+
  <!-- footer end -->
 
  # Original model card: Kaist AI's Selfee 13B
 
  To train our model with high-quality instruction and answer pairs, we used data augmentation via OpenAI API calls. The process involved three steps. <br>
  First, we collected various instructions from multiple fields and fed them to ChatGPT to generate answers. <br>
  Second, we gathered feedback on each generated answer by querying ChatGPT again and asking it to determine whether the initial answer required any revision. <br>
+ Third, if a revision was necessary, we passed the instruction, the initial answer, and the feedback to ChatGPT to generate a revised answer and its feedback pair.
  We repeated the process until we received feedback that required no further revision or reached the maximum number of iterations. However, due to the token limit of the ChatGPT API, we had to truncate some instances that needed more than 4096 tokens during augmentation.<br>
  You can find the details, including the commands, [here](data_augmentation/README.md).<br>
  *We provide the whole dataset after collection and augmentation via Hugging Face ([code](data_collection/download_train.py)), so you can either use the code or follow our [data merging step](outputs/README.md) to replicate the training dataset. Feel free to use either!
 
 
  Because SelFee is trained to generate iterative feedback and revisions until the response is satisfactory, it produces the whole feedback-and-revision chain in a single generation pass. The model decides on its own when to stop revising, based on its feedback: if the feedback chain ends with a sequence like `Revision is not needed.`, the model terminates generation. <br>
 
+ For autonomous inference mode, run:
 
  ```
+ python inference/inference.py --model-path "ckpt/selfee-7b" --model-id "selfee" --question-file "evaluation/template/question.jsonl" --answer-file "evaluation/answer/selfee_7b_autonomous.jsonl"
  ```
 
 
  <b>Revision Enforce Inference Mode</b><br>
+ We observed that increasing the minimum number of required revisions leads to a corresponding increase in performance. To enforce revisions, we automatically replace sequences such as `Revision is not needed.` with `Revision is needed.` during self-feedback generation. Because SelFee is trained to generate `Revision {index}:` after the sequence `Revision is needed.`, the model then keeps revising the answer.
 
+ For revision-enforce inference mode, use the `--max-num-revision` argument.
 
  ```
  python inference/inference.py --model-path "ckpt/selfee-7b" --model-id "selfee" --question-file "evaluation/template/question.jsonl" --answer-file "evaluation/answer/selfee_7b_enforce_3_revision.jsonl" --max-num-revision 3
 
  export OPENAI_API_KEYS={personal_key}
  ```
 
+ To compare the performance of a generation result (for example, located at `evaluation/answer/file_A.jsonl`) with another generation result (located at `evaluation/answer/file_B.jsonl`),
 
 
  ```
 
  python evaluation/gpt4_automatic_evaluation.py -q evaluation/template/question.jsonl -a evaluation/answer/file_B.jsonl evaluation/answer/file_A.jsonl -p evaluation/template/prompt.jsonl -r evaluation/template/reviewer.jsonl -o evaluation/review/B_vs_A.jsonl
  ```
 
+ ## Limitations
  Like other LLaMA-finetuned models, SelFee also makes mistakes, especially on math, reasoning, factuality, and coding tasks. Although SelFee outperforms ChatGPT in the Vicuna evaluation setting, that setting has limitations in comprehensiveness (it is limited to 80 queries), consistency, and reliability. Therefore, further research on better evaluation settings is needed. Please take these claims with a grain of salt.
 
  ## Online demo
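The two inference modes described in the SelFee model card come down to checks on one feedback phrase. The sketch below is a hypothetical illustration, not the logic in `inference/inference.py`: only the phrases `Revision is not needed.` and `Revision is needed.` come from the card. The helper names are invented, and we assume the forced-revision count plays the role of `--max-num-revision`.

```python
# Hypothetical sketch of the two SelFee inference modes described in the model card.
# NOT the code in inference/inference.py; only the two phrases are taken from the card.
STOP_PHRASE = "Revision is not needed."
CONTINUE_PHRASE = "Revision is needed."


def should_stop(generated: str) -> bool:
    """Autonomous mode: generation ends once the feedback chain ends with the stop phrase."""
    return generated.rstrip().endswith(STOP_PHRASE)


def enforce_revision(feedback: str, revisions_done: int, forced_revisions: int) -> str:
    """Revision-enforce mode: until `forced_revisions` revisions exist, swap the stop
    phrase for the continue phrase so the model goes on to write `Revision {index}:`.

    `forced_revisions` is an assumed stand-in for the `--max-num-revision` argument.
    """
    if revisions_done < forced_revisions:
        return feedback.replace(STOP_PHRASE, CONTINUE_PHRASE)
    return feedback
```

In a real decoding loop, `enforce_revision` would be applied to each feedback segment as it is generated, and `should_stop` would be checked only after enforcement has run its course. Where exactly the real script hooks in is not stated in the card.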
config.json CHANGED
@@ -1,24 +1,34 @@
  {
- "_name_or_path": "/workspace/process/selfee-13b/delta",
- "architectures": [
- "LlamaForCausalLM"
- ],
- "bos_token_id": 1,
- "eos_token_id": 2,
- "hidden_act": "silu",
- "hidden_size": 5120,
- "initializer_range": 0.02,
- "intermediate_size": 13824,
- "max_position_embeddings": 2048,
- "max_sequence_length": 2048,
- "model_type": "llama",
- "num_attention_heads": 40,
- "num_hidden_layers": 40,
- "pad_token_id": 0,
- "rms_norm_eps": 1e-06,
- "tie_word_embeddings": false,
- "torch_dtype": "float32",
- "transformers_version": "4.30.0.dev0",
- "use_cache": true,
- "vocab_size": 32001
  }
 
  {
+ "_name_or_path": "/workspace/process/selfee-13b/delta",
+ "architectures": [
+ "LlamaForCausalLM"
+ ],
+ "bos_token_id": 1,
+ "eos_token_id": 2,
+ "hidden_act": "silu",
+ "hidden_size": 5120,
+ "initializer_range": 0.02,
+ "intermediate_size": 13824,
+ "max_position_embeddings": 2048,
+ "max_sequence_length": 2048,
+ "model_type": "llama",
+ "num_attention_heads": 40,
+ "num_hidden_layers": 40,
+ "pad_token_id": 0,
+ "rms_norm_eps": 1e-06,
+ "tie_word_embeddings": false,
+ "torch_dtype": "float32",
+ "transformers_version": "4.30.0.dev0",
+ "use_cache": true,
+ "vocab_size": 32001,
+ "quantization_config": {
+ "bits": 4,
+ "group_size": 128,
+ "damp_percent": 0.01,
+ "desc_act": false,
+ "sym": true,
+ "true_sequential": true,
+ "model_file_base_name": "model",
+ "quant_method": "gptq"
+ }
  }
selfee-13b-GPTQ-4bit-128g.no-act.order.safetensors → model.safetensors RENAMED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3b483ce1e0cbf2140e764897d2a8fd5290be2c8fa72e63051060fcf0538e5ba0
- size 8111029176
 
  version https://git-lfs.github.com/spec/v1
+ oid sha256:c0dcd0a34cde469eb7971cf76305642da4b4f52f600010fab0b14d3789214d28
+ size 8111029232
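The renamed weights file is stored through Git LFS, so the repository itself only tracks a small pointer: a spec version, a SHA-256 object ID and a byte size. Here the new file is 56 bytes larger than the old one. A downloaded `model.safetensors` can be checked against that pointer with the standard library alone. This is a minimal sketch that assumes the pointer layout shown above, one `key value` pair per line. The helper names and the `NEW_POINTER` constant are hypothetical.

```python
import hashlib
from pathlib import Path

# Pointer text copied from the "+" side of the rename above.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:c0dcd0a34cde469eb7971cf76305642da4b4f52f600010fab0b14d3789214d28
size 8111029232
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer (hypothetical helper): one `key value` pair per line."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo, "oid": digest, "size": int(fields["size"])}


def verify_blob(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` matches the pointer's size and SHA-256 digest."""
    if pointer["algo"] != "sha256" or path.stat().st_size != pointer["size"]:
        return False
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Stream in chunks so a multi-GB weights file is never fully loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["oid"]
```

For example, `verify_blob(Path("model.safetensors"), parse_lfs_pointer(NEW_POINTER))` would be run after a download, where the local path is an assumption about where the file was saved. The size check comes first because it is cheap and rejects truncated downloads before hashing roughly 8 GB.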
quantize_config.json CHANGED
@@ -1,8 +1,9 @@
  {
- "bits": 4,
- "group_size": 128,
- "damp_percent": 0.01,
- "desc_act": false,
- "sym": true,
- "true_sequential": true
  }
 
  {
+ "bits": 4,
+ "group_size": 128,
+ "damp_percent": 0.01,
+ "desc_act": false,
+ "sym": true,
+ "true_sequential": true,
+ "model_file_base_name": "model"
  }
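After this commit the GPTQ parameters live in two places. One is `quantize_config.json`, which as far as we know is what AutoGPTQ reads. The other is the new `quantization_config` block in `config.json`, whose `"quant_method": "gptq"` is how Transformers' GPTQ integration recognises a quantized checkpoint. Both also gained `"model_file_base_name": "model"`, matching the rename to `model.safetensors`. Because the two copies can drift apart, a small consistency check may be useful. This is a minimal sketch: the JSON is copied from the diffs above and trimmed to the relevant keys, the helper names are hypothetical, and it assumes the weights use the `.safetensors` extension.

```python
import json

# Trimmed copies of the "+" sides of config.json and quantize_config.json above.
CONFIG_JSON = """{"model_type": "llama", "quantization_config": {"bits": 4, "group_size": 128,
 "damp_percent": 0.01, "desc_act": false, "sym": true, "true_sequential": true,
 "model_file_base_name": "model", "quant_method": "gptq"}}"""
QUANTIZE_CONFIG_JSON = """{"bits": 4, "group_size": 128, "damp_percent": 0.01, "desc_act": false,
 "sym": true, "true_sequential": true, "model_file_base_name": "model"}"""


def check_gptq_configs(config: dict, quantize_config: dict) -> list:
    """Hypothetical helper: list mismatches between the two files (empty list = consistent)."""
    qc = config.get("quantization_config", {})
    problems = []
    if qc.get("quant_method") != "gptq":
        problems.append("config.json: quant_method is not 'gptq'")
    # Every key in quantize_config.json should have the same value in config.json.
    for key, value in quantize_config.items():
        if qc.get(key) != value:
            problems.append(f"{key}: config.json has {qc.get(key)!r}, quantize_config.json has {value!r}")
    return problems


def expected_weights_file(quantize_config: dict) -> str:
    """Weights file name implied by model_file_base_name (assumes .safetensors)."""
    return quantize_config["model_file_base_name"] + ".safetensors"
```

In a real checkout you would `json.load` the two files rather than use the inline strings. For the values in this commit, `check_gptq_configs` returns an empty list and `expected_weights_file` gives `model.safetensors`.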