commit c19fef6 (parent: 565a506)
Author: TheBloke

Update for Transformers GPTQ support

README.md CHANGED
@@ -16,17 +16,20 @@ tags:
 ---
 
 <!-- header start -->
-<div style="width: 100%;">
-<img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
+<!-- 200823 -->
+<div style="width: auto; margin-left: auto; margin-right: auto">
+<img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
 </div>
 <div style="display: flex; justify-content: space-between; width: 100%;">
 <div style="display: flex; flex-direction: column; align-items: flex-start;">
-<p><a href="https://discord.gg/theblokeai">Chat & support: my new Discord server</a></p>
+<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://discord.gg/theblokeai">Chat & support: TheBloke's Discord server</a></p>
 </div>
 <div style="display: flex; flex-direction: column; align-items: flex-end;">
-<p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
+<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
 </div>
 </div>
+<div style="text-align:center; margin-top: 0em; margin-bottom: 0em"><p style="margin-top: 0.25em; margin-bottom: 0em;">TheBloke's LLM work is generously supported by a grant from <a href="https://a16z.com">andreessen horowitz (a16z)</a></p></div>
+<hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
 <!-- header end -->
 
 # Nous Hermes Llama 2 7B - GPTQ
@@ -63,13 +66,13 @@ Each separate quant is in a different branch. See below for instructions on fet
 
 | Branch | Bits | Group Size | Act Order (desc_act) | File Size | ExLlama Compatible? | Made With | Description |
 | ------ | ---- | ---------- | -------------------- | --------- | ------------------- | --------- | ----------- |
-| [main](https://huggingface.co/TheBloke/Nous-Hermes-Llama-2-7B-GPTQ/tree/main) | 4 | 128 | False | 3.90 GB | True | AutoGPTQ | Most compatible option. Good inference speed in AutoGPTQ and GPTQ-for-LLaMa. Lower inference quality than other options. |
-| [gptq-4bit-32g-actorder_True](https://huggingface.co/TheBloke/Nous-Hermes-Llama-2-7B-GPTQ/tree/gptq-4bit-32g-actorder_True) | 4 | 32 | True | 4.28 GB | True | AutoGPTQ | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. Poor AutoGPTQ CUDA speed. |
-| [gptq-4bit-64g-actorder_True](https://huggingface.co/TheBloke/Nous-Hermes-Llama-2-7B-GPTQ/tree/gptq-4bit-64g-actorder_True) | 4 | 64 | True | 4.02 GB | True | AutoGPTQ | 4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
-| [gptq-4bit-128g-actorder_True](https://huggingface.co/TheBloke/Nous-Hermes-Llama-2-7B-GPTQ/tree/gptq-4bit-128g-actorder_True) | 4 | 128 | True | 3.90 GB | True | AutoGPTQ | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
-| [gptq-8bit--1g-actorder_True](https://huggingface.co/TheBloke/Nous-Hermes-Llama-2-7B-GPTQ/tree/gptq-8bit--1g-actorder_True) | 8 | None | True | 7.01 GB | False | AutoGPTQ | 8-bit, with Act Order. No group size, to lower VRAM requirements and to improve AutoGPTQ speed. |
-| [gptq-8bit-128g-actorder_False](https://huggingface.co/TheBloke/Nous-Hermes-Llama-2-7B-GPTQ/tree/gptq-8bit-128g-actorder_False) | 8 | 128 | False | 7.16 GB | False | AutoGPTQ | 8-bit, with group size 128g for higher inference quality and without Act Order to improve AutoGPTQ speed. |
-| [gptq-8bit-128g-actorder_True](https://huggingface.co/TheBloke/Nous-Hermes-Llama-2-7B-GPTQ/tree/gptq-8bit-128g-actorder_True) | 8 | 128 | True | 7.16 GB | False | AutoGPTQ | 8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. Poor AutoGPTQ CUDA speed. |
+| [main](https://huggingface.co/TheBloke/Nous-Hermes-Llama-2-7B-GPTQ/tree/main) | 4 | 128 | False | 3.90 GB | True | AutoGPTQ | Most compatible option. Good inference speed in AutoGPTQ and GPTQ-for-LLaMa. Lower inference quality than other options. |
+| [gptq-4bit-32g-actorder_True](https://huggingface.co/TheBloke/Nous-Hermes-Llama-2-7B-GPTQ/tree/gptq-4bit-32g-actorder_True) | 4 | 32 | True | 4.28 GB | True | AutoGPTQ | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. Poor AutoGPTQ CUDA speed. |
+| [gptq-4bit-64g-actorder_True](https://huggingface.co/TheBloke/Nous-Hermes-Llama-2-7B-GPTQ/tree/gptq-4bit-64g-actorder_True) | 4 | 64 | True | 4.02 GB | True | AutoGPTQ | 4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
+| [gptq-4bit-128g-actorder_True](https://huggingface.co/TheBloke/Nous-Hermes-Llama-2-7B-GPTQ/tree/gptq-4bit-128g-actorder_True) | 4 | 128 | True | 3.90 GB | True | AutoGPTQ | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
+| [gptq-8bit--1g-actorder_True](https://huggingface.co/TheBloke/Nous-Hermes-Llama-2-7B-GPTQ/tree/gptq-8bit--1g-actorder_True) | 8 | None | True | 7.01 GB | False | AutoGPTQ | 8-bit, with Act Order. No group size, to lower VRAM requirements and to improve AutoGPTQ speed. |
+| [gptq-8bit-128g-actorder_False](https://huggingface.co/TheBloke/Nous-Hermes-Llama-2-7B-GPTQ/tree/gptq-8bit-128g-actorder_False) | 8 | 128 | False | 7.16 GB | False | AutoGPTQ | 8-bit, with group size 128g for higher inference quality and without Act Order to improve AutoGPTQ speed. |
+| [gptq-8bit-128g-actorder_True](https://huggingface.co/TheBloke/Nous-Hermes-Llama-2-7B-GPTQ/tree/gptq-8bit-128g-actorder_True) | 8 | 128 | True | 7.16 GB | False | AutoGPTQ | 8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. Poor AutoGPTQ CUDA speed. |
 | [gptq-8bit-64g-actorder_True](https://huggingface.co/TheBloke/Nous-Hermes-Llama-2-7B-GPTQ/tree/gptq-8bit-64g-actorder_True) | 8 | 64 | True | 7.31 GB | False | AutoGPTQ | 8-bit, with group size 64g and Act Order for maximum inference quality. Poor AutoGPTQ CUDA speed. |
 
 ## How to download from branches
@@ -113,7 +116,7 @@ from transformers import AutoTokenizer, pipeline, logging
 from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
 
 model_name_or_path = "TheBloke/Nous-Hermes-Llama-2-7B-GPTQ"
-model_basename = "gptq_model-4bit-128g"
+model_basename = "model"
 
 use_triton = False
 
@@ -179,6 +182,7 @@ The files provided will work with AutoGPTQ (CUDA and Triton modes), GPTQ-for-LLa
 ExLlama works with Llama models in 4-bit. Please see the Provided Files table above for per-file compatibility.
 
 <!-- footer start -->
+<!-- 200823 -->
 ## Discord
 
 For further support, and discussions on these models and AI in general, join us at:
@@ -198,13 +202,15 @@ Donaters will get priority support on any and all AI/LLM/model questions and req
 * Patreon: https://patreon.com/TheBlokeAI
 * Ko-Fi: https://ko-fi.com/TheBlokeAI
 
-**Special thanks to**: Luke from CarbonQuill, Aemon Algiz.
+**Special thanks to**: Aemon Algiz.
 
-**Patreon special mentions**: Slarti, Chadd, John Detwiler, Pieter, zynix, K, Mano Prime, ReadyPlayerEmma, Ai Maven, Leonard Tan, Edmond Seymore, Joseph William Delisle, Luke @flexchar, Fred von Graf, Viktor Bowallius, Rishabh Srivastava, Nikolai Manek, Matthew Berman, Johann-Peter Hartmann, ya boyyy, Greatston Gnanesh, Femi Adebogun, Talal Aujan, Jonathan Leane, terasurfer, David Flickinger, William Sang, Ajan Kanaga, Vadim, Artur Olbinski, Raven Klaugh, Michael Levine, Oscar Rangel, Randy H, Cory Kujawski, RoA, Dave, Alex, Alexandros Triantafyllidis, Fen Risland, Eugene Pentland, vamX, Elle, Nathan LeClaire, Khalefa Al-Ahmad, Rainer Wilmers, subjectnull, Junyu Yang, Daniel P. Andersen, SuperWojo, LangChain4j, Mandus, Kalila, Illia Dulskyi, Trenton Dambrowitz, Asp the Wyvern, Derek Yates, Jeffrey Morgan, Deep Realms, Imad Khwaja, Pyrater, Preetika Verma, biorpg, Gabriel Tamborski, Stephen Murray, Spiking Neurons AB, Iucharbius, Chris Smitley, Willem Michiel, Luke Pendergrass, Sebastain Graf, senxiiz, Will Dee, Space Cruiser, Karl Bernard, Clay Pascal, Lone Striker, transmissions 11, webtim, WelcomeToTheClub, Sam, theTransient, Pierre Kircher, chris gileta, John Villwock, Sean Connelly, Willian Hasse
+**Patreon special mentions**: Sam, theTransient, Jonathan Leane, Steven Wood, webtim, Johann-Peter Hartmann, Geoffrey Montalvo, Gabriel Tamborski, Willem Michiel, John Villwock, Derek Yates, Mesiah Bishop, Eugene Pentland, Pieter, Chadd, Stephen Murray, Daniel P. Andersen, terasurfer, Brandon Frisco, Thomas Belote, Sid, Nathan LeClaire, Magnesian, Alps Aficionado, Stanislav Ovsiannikov, Alex, Joseph William Delisle, Nikolai Manek, Michael Davis, Junyu Yang, K, J, Spencer Kim, Stefan Sabev, Olusegun Samson, transmissions 11, Michael Levine, Cory Kujawski, Rainer Wilmers, zynix, Kalila, Luke @flexchar, Ajan Kanaga, Mandus, vamX, Ai Maven, Mano Prime, Matthew Berman, subjectnull, Vitor Caleffi, Clay Pascal, biorpg, alfie_i, 阿明, Jeffrey Morgan, ya boyyy, Raymond Fosdick, knownsqashed, Olakabola, Leonard Tan, ReadyPlayerEmma, Enrico Ros, Dave, Talal Aujan, Illia Dulskyi, Sean Connelly, senxiiz, Artur Olbinski, Elle, Raven Klaugh, Fen Risland, Deep Realms, Imad Khwaja, Fred von Graf, Will Dee, usrbinkat, SuperWojo, Alexandros Triantafyllidis, Swaroop Kallakuri, Dan Guido, John Detwiler, Pedro Madruga, Iucharbius, Viktor Bowallius, Asp the Wyvern, Edmond Seymore, Trenton Dambrowitz, Space Cruiser, Spiking Neurons AB, Pyrater, LangChain4j, Tony Hughes, Kacper Wikieł, Rishabh Srivastava, David Ziegler, Luke Pendergrass, Andrey, Gabriel Puliatti, Lone Striker, Sebastain Graf, Pierre Kircher, Randy H, NimbleBox.ai, Vadim, danny, Deo Leter
 
 
 Thank you to all my generous patrons and donaters!
 
+And thank you again to a16z for their generous grant.
+
 <!-- footer end -->
 
 # Original model card: NousResearch's Nous Hermes Llama 2 7B
@@ -230,16 +236,16 @@ The model was trained almost entirely on synthetic GPT-4 outputs. Curating high
 This includes data from diverse sources such as GPTeacher, the general, roleplay v1&2, code instruct datasets, Nous Instruct & PDACTL (unpublished), and several others, detailed further below
 
 ## Collaborators
-The model fine-tuning and the datasets were a collaboration of efforts and resources between Teknium, Karan4D, Emozilla, Huemin Art, and Redmond AI.
-
+The model fine-tuning and the datasets were a collaboration of efforts and resources between Teknium, Karan4D, Emozilla, Huemin Art, and Redmond AI.
+
 Special mention goes to @winglian for assisting in some of the training issues.
 
-Huge shoutout and acknowledgement is deserved for all the dataset creators who generously share their datasets openly.
+Huge shoutout and acknowledgement is deserved for all the dataset creators who generously share their datasets openly.
 
 Among the contributors of datasets:
 - GPTeacher was made available by Teknium
 - Wizard LM by nlpxucan
-- Nous Research Instruct Dataset was provided by Karan4D and HueminArt.
+- Nous Research Instruct Dataset was provided by Karan4D and HueminArt.
 - GPT4-LLM and Unnatural Instructions were provided by Microsoft
 - Airoboros dataset by jondurbin
 - Camel-AI's domain expert datasets are from Camel-AI
@@ -259,7 +265,7 @@ The model follows the Alpaca prompt format:
 
 ```
 
-or
+or
 
 ```
 ### Instruction:
@@ -271,20 +277,20 @@ or
 ### Response:
 <leave a newline blank for model to respond>
 
-```
+```
 
 ## Benchmark Results
 Coming soon
 
 ## Resources for Applied Use Cases:
-For an example of a back and forth chatbot using huggingface transformers and discord, check out: https://github.com/teknium1/alpaca-discord
-For an example of a roleplaying discord chatbot, check out this: https://github.com/teknium1/alpaca-roleplay-discordbot
+For an example of a back and forth chatbot using huggingface transformers and discord, check out: https://github.com/teknium1/alpaca-discord
+For an example of a roleplaying discord chatbot, check out this: https://github.com/teknium1/alpaca-roleplay-discordbot
 
 LM Studio is a good choice for a chat interface that supports GGML versions (to come)
 
 ## Future Plans
-We plan to continue to iterate on both more high quality data, and new data filtering techniques to eliminate lower quality data going forward.
+We plan to continue to iterate on both more high quality data, and new data filtering techniques to eliminate lower quality data going forward.
 
 ## Model Usage
 The model is available for download on Hugging Face. It is suitable for a wide range of language tasks, from generating creative text to understanding and following complex instructions.
-
+
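The `model_basename` change above is the user-visible edge of this commit: the quantised weights file was renamed, so the README's AutoGPTQ example must point at the new base name. A minimal sketch of the updated loading path, assuming the `auto-gptq` and `transformers` packages and a CUDA device; this mirrors the snippet fragments visible in the hunk, not the README's full listing, and the prompt follows the Alpaca format from the model card reproduced above:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/Nous-Hermes-Llama-2-7B-GPTQ"
model_basename = "model"  # was "gptq_model-4bit-128g" before this commit

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# from_quantized locates f"{model_basename}.safetensors" inside the repo
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    model_basename=model_basename,
    use_safetensors=True,
    device="cuda:0",
    use_triton=use_triton,
)

# Alpaca-style prompt, per the original model card
prompt = "### Instruction:\nWrite a short poem about llamas.\n\n### Response:\n"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda:0")
output = model.generate(input_ids=input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```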
config.json CHANGED
@@ -1,26 +1,37 @@
 {
-  "_name_or_path": "output/hermes-llama2-4k/checkpoint-2259",
-  "architectures": [
-    "LlamaForCausalLM"
-  ],
-  "bos_token_id": 1,
-  "eos_token_id": 2,
-  "hidden_act": "silu",
-  "hidden_size": 4096,
-  "initializer_range": 0.02,
-  "intermediate_size": 11008,
-  "max_position_embeddings": 4096,
-  "model_type": "llama",
-  "num_attention_heads": 32,
-  "num_hidden_layers": 32,
-  "num_key_value_heads": 32,
-  "pad_token_id": 0,
-  "pretraining_tp": 1,
-  "rms_norm_eps": 1e-05,
-  "rope_scaling": null,
-  "tie_word_embeddings": false,
-  "torch_dtype": "bfloat16",
-  "transformers_version": "4.32.0.dev0",
-  "use_cache": true,
-  "vocab_size": 32000
+  "_name_or_path": "output/hermes-llama2-4k/checkpoint-2259",
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 11008,
+  "max_position_embeddings": 4096,
+  "model_type": "llama",
+  "num_attention_heads": 32,
+  "num_hidden_layers": 32,
+  "num_key_value_heads": 32,
+  "pad_token_id": 0,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": null,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.32.0.dev0",
+  "use_cache": true,
+  "vocab_size": 32000,
+  "quantization_config": {
+    "bits": 8,
+    "group_size": -1,
+    "damp_percent": 0.1,
+    "desc_act": true,
+    "sym": true,
+    "true_sequential": true,
+    "model_name_or_path": null,
+    "model_file_base_name": "model",
+    "quant_method": "gptq"
+  }
 }
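Embedding `quantization_config` in config.json is what the commit title means by Transformers GPTQ support: with `quant_method: "gptq"` recorded in the model config, Transformers (4.32 and later, with the `optimum` and `auto-gptq` packages installed) can load the quantised checkpoint directly, without AutoGPTQ-specific loading code. A hedged sketch of that path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Nous-Hermes-Llama-2-7B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# No GPTQ-specific arguments needed: from_pretrained reads the
# quantization_config block added to config.json above and dispatches
# to the GPTQ integration for dequantised inference.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```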
gptq_model-8bit--1g.safetensors → model.safetensors RENAMED
File without changes
quantize_config.json CHANGED
@@ -6,5 +6,5 @@
   "sym": true,
   "true_sequential": true,
   "model_name_or_path": null,
-  "model_file_base_name": null
+  "model_file_base_name": "model"
 }
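The quantize_config.json change pairs with the file rename above: `model_file_base_name` now records the base name of the weights file (`model`, resolving to `model.safetensors`), so loaders can find it from metadata alone. If that reading of the AutoGPTQ convention is right, the explicit `model_basename` argument used in older snippets becomes optional:

```python
from auto_gptq import AutoGPTQForCausalLM

# With model_file_base_name = "model" recorded in quantize_config.json,
# the loader can resolve model.safetensors on its own; passing
# model_basename explicitly (as in the README example) should no longer
# be required. This is an assumption about the loader, not repo code.
model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/Nous-Hermes-Llama-2-7B-GPTQ",
    use_safetensors=True,
    device="cuda:0",
)
```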