TheBloke committed on
Commit
77350f1
1 Parent(s): e668273

Initial GPTQ model commit

Browse files
Files changed (1)
  1. README.md +62 -27
README.md CHANGED
@@ -50,20 +50,31 @@ Multiple GPTQ parameter permutations are provided; see Provided Files below for
50
  User: {prompt}<|end_of_turn|>Assistant:
51
  ```
52
 
53
- ## Provided files
54
 
55
  Multiple quantisation parameters are provided, to allow you to choose the best one for your hardware and requirements.
56
 
57
  Each separate quant is in a different branch. See below for instructions on fetching from different branches.
58
 
59
- | Branch | Bits | Group Size | Act Order (desc_act) | GPTQ Dataset | Size | ExLlama Compat? | Made With | Desc |
60
- | ------ | ---- | ---------- | -------------------- | ------------ | ---- | --------------- | --------- | ---- |
61
- | main | 4 | 128 | No | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 7.26 GB | Yes | AutoGPTQ | Most compatible option. Good inference speed in AutoGPTQ and GPTQ-for-LLaMa. Lower inference quality than other options. |
62
- | gptq-4bit-32g-actorder_True | 4 | 32 | Yes | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 8.00 GB | Yes | AutoGPTQ | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. Poor AutoGPTQ CUDA speed. |
63
- | gptq-4bit-64g-actorder_True | 4 | 64 | Yes | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 7.51 GB | Yes | AutoGPTQ | 4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
64
- | gptq-4bit-128g-actorder_True | 4 | 128 | Yes | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 7.26 GB | Yes | AutoGPTQ | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
65
- | gptq-8bit--1g-actorder_True | 8 | None | Yes | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 13.36 GB | No | AutoGPTQ | 8-bit, with Act Order. No group size, to lower VRAM requirements and to improve AutoGPTQ speed. |
66
- | gptq-8bit-128g-actorder_True | 8 | 128 | Yes | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 13.65 GB | No | AutoGPTQ | 8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. Poor AutoGPTQ CUDA speed. |
67
 
68
  ## How to download from branches
69
 
@@ -218,13 +229,13 @@ Thank you to all my generous patrons and donators!
218
  We have used our own [OpenOrca dataset](https://huggingface.co/datasets/Open-Orca/OpenOrca) to fine-tune Llama2-13B using [OpenChat](https://huggingface.co/openchat) packing and conditional behavior cloning.
219
  This dataset is our attempt to reproduce the dataset generated for Microsoft Research's [Orca Paper](https://arxiv.org/abs/2306.02707).
220
 
221
- This second preview release is trained on a curated filtered subset of most of our GPT4 augmented data.
222
 
223
  This release highlights that our dataset and training methods have surpassed performance parity with the Orca paper.
224
- We measured this with BigBench-Hard and AGIEval results with the same methods as used in the Orca paper, finding ~103% of original Orca's performance on average.
225
- As well, this is done with ~1/10th the compute requirement and using <20% of the dataset size from the original Orca paper.
226
 
227
- We have run extensive evaluations internally and expect this model to place number 1 on both the HuggingFaceH4 Open LLM Leaderboard and the GPT4ALL Leaderboard for 13B models.
228
 
229
  "One" of [OpenChat](https://huggingface.co/openchat) has joined our team, and we'd like to provide special thanks for their training of this model!
230
  We have utilized OpenChat conditional behavior cloning and [MultiPack algorithm](https://github.com/imoneoi/multipack_sampler) which achieves 99.85% bin-packing efficiency on our dataset.
@@ -253,46 +264,58 @@ We have evaluated **OpenOrcaxOpenChat-Preview2-13B** on hard reasoning tasks fro
253
 
254
  Our average performance for BigBench-Hard: 0.488
255
 
256
- Average for AGIEval: 0.441
257
 
258
  In the Orca paper, they measured their score relative to Vicuna on these evals.
259
- We've done the same and have found our score averages to >103% of the total improvement that was shown in the Orca paper, using the same evaluation methods as outlined in the paper.
260
 
261
- So we are surpassing Orca performance with <20% of the dataset size and ~1/10th the training budget!
262
 
263
- ## BigBench-Hard Performance
264
-
265
- ![OpenOrca Preview2 BigBench-Hard Performance](https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B/resolve/main/OO_Preview2_BigBenchHard.png "BigBench-Hard Performance")
266
 
267
  ## AGIEval Performance
268
 
269
- ![OpenOrca Preview2 AGIEval Performance](https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B/resolve/main/OO_Preview2_AGIEval.png "AGIEval Performance")
270
 
271
  ## HuggingFaceH4 Open LLM Leaderboard Performance
272
 
273
  We have run our own tests using parameters matching the [HuggingFaceH4 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) evals.
274
- We find
275
 
276
- ![OpenOrca Preview2 HuggingFace Leaderboard Performance](https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B/resolve/main/OO_Preview2_HFLeaderboard.png "GPT4ALL Performance")
277
 
278
  ## GPT4ALL Leaderboard Performance
279
 
280
  We have tested using parameters matching the GPT4ALL Benchmark Suite and report our results and placement vs their official reporting below.
281
- We place #1 for all open models and come within comparison of text-davinci-003, a proprietary model an order of magnitude larger.
282
 
283
- ![OpenOrca Preview2 GPT4ALL Performance](https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B/resolve/main/OO_Preview2_AGIEval.png "GPT4ALL Performance")
284
 
285
 
286
  # Dataset
287
 
288
  We used a curated, filtered selection of most of the GPT-4 augmented data from our OpenOrca dataset, which aims to reproduce the Orca Research Paper dataset.
289
- Further details of our curation practices will be forthcoming with our full model release.
290
 
291
 
292
  # Training
293
 
294
- We trained with 8x A100-80G GPUs for 46 hours, completing 5 epochs of full fine tuning on our dataset.
295
- This contrasts with the 20x A100-80G GPUs for 200 hours used in the Orca paper, for only 3 epochs.
296
  Our compute requirement was <1/10th that of the original Orca.
297
  Commodity cost was ~$600.
298
 
@@ -315,6 +338,18 @@ tokenize("User: Hello<|end_of_turn|>Assistant: Hi<|end_of_turn|>User: How are yo
315
  # Result: [1, 4911, 29901, 15043, 32000, 4007, 22137, 29901, 6324, 32000, 4911, 29901, 1128, 526, 366, 9826, 29973, 32000, 4007, 22137, 29901]
316
  ```
317
 
318
 
319
  # Serving
320
 
 
50
  User: {prompt}<|end_of_turn|>Assistant:
51
  ```
52
 
53
+ ## Provided files and GPTQ parameters
54
 
55
  Multiple quantisation parameters are provided, to allow you to choose the best one for your hardware and requirements.
56
 
57
  Each separate quant is in a different branch. See below for instructions on fetching from different branches.
58
 
59
+ <details>
60
+ <summary>Explanation of GPTQ parameters</summary>
61
+ - Bits: The bit size of the quantised model.
62
+ - GS: GPTQ group size. Higher numbers use less VRAM, but have lower quantisation accuracy. "None" (stored as -1) means no grouping.
63
+ - Act Order: True or False. Also known as `desc_act`. True results in better quantisation accuracy. Some GPTQ clients have issues with models that use Act Order plus Group Size.
64
+ - Damp %: A GPTQ parameter that affects how samples are processed for quantisation. 0.01 is the default, but 0.1 results in slightly better accuracy.
65
+ - GPTQ dataset: The dataset used for quantisation, which can affect quantisation accuracy. It is not the same as the dataset used to train the model.
66
+ - Sequence Length: The length of the dataset sequences used for quantisation. Ideally this is the same as the model sequence length. For some very long sequence models (16+K), a lower sequence length may have to be used. Note that a lower sequence length does not limit the sequence length of the quantised model. It only affects the quantisation accuracy on longer inference sequences.
67
+ - ExLlama Compatibility: Whether this file can be loaded with ExLlama, which currently only supports Llama models in 4-bit.
68
+ </details>
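
As a rough illustration of where these parameters plug in, the sketch below shows an AutoGPTQ quantisation config using values from the table that follows. It is not the exact command used to produce these files; the model path and calibration data are placeholders.

```python
# Sketch only: where Bits / GS / Act Order / Damp % map onto AutoGPTQ's config.
# Requires the auto-gptq package; paths and data below are placeholders.
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(
    bits=4,            # "Bits" column
    group_size=128,    # "GS" column; use -1 for the "None" (no group size) rows
    desc_act=False,    # "Act Order" column (True for the actorder_True branches)
    damp_percent=0.1,  # "Damp %" column
)

model = AutoGPTQForCausalLM.from_pretrained("path/to/unquantised-model", quantize_config)

# `examples` would be tokenised calibration samples (e.g. wikitext) trimmed to the
# sequence length in the table, each a dict with input_ids and attention_mask.
# model.quantize(examples)
# model.save_quantized("path/to/output")
```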
69
+
70
+ | Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama Compat? | By | Desc |
71
+ | ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | --------------- | -- | ---- |
72
+ | main | 4 | 128 | No | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 7.26 GB | Yes | AutoGPTQ | Most compatible option. Good inference speed in AutoGPTQ and GPTQ-for-LLaMa. Lower inference quality than other options. |
73
+ | gptq-4bit-32g-actorder_True | 4 | 32 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 8.00 GB | Yes | AutoGPTQ | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. Poor AutoGPTQ CUDA speed. |
74
+ | gptq-4bit-64g-actorder_True | 4 | 64 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 7.51 GB | Yes | AutoGPTQ | 4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
75
+ | gptq-4bit-128g-actorder_True | 4 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 7.26 GB | Yes | AutoGPTQ | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
76
+ | gptq-8bit--1g-actorder_True | 8 | None | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 13.36 GB | No | AutoGPTQ | 8-bit, with Act Order. No group size, to lower VRAM requirements and to improve AutoGPTQ speed. |
77
+ | gptq-8bit-128g-actorder_True | 8 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 13.65 GB | No | AutoGPTQ | 8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. Poor AutoGPTQ CUDA speed. |
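
Since each quant lives in its own branch, the branch name can be passed as a `revision` when downloading. Below is a minimal sketch with `huggingface_hub`; the repo id is assumed from the usual naming, and the next section covers downloading from branches in detail.

```python
# Sketch: fetch one quant branch by passing it as the revision.
# The repo id is an assumption based on TheBloke's usual naming.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="TheBloke/OpenOrcaxOpenChat-Preview2-13B-GPTQ",  # assumed repo id
    revision="gptq-4bit-32g-actorder_True",                  # branch from the table above
)
print(local_dir)
```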
78
 
79
  ## How to download from branches
80
 
 
229
  We have used our own [OpenOrca dataset](https://huggingface.co/datasets/Open-Orca/OpenOrca) to fine-tune Llama2-13B using [OpenChat](https://huggingface.co/openchat) packing and conditional behavior cloning.
230
  This dataset is our attempt to reproduce the dataset generated for Microsoft Research's [Orca Paper](https://arxiv.org/abs/2306.02707).
231
 
232
+ This second preview release is trained on a curated, filtered subset of most of our GPT-4 augmented data.
233
 
234
  This release highlights that our dataset and training methods have surpassed performance parity with the Orca paper.
235
+ We measured this with BigBench-Hard and AGIEval, using the same methods as the Orca paper, finding **~103%** of the original Orca's performance on average.
236
+ Additionally, this is done with <1/10th the compute requirement and <20% of the dataset size of the original Orca paper.
237
 
238
+ We have run extensive evaluations internally and expect this model to **place number 1** on both the HuggingFaceH4 Open LLM Leaderboard and the GPT4ALL Leaderboard for 13B models.
239
 
240
  "One" of [OpenChat](https://huggingface.co/openchat) has joined our team, and we'd like to provide special thanks for their training of this model!
241
  We have utilized OpenChat conditional behavior cloning and [MultiPack algorithm](https://github.com/imoneoi/multipack_sampler) which achieves 99.85% bin-packing efficiency on our dataset.
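
For intuition on what bin-packing efficiency means here, the toy sketch below packs variable-length tokenised examples into fixed-size bins with a simple first-fit-decreasing heuristic. It is not the MultiPack implementation, just an illustration of the packing idea and the efficiency metric.

```python
# Illustrative only: pack variable-length sequences into fixed-size bins to
# reduce padding waste, then report how full the bins are on average.
def pack_sequences(lengths, bin_size=4096):
    bins = []  # each bin is a list of sequence lengths whose sum <= bin_size
    for length in sorted(lengths, reverse=True):
        for b in bins:
            if sum(b) + length <= bin_size:
                b.append(length)
                break
        else:
            bins.append([length])
    used = sum(sum(b) for b in bins)
    efficiency = used / (len(bins) * bin_size)  # fraction of capacity holding real tokens
    return bins, efficiency

bins, eff = pack_sequences([1200, 800, 3000, 512, 2048, 4000, 300], bin_size=4096)
print(len(bins), f"{eff:.1%}")  # 3 bins, ~96.5% efficiency for this toy input
```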
 
264
 
265
  Our average performance for BigBench-Hard: 0.488
266
 
267
+ Average for AGIEval: 0.447
268
 
269
  In the Orca paper, they measured their score relative to Vicuna on these evals.
270
+ We have done the same and found that our scores average **~103%** of the total performance shown in the Orca paper, using the same evaluation methods as outlined in the paper.
271
 
272
+ So we are surpassing Orca performance with <20% of the dataset size and <1/10th the training budget!
273
 
274
+ We have also evaluated using the methodology and tools of the HuggingFace Leaderboard and the GPT4ALL Leaderboard, and find that we place #1 on both for all 13B models at release time!
275
 
276
  ## AGIEval Performance
277
 
278
+ We present our results in two columns.
279
+ The column for "`(Orca Paper eval)`" uses the methods outlined in the Orca paper, so as to be a direct apples-to-apples comparison with the results from the paper.
280
+ The column for "`(HF Leaderboard eval)`" uses EleutherAI's LM Evaluation Harness with settings outlined by HuggingFace. These results are not comparable to the other columns, as the methods are different.
281
+
282
+ ![OpenOrca Preview2 AGIEval Performance](https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B/resolve/main/Images/OpenOrcaP2AGIEval.png "AGIEval Performance")
283
+
284
+ ## BigBench-Hard Performance
285
+
286
+ We present our results in two columns.
287
+ The column for "`(Orca Paper eval)`" uses the methods outlined in the Orca paper, so as to be a direct apples-to-apples comparison with the results from the paper.
288
+ The column for "`(HF Leaderboard eval)`" uses EleutherAI's LM Evaluation Harness with settings outlined by HuggingFace. These results are not comparable to the other columns, as the methods are different.
289
+
290
+ ![OpenOrca Preview2 BigBench-Hard Performance](https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B/resolve/main/Images/OpenOrcaP2BigBenchHardEval.png "BigBench-Hard Performance")
291
 
292
  ## HuggingFaceH4 Open LLM Leaderboard Performance
293
 
294
  We have run our own tests using parameters matching the [HuggingFaceH4 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) evals.
295
 
296
+ We place #1 for all 13B models at release time!
297
+
298
+ ![OpenOrca Preview2 HuggingFace Leaderboard Performance](https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B/resolve/main/Images/OpenOrcaP2HuggingFaceLeaderboard.png "HuggingFace Leaderboard Performance")
299
 
300
  ## GPT4ALL Leaderboard Performance
301
 
302
  We have tested using parameters matching the GPT4ALL Benchmark Suite and report our results and placement vs their official reporting below.
303
 
304
+ We place #1 for all open models and come close to `text-davinci-003`, a proprietary OpenAI model an order of magnitude larger.
305
+
306
+ ![OpenOrca Preview2 GPT4ALL Performance](https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B/resolve/main/Images/OpenOrcaP2GPT4ALL_Leaderboard.png "GPT4ALL Performance")
307
 
308
 
309
  # Dataset
310
 
311
  We used a curated, filtered selection of most of the GPT-4 augmented data from our OpenOrca dataset, which aims to reproduce the Orca Research Paper dataset.
312
+ Further details of our curation practices will be forthcoming with our full model releases.
313
 
314
 
315
  # Training
316
 
317
+ We trained with 8x A100-80G GPUs for 46 hours, completing 5 epochs of full fine-tuning on our dataset in one training run.
318
+ This contrasts with the 20x A100-80G GPUs for 200 hours used in the Orca paper, for only 3 epochs, and requiring stacked training (which is known to suffer from catastrophic forgetting).
319
  Our compute requirement was <1/10th that of the original Orca.
320
  Commodity cost was ~$600.
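
The compute claim follows directly from the GPU-hours stated above; a quick check:

```python
# GPU-hours for this run vs. the setup reported in the Orca paper.
ours = 8 * 46     # 8x A100-80G for 46 hours  -> 368 GPU-hours
orca = 20 * 200   # 20x A100-80G for 200 hours -> 4,000 GPU-hours
print(f"{ours / orca:.1%}")  # ~9.2%, i.e. under 1/10th of the original compute
```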
321
 
 
338
  # Result: [1, 4911, 29901, 15043, 32000, 4007, 22137, 29901, 6324, 32000, 4911, 29901, 1128, 526, 366, 9826, 29973, 32000, 4007, 22137, 29901]
339
  ```
340
 
341
+ For UIs with Prefix and Suffix fields, these will likely work:
342
+
343
+ Prefix (include a space after the colon):
344
+ ```
345
+ User:
346
+ ```
347
+
348
+ Suffix (include a space after the colon):
349
+ ```
350
+ <|end_of_turn|>\nAssistant:
351
+ ```
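
Putting the pieces together, a small helper like the sketch below assembles a multi-turn prompt in this format. The naming is ours, not from the README; compare its output against the tokenised example above.

```python
# Sketch: build an OpenChat-style prompt string in the format documented above.
END_OF_TURN = "<|end_of_turn|>"

def build_prompt(turns):
    """turns: list of (role, text) pairs, e.g. [("User", "Hello"), ("Assistant", "Hi")]."""
    parts = [f"{role}: {text}" for role, text in turns]
    # Close each completed turn with the end-of-turn token, then leave the
    # prompt open on "Assistant:" so the model writes the next reply.
    return END_OF_TURN.join(parts) + f"{END_OF_TURN}Assistant:"

print(build_prompt([("User", "Hello"), ("Assistant", "Hi"), ("User", "How are you today?")]))
# -> User: Hello<|end_of_turn|>Assistant: Hi<|end_of_turn|>User: How are you today?<|end_of_turn|>Assistant:
```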
352
+
353
 
354
  # Serving
355