Tags: Text Generation · Transformers · Safetensors · English · llama · text-generation-inference · 4-bit precision · gptq
TheBloke committed
Commit 8c5b43b
1 Parent(s): 6b38ec6

Upload README.md

Files changed (1)
  1. README.md +27 -15
README.md CHANGED
@@ -6,7 +6,7 @@ inference: false
  language:
  - en
  library_name: transformers
- license: llama2
+ license: cc-by-nc-4.0
  model_creator: Open-Orca
  model_link: https://huggingface.co/Open-Orca/OpenOrca-Platypus2-13B
  model_name: OpenOrca Platypus2 13B
@@ -66,7 +66,15 @@ Multiple GPTQ parameter permutations are provided; see Provided Files below for
  ```

  <!-- prompt-template end -->
+ <!-- licensing start -->
+ ## Licensing

+ The creator of the source model has listed its license as `cc-by-nc-4.0`, and this quantization has therefore used that same license.
+
+ As this model is based on Llama 2, it is also subject to the Meta Llama 2 license terms, and the license files for that are additionally included. It should therefore be considered as being claimed to be licensed under both licenses. I contacted Hugging Face for clarification on dual licensing but they do not yet have an official position. Should this change, or should Meta provide any feedback on this situation, I will update this section accordingly.
+
+ In the meantime, any questions regarding licensing, and in particular how these two licenses might interact, should be directed to the original model repository: [Open-Orca's OpenOrca Platypus2 13B](https://huggingface.co/Open-Orca/OpenOrca-Platypus2-13B).
+ <!-- licensing end -->
  <!-- README_GPTQ.md-provided-files start -->
  ## Provided files and GPTQ parameters

@@ -91,22 +99,22 @@ All recent GPTQ files are made with AutoGPTQ, and all files in non-main branches

  | Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
  | ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- |
- | [main](https://huggingface.co/TheBloke/OpenOrca-Platypus2-13B-GPTQ/tree/main) | 4 | 128 | No | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 7.26 GB | Yes | Most compatible option. Good inference speed in AutoGPTQ and GPTQ-for-LLaMa. Lower inference quality than other options. |
- | [gptq-4bit-32g-actorder_True](https://huggingface.co/TheBloke/OpenOrca-Platypus2-13B-GPTQ/tree/gptq-4bit-32g-actorder_True) | 4 | 32 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 8.00 GB | Yes | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. Poor AutoGPTQ CUDA speed. |
- | [gptq-4bit-64g-actorder_True](https://huggingface.co/TheBloke/OpenOrca-Platypus2-13B-GPTQ/tree/gptq-4bit-64g-actorder_True) | 4 | 64 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 7.51 GB | Yes | 4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
- | [gptq-4bit-128g-actorder_True](https://huggingface.co/TheBloke/OpenOrca-Platypus2-13B-GPTQ/tree/gptq-4bit-128g-actorder_True) | 4 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 7.26 GB | Yes | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
- | [gptq-8bit--1g-actorder_True](https://huggingface.co/TheBloke/OpenOrca-Platypus2-13B-GPTQ/tree/gptq-8bit--1g-actorder_True) | 8 | None | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 13.36 GB | No | 8-bit, with Act Order. No group size, to lower VRAM requirements and to improve AutoGPTQ speed. |
- | [gptq-8bit-128g-actorder_True](https://huggingface.co/TheBloke/OpenOrca-Platypus2-13B-GPTQ/tree/gptq-8bit-128g-actorder_True) | 8 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 13.65 GB | No | 8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. Poor AutoGPTQ CUDA speed. |
+ | [main](https://huggingface.co/TheBloke/OpenOrca-Platypus2-13B-GPTQ/tree/main) | 4 | 128 | No | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 7.26 GB | Yes | 4-bit, without Act Order and group size 128g. |
+ | [gptq-4bit-32g-actorder_True](https://huggingface.co/TheBloke/OpenOrca-Platypus2-13B-GPTQ/tree/gptq-4bit-32g-actorder_True) | 4 | 32 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 8.00 GB | Yes | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. |
+ | [gptq-4bit-64g-actorder_True](https://huggingface.co/TheBloke/OpenOrca-Platypus2-13B-GPTQ/tree/gptq-4bit-64g-actorder_True) | 4 | 64 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 7.51 GB | Yes | 4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy. |
+ | [gptq-4bit-128g-actorder_True](https://huggingface.co/TheBloke/OpenOrca-Platypus2-13B-GPTQ/tree/gptq-4bit-128g-actorder_True) | 4 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 7.26 GB | Yes | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. |
+ | [gptq-8bit--1g-actorder_True](https://huggingface.co/TheBloke/OpenOrca-Platypus2-13B-GPTQ/tree/gptq-8bit--1g-actorder_True) | 8 | None | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 13.36 GB | No | 8-bit, with Act Order. No group size, to lower VRAM requirements. |
+ | [gptq-8bit-128g-actorder_True](https://huggingface.co/TheBloke/OpenOrca-Platypus2-13B-GPTQ/tree/gptq-8bit-128g-actorder_True) | 8 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 13.65 GB | No | 8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. |

  <!-- README_GPTQ.md-provided-files end -->

  <!-- README_GPTQ.md-download-from-branches start -->
  ## How to download from branches

- - In text-generation-webui, you can add `:branch` to the end of the download name, eg `TheBloke/OpenOrca-Platypus2-13B-GPTQ:gptq-4bit-32g-actorder_True`
+ - In text-generation-webui, you can add `:branch` to the end of the download name, eg `TheBloke/OpenOrca-Platypus2-13B-GPTQ:main`
  - With Git, you can clone a branch with:
  ```
- git clone --single-branch --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/OpenOrca-Platypus2-13B-GPTQ
+ git clone --single-branch --branch main https://huggingface.co/TheBloke/OpenOrca-Platypus2-13B-GPTQ
  ```
  - In Python Transformers code, the branch is the `revision` parameter; see below.
  <!-- README_GPTQ.md-download-from-branches end -->
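As an aside on the branch options changed in the hunk above: the same branch names can also be used from the `huggingface_hub` Python API. A minimal sketch, assuming a recent `huggingface_hub` release; the branch name and `local_dir` path below are illustrative examples, not part of the commit:

```
# Sketch: download one quantisation branch of this repo with huggingface_hub.
# The revision value is any branch name from the "Provided files" table above;
# local_dir is an arbitrary example destination.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/OpenOrca-Platypus2-13B-GPTQ",
    revision="gptq-4bit-32g-actorder_True",  # e.g. "main" for the default branch
    local_dir="OpenOrca-Platypus2-13B-GPTQ",
)
```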
@@ -119,7 +127,7 @@ It is strongly recommended to use the text-generation-webui one-click-installers

  1. Click the **Model tab**.
  2. Under **Download custom model or LoRA**, enter `TheBloke/OpenOrca-Platypus2-13B-GPTQ`.
- - To download from a specific branch, enter for example `TheBloke/OpenOrca-Platypus2-13B-GPTQ:gptq-4bit-32g-actorder_True`
+ - To download from a specific branch, enter for example `TheBloke/OpenOrca-Platypus2-13B-GPTQ:main`
  - see Provided Files above for the list of branches for each option.
  3. Click **Download**.
  4. The model will start downloading. Once it's finished it will say "Done".
@@ -167,10 +175,10 @@ from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

  model_name_or_path = "TheBloke/OpenOrca-Platypus2-13B-GPTQ"
  # To use a different branch, change revision
- # For example: revision="gptq-4bit-32g-actorder_True"
+ # For example: revision="main"
  model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
- torch_dtype=torch.float16,
  device_map="auto",
+ trust_remote_code=False,
  revision="main")

  tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
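To make the `revision` change in the hunk above concrete, here is a minimal sketch of loading one of the non-main branches; it simply reuses the README's own `from_pretrained` call with a branch name from the Provided Files table substituted for `"main"`:

```
# Sketch: load a specific GPTQ branch by passing its name as `revision`.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "TheBloke/OpenOrca-Platypus2-13B-GPTQ"

model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    device_map="auto",
    trust_remote_code=False,
    revision="gptq-4bit-32g-actorder_True",  # example branch from the table above
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
```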
@@ -187,7 +195,7 @@ prompt_template=f'''### Instruction:
  print("\n\n*** Generate:")

  input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
- output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
+ output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
  print(tokenizer.decode(output[0]))

  # Inference can also be done using transformers' pipeline
@@ -198,9 +206,11 @@ pipe = pipeline(
  model=model,
  tokenizer=tokenizer,
  max_new_tokens=512,
+ do_sample=True,
  temperature=0.7,
  top_p=0.95,
- repetition_penalty=1.15
+ top_k=40,
+ repetition_penalty=1.1
  )

  print(pipe(prompt_template)[0]['generated_text'])
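Both hunks above add `do_sample=True`; in transformers, `temperature`, `top_p` and `top_k` only influence generation when sampling is enabled, so without it these calls would typically fall back to greedy decoding and ignore those settings. A minimal sketch of the updated sampling call, assuming `model`, `tokenizer` and `prompt_template` are already defined as in the README example:

```
# Sketch: sampling-based generation matching the parameters added in this commit.
# temperature / top_p / top_k are only honoured when do_sample=True.
input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.cuda()
output = model.generate(
    inputs=input_ids,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    max_new_tokens=512,
)
print(tokenizer.decode(output[0]))
```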
@@ -225,10 +235,12 @@ For further support, and discussions on these models and AI in general, join us

  [TheBloke AI's Discord server](https://discord.gg/theblokeai)

- ## Thanks, and how to contribute.
+ ## Thanks, and how to contribute

  Thanks to the [chirper.ai](https://chirper.ai) team!

+ Thanks to Clay from [gpus.llm-utils.org](llm-utils)!
+
  I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.

  If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.