TheBloke committed on
Commit 2be6f8b
1 Parent(s): 693bbe7

Upload README.md

Files changed (1)
  1. README.md +35 -20
README.md CHANGED
@@ -1,4 +1,5 @@
  ---
  inference: false
  language:
  - zh
@@ -12,7 +13,6 @@ language:
  library_name: transformers
  license: llama2
  model_creator: OpenBuddy
- model_link: https://huggingface.co/OpenBuddy/openbuddy-llama2-13b-v11.1-bf16
  model_name: OpenBuddy Llama2 13B v11.1
  model_type: llama
  pipeline_tag: text-generation
@@ -58,18 +58,24 @@ Multiple GPTQ parameter permutations are provided; see Provided Files below for
  <!-- repositories-available end -->

  <!-- prompt-template start -->
- ## Prompt template: Vicuna-Short

  ```
- You are a helpful AI assistant.

- USER: {prompt}
- ASSISTANT:

  ```

  <!-- prompt-template end -->

  <!-- README_GPTQ.md-provided-files start -->
  ## Provided files and GPTQ parameters

@@ -94,20 +100,20 @@ All recent GPTQ files are made with AutoGPTQ, and all files in non-main branches
  | Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
  | ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- |
- | [main](https://huggingface.co/TheBloke/OpenBuddy-Llama2-13B-v11.1-GPTQ/tree/main) | 4 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 7.37 GB | Yes | Most compatible option. Good inference speed in AutoGPTQ and GPTQ-for-LLaMa. Lower inference quality than other options. |
- | [gptq-4bit-32g-actorder_True](https://huggingface.co/TheBloke/OpenBuddy-Llama2-13B-v11.1-GPTQ/tree/gptq-4bit-32g-actorder_True) | 4 | 32 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 8.12 GB | Yes | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. Poor AutoGPTQ CUDA speed. |
- | [gptq-8bit--1g-actorder_True](https://huggingface.co/TheBloke/OpenBuddy-Llama2-13B-v11.1-GPTQ/tree/gptq-8bit--1g-actorder_True) | 8 | None | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 13.48 GB | No | 8-bit, with Act Order. No group size, to lower VRAM requirements and to improve AutoGPTQ speed. |
- | [gptq-8bit-128g-actorder_True](https://huggingface.co/TheBloke/OpenBuddy-Llama2-13B-v11.1-GPTQ/tree/gptq-8bit-128g-actorder_True) | 8 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 13.77 GB | No | 8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. Poor AutoGPTQ CUDA speed. |
  <!-- README_GPTQ.md-provided-files end -->

  <!-- README_GPTQ.md-download-from-branches start -->
  ## How to download from branches

- - In text-generation-webui, you can add `:branch` to the end of the download name, eg `TheBloke/OpenBuddy-Llama2-13B-v11.1-GPTQ:gptq-4bit-32g-actorder_True`
  - With Git, you can clone a branch with:
  ```
- git clone --single-branch --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/OpenBuddy-Llama2-13B-v11.1-GPTQ
  ```
  - In Python Transformers code, the branch is the `revision` parameter; see below.
  <!-- README_GPTQ.md-download-from-branches end -->
@@ -120,7 +126,7 @@ It is strongly recommended to use the text-generation-webui one-click-installers

  1. Click the **Model tab**.
  2. Under **Download custom model or LoRA**, enter `TheBloke/OpenBuddy-Llama2-13B-v11.1-GPTQ`.
- - To download from a specific branch, enter for example `TheBloke/OpenBuddy-Llama2-13B-v11.1-GPTQ:gptq-4bit-32g-actorder_True`
  - see Provided Files above for the list of branches for each option.
  3. Click **Download**.
  4. The model will start downloading. Once it's finished it will say "Done".
@@ -168,26 +174,31 @@ from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

  model_name_or_path = "TheBloke/OpenBuddy-Llama2-13B-v11.1-GPTQ"
  # To use a different branch, change revision
- # For example: revision="gptq-4bit-32g-actorder_True"
  model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
- torch_dtype=torch.bfloat16,
  device_map="auto",
  revision="main")

  tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

  prompt = "Tell me about AI"
- prompt_template=f'''You are a helpful AI assistant.

- USER: {prompt}
- ASSISTANT:

  '''

  print("\n\n*** Generate:")

  input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
- output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
  print(tokenizer.decode(output[0]))

  # Inference can also be done using transformers' pipeline
@@ -198,9 +209,11 @@ pipe = pipeline(
  model=model,
  tokenizer=tokenizer,
  max_new_tokens=512,
  temperature=0.7,
  top_p=0.95,
- repetition_penalty=1.15
  )

  print(pipe(prompt_template)[0]['generated_text'])
@@ -225,10 +238,12 @@ For further support, and discussions on these models and AI in general, join us

  [TheBloke AI's Discord server](https://discord.gg/theblokeai)

- ## Thanks, and how to contribute.

  Thanks to the [chirper.ai](https://chirper.ai) team!

  I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.
  If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.
 
README.md (new version, changed sections):

  ---
+ base_model: https://huggingface.co/OpenBuddy/openbuddy-llama2-13b-v11.1-bf16
  inference: false
  language:
  - zh

  library_name: transformers
  license: llama2
  model_creator: OpenBuddy
  model_name: OpenBuddy Llama2 13B v11.1
  model_type: llama
  pipeline_tag: text-generation

  <!-- repositories-available end -->

  <!-- prompt-template start -->
+ ## Prompt template: OpenBuddy

  ```
+ You are a helpful, respectful and honest INTP-T AI Assistant named Buddy. You are talking to a human User.
+ Always answer as helpfully and logically as possible, while being safe. Your answers should not include any harmful, political, religious, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
+ If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
+ You like to use emojis. You can speak fluently in many languages, for example: English, Chinese.
+ You cannot access the internet, but you have vast knowledge, cutoff: 2021-09.
+ You are trained by OpenBuddy team, (https://openbuddy.ai, https://github.com/OpenBuddy/OpenBuddy), you are based on LLaMA and Falcon transformers model, not related to GPT or OpenAI.

+ User: {prompt}
+ Assistant:

  ```

  <!-- prompt-template end -->
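Since the template above is plain text, it can be filled in with a small helper before tokenization. A minimal sketch (the constant and function names are illustrative, not part of the README, and the system text is abbreviated; paste the full system block from the template above):

```
OPENBUDDY_SYSTEM = (
    "You are a helpful, respectful and honest INTP-T AI Assistant named Buddy. "
    "You are talking to a human User.\n"
    # ...append the remaining system lines from the template above...
)

def build_openbuddy_prompt(user_message: str) -> str:
    # One blank line separates the system block from the User/Assistant turns,
    # matching the template layout shown above.
    return f"{OPENBUDDY_SYSTEM}\nUser: {user_message}\nAssistant:"

print(build_openbuddy_prompt("Tell me about AI"))
```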

+
  <!-- README_GPTQ.md-provided-files start -->
  ## Provided files and GPTQ parameters

  | Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
  | ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- |
+ | [main](https://huggingface.co/TheBloke/OpenBuddy-Llama2-13B-v11.1-GPTQ/tree/main) | 4 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 7.37 GB | Yes | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. |
+ | [gptq-4bit-32g-actorder_True](https://huggingface.co/TheBloke/OpenBuddy-Llama2-13B-v11.1-GPTQ/tree/gptq-4bit-32g-actorder_True) | 4 | 32 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 8.12 GB | Yes | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. |
+ | [gptq-8bit--1g-actorder_True](https://huggingface.co/TheBloke/OpenBuddy-Llama2-13B-v11.1-GPTQ/tree/gptq-8bit--1g-actorder_True) | 8 | None | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 13.48 GB | No | 8-bit, with Act Order. No group size, to lower VRAM requirements. |
+ | [gptq-8bit-128g-actorder_True](https://huggingface.co/TheBloke/OpenBuddy-Llama2-13B-v11.1-GPTQ/tree/gptq-8bit-128g-actorder_True) | 8 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 13.77 GB | No | 8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. |

  <!-- README_GPTQ.md-provided-files end -->
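The Size column above is the on-disk size of each branch; actual VRAM use is higher once the KV cache and activations are added. As a rough illustration only (sizes copied from the table; the headroom figure is an arbitrary assumption), a branch could be picked for a given VRAM budget like this:

```
# Rough heuristic only: on-disk sizes (GB) from the table above; real VRAM
# requirements are higher because of KV cache, activations and overhead.
BRANCH_SIZES_GB = {
    "main": 7.37,                           # 4-bit, group size 128
    "gptq-4bit-32g-actorder_True": 8.12,    # 4-bit, group size 32
    "gptq-8bit--1g-actorder_True": 13.48,   # 8-bit, no group size
    "gptq-8bit-128g-actorder_True": 13.77,  # 8-bit, group size 128
}

def pick_branch(vram_gb, headroom_gb=2.0):
    """Return the largest branch whose file size fits with some headroom, else None."""
    fits = [b for b, size in BRANCH_SIZES_GB.items() if size + headroom_gb <= vram_gb]
    return max(fits, key=BRANCH_SIZES_GB.get) if fits else None

print(pick_branch(12))   # -> gptq-4bit-32g-actorder_True
print(pick_branch(24))   # -> gptq-8bit-128g-actorder_True
```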

  <!-- README_GPTQ.md-download-from-branches start -->
  ## How to download from branches

+ - In text-generation-webui, you can add `:branch` to the end of the download name, eg `TheBloke/OpenBuddy-Llama2-13B-v11.1-GPTQ:main`
  - With Git, you can clone a branch with:
  ```
+ git clone --single-branch --branch main https://huggingface.co/TheBloke/OpenBuddy-Llama2-13B-v11.1-GPTQ
  ```
  - In Python Transformers code, the branch is the `revision` parameter; see below.
  <!-- README_GPTQ.md-download-from-branches end -->
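The same branch selection also works without Git: `huggingface_hub` can fetch a single branch by passing it as `revision`. A minimal sketch (assumes the `huggingface_hub` package is installed; the local directory name is just an example, and any branch name from the table above can be used):

```
from huggingface_hub import snapshot_download

# Download one branch (revision) of the GPTQ repo; see the table above for branch names.
local_path = snapshot_download(
    repo_id="TheBloke/OpenBuddy-Llama2-13B-v11.1-GPTQ",
    revision="gptq-4bit-32g-actorder_True",  # or "main"
    local_dir="OpenBuddy-Llama2-13B-v11.1-GPTQ",
)
print(local_path)
```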
 
  1. Click the **Model tab**.
  2. Under **Download custom model or LoRA**, enter `TheBloke/OpenBuddy-Llama2-13B-v11.1-GPTQ`.
+ - To download from a specific branch, enter for example `TheBloke/OpenBuddy-Llama2-13B-v11.1-GPTQ:main`
  - see Provided Files above for the list of branches for each option.
  3. Click **Download**.
  4. The model will start downloading. Once it's finished it will say "Done".
 
  from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

  model_name_or_path = "TheBloke/OpenBuddy-Llama2-13B-v11.1-GPTQ"
  # To use a different branch, change revision
+ # For example: revision="main"
  model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
  device_map="auto",
+ trust_remote_code=False,
  revision="main")

  tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

  prompt = "Tell me about AI"
+ prompt_template=f'''You are a helpful, respectful and honest INTP-T AI Assistant named Buddy. You are talking to a human User.
+ Always answer as helpfully and logically as possible, while being safe. Your answers should not include any harmful, political, religious, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
+ If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
+ You like to use emojis. You can speak fluently in many languages, for example: English, Chinese.
+ You cannot access the internet, but you have vast knowledge, cutoff: 2021-09.
+ You are trained by OpenBuddy team, (https://openbuddy.ai, https://github.com/OpenBuddy/OpenBuddy), you are based on LLaMA and Falcon transformers model, not related to GPT or OpenAI.

+ User: {prompt}
+ Assistant:

  '''

  print("\n\n*** Generate:")

  input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
+ output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
  print(tokenizer.decode(output[0]))

  # Inference can also be done using transformers' pipeline

  pipe = pipeline(
  model=model,
  tokenizer=tokenizer,
  max_new_tokens=512,
+ do_sample=True,
  temperature=0.7,
  top_p=0.95,
+ top_k=40,
+ repetition_penalty=1.1
  )

  print(pipe(prompt_template)[0]['generated_text'])
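The `generate()` and `pipeline()` calls above only return the completion once it is finished. If token-by-token output is wanted, transformers' `TextStreamer` can be passed to `generate()`. A short sketch that reuses the `model`, `tokenizer` and `prompt_template` objects from the example above:

```
from transformers import TextStreamer

# Print tokens to stdout as they are generated, without echoing the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.cuda()
model.generate(inputs=input_ids, streamer=streamer, do_sample=True,
               temperature=0.7, top_p=0.95, top_k=40, max_new_tokens=512)
```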
 
  [TheBloke AI's Discord server](https://discord.gg/theblokeai)

+ ## Thanks, and how to contribute

  Thanks to the [chirper.ai](https://chirper.ai) team!

+ Thanks to Clay from [gpus.llm-utils.org](llm-utils)!
+
  I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.
  If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.