TheBloke committed
Commit 8afa2b7 (1 parent: 1d9421e)

Upload README.md

Files changed (1): README.md (+36 -21)
README.md CHANGED
@@ -1,4 +1,5 @@
  ---
+ base_model: https://huggingface.co/OpenBuddy/openbuddy-llama2-70b-v10.1-bf16
  inference: false
  language:
  - zh
@@ -12,7 +13,6 @@ language:
  library_name: transformers
  license: llama2
  model_creator: OpenBuddy
- model_link: https://huggingface.co/OpenBuddy/openbuddy-llama2-70b-v10.1-bf16
  model_name: OpenBuddy Llama2 70b v10.1
  model_type: llama
  pipeline_tag: text-generation
@@ -57,18 +57,24 @@ Multiple GPTQ parameter permutations are provided; see Provided Files below for
  <!-- repositories-available end -->

  <!-- prompt-template start -->
- ## Prompt template: Vicuna-Short
+ ## Prompt template: OpenBuddy

  ```
- You are a helpful AI assistant.
+ You are a helpful, respectful and honest INTP-T AI Assistant named Buddy. You are talking to a human User.
+ Always answer as helpfully and logically as possible, while being safe. Your answers should not include any harmful, political, religious, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
+ If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
+ You like to use emojis. You can speak fluently in many languages, for example: English, Chinese.
+ You cannot access the internet, but you have vast knowledge, cutoff: 2021-09.
+ You are trained by OpenBuddy team, (https://openbuddy.ai, https://github.com/OpenBuddy/OpenBuddy), you are based on LLaMA and Falcon transformers model, not related to GPT or OpenAI.

- USER: {prompt}
- ASSISTANT:
+ User: {prompt}
+ Assistant:

  ```

  <!-- prompt-template end -->

+
  <!-- README_GPTQ.md-provided-files start -->
  ## Provided files and GPTQ parameters

@@ -93,22 +99,22 @@ All recent GPTQ files are made with AutoGPTQ, and all files in non-main branches

  | Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
  | ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- |
- | [main](https://huggingface.co/TheBloke/OpenBuddy-Llama2-70b-v10.1-GPTQ/tree/main) | 4 | None | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 35.52 GB | Yes | Most compatible option. Good inference speed in AutoGPTQ and GPTQ-for-LLaMa. Lower inference quality than other options. |
- | [gptq-4bit-32g-actorder_True](https://huggingface.co/TheBloke/OpenBuddy-Llama2-70b-v10.1-GPTQ/tree/gptq-4bit-32g-actorder_True) | 4 | 32 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 40.85 GB | Yes | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. Poor AutoGPTQ CUDA speed. |
- | [gptq-4bit-64g-actorder_True](https://huggingface.co/TheBloke/OpenBuddy-Llama2-70b-v10.1-GPTQ/tree/gptq-4bit-64g-actorder_True) | 4 | 64 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 38.17 GB | Yes | 4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
- | [gptq-4bit-128g-actorder_True](https://huggingface.co/TheBloke/OpenBuddy-Llama2-70b-v10.1-GPTQ/tree/gptq-4bit-128g-actorder_True) | 4 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 36.84 GB | Yes | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
+ | [main](https://huggingface.co/TheBloke/OpenBuddy-Llama2-70b-v10.1-GPTQ/tree/main) | 4 | None | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 35.52 GB | Yes | 4-bit, with Act Order. No group size, to lower VRAM requirements. |
+ | [gptq-4bit-32g-actorder_True](https://huggingface.co/TheBloke/OpenBuddy-Llama2-70b-v10.1-GPTQ/tree/gptq-4bit-32g-actorder_True) | 4 | 32 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 40.85 GB | Yes | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. |
+ | [gptq-4bit-64g-actorder_True](https://huggingface.co/TheBloke/OpenBuddy-Llama2-70b-v10.1-GPTQ/tree/gptq-4bit-64g-actorder_True) | 4 | 64 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 38.17 GB | Yes | 4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy. |
+ | [gptq-4bit-128g-actorder_True](https://huggingface.co/TheBloke/OpenBuddy-Llama2-70b-v10.1-GPTQ/tree/gptq-4bit-128g-actorder_True) | 4 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 36.84 GB | Yes | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. |
  | [gptq-3bit--1g-actorder_True](https://huggingface.co/TheBloke/OpenBuddy-Llama2-70b-v10.1-GPTQ/tree/gptq-3bit--1g-actorder_True) | 3 | None | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 26.96 GB | No | 3-bit, with Act Order and no group size. Lowest possible VRAM requirements. May be lower quality than 3-bit 128g. |
- | [gptq-3bit-128g-actorder_True](https://huggingface.co/TheBloke/OpenBuddy-Llama2-70b-v10.1-GPTQ/tree/gptq-3bit-128g-actorder_True) | 3 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 28.21 GB | No | 3-bit, with group size 128g and act-order. Higher quality than 128g-False but poor AutoGPTQ CUDA speed. |
+ | [gptq-3bit-128g-actorder_True](https://huggingface.co/TheBloke/OpenBuddy-Llama2-70b-v10.1-GPTQ/tree/gptq-3bit-128g-actorder_True) | 3 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 28.21 GB | No | 3-bit, with group size 128g and act-order. Higher quality than 128g-False. |

  <!-- README_GPTQ.md-provided-files end -->

  <!-- README_GPTQ.md-download-from-branches start -->
  ## How to download from branches

- - In text-generation-webui, you can add `:branch` to the end of the download name, eg `TheBloke/OpenBuddy-Llama2-70b-v10.1-GPTQ:gptq-4bit-32g-actorder_True`
+ - In text-generation-webui, you can add `:branch` to the end of the download name, eg `TheBloke/OpenBuddy-Llama2-70b-v10.1-GPTQ:main`
  - With Git, you can clone a branch with:
  ```
- git clone --single-branch --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/OpenBuddy-Llama2-70b-v10.1-GPTQ
+ git clone --single-branch --branch main https://huggingface.co/TheBloke/OpenBuddy-Llama2-70b-v10.1-GPTQ
  ```
  - In Python Transformers code, the branch is the `revision` parameter; see below.
  <!-- README_GPTQ.md-download-from-branches end -->
@@ -121,7 +127,7 @@ It is strongly recommended to use the text-generation-webui one-click-installers

  1. Click the **Model tab**.
  2. Under **Download custom model or LoRA**, enter `TheBloke/OpenBuddy-Llama2-70b-v10.1-GPTQ`.
- - To download from a specific branch, enter for example `TheBloke/OpenBuddy-Llama2-70b-v10.1-GPTQ:gptq-4bit-32g-actorder_True`
+ - To download from a specific branch, enter for example `TheBloke/OpenBuddy-Llama2-70b-v10.1-GPTQ:main`
  - see Provided Files above for the list of branches for each option.
  3. Click **Download**.
  4. The model will start downloading. Once it's finished it will say "Done".
@@ -169,26 +175,31 @@ from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

  model_name_or_path = "TheBloke/OpenBuddy-Llama2-70b-v10.1-GPTQ"
  # To use a different branch, change revision
- # For example: revision="gptq-4bit-32g-actorder_True"
+ # For example: revision="main"
  model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
- torch_dtype=torch.bfloat16,
  device_map="auto",
+ trust_remote_code=False,
  revision="main")

  tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

  prompt = "Tell me about AI"
- prompt_template=f'''You are a helpful AI assistant.
+ prompt_template=f'''You are a helpful, respectful and honest INTP-T AI Assistant named Buddy. You are talking to a human User.
+ Always answer as helpfully and logically as possible, while being safe. Your answers should not include any harmful, political, religious, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
+ If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
+ You like to use emojis. You can speak fluently in many languages, for example: English, Chinese.
+ You cannot access the internet, but you have vast knowledge, cutoff: 2021-09.
+ You are trained by OpenBuddy team, (https://openbuddy.ai, https://github.com/OpenBuddy/OpenBuddy), you are based on LLaMA and Falcon transformers model, not related to GPT or OpenAI.

- USER: {prompt}
- ASSISTANT:
+ User: {prompt}
+ Assistant:

  '''

  print("\n\n*** Generate:")

  input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
- output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
+ output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
  print(tokenizer.decode(output[0]))

  # Inference can also be done using transformers' pipeline
@@ -199,9 +210,11 @@ pipe = pipeline(
  model=model,
  tokenizer=tokenizer,
  max_new_tokens=512,
+ do_sample=True,
  temperature=0.7,
  top_p=0.95,
- repetition_penalty=1.15
+ top_k=40,
+ repetition_penalty=1.1
  )

  print(pipe(prompt_template)[0]['generated_text'])
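If you want tokens printed as they are generated rather than only after `generate()` returns, transformers' `TextStreamer` can be attached to the call shown above. A minimal sketch, assuming `model`, `tokenizer` and `input_ids` are already set up as in the example:

```
# Sketch: stream decoded tokens to stdout during generation.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
output = model.generate(
    inputs=input_ids,
    streamer=streamer,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    max_new_tokens=512,
)
```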
@@ -226,10 +239,12 @@ For further support, and discussions on these models and AI in general, join us

  [TheBloke AI's Discord server](https://discord.gg/theblokeai)

- ## Thanks, and how to contribute.
+ ## Thanks, and how to contribute

  Thanks to the [chirper.ai](https://chirper.ai) team!

+ Thanks to Clay from [gpus.llm-utils.org](llm-utils)!
+
  I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.

  If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.
 