Sandiago21 committed on
Commit 0ff9c51
1 Parent(s): 6535eae

commit initial model artifacts
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ finetuned_conversations.pth filter=lfs diff=lfs merge=lfs -text
+ pytorch_model.bin filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,254 @@
  ---
  license: other
+ language:
+ - en
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - falcon
+ - falcon-40b
+ - prompt answering
+ - peft
  ---
+
+ ## Model Card for Sandiago21/falcon-40b-prompt-answering
+
+ This repository contains a falcon-40b model further fine-tuned on conversations and question-answering prompts.
+
+ **I used falcon-40b (https://huggingface.co/tiiuae/falcon-40b) as the base model, so this model is released under the same license as the falcon-40b model (Apache-2.0).**
+
+
+ ## Model Details
+
+ Anyone can use the model (ask it prompts) and play with it via the pre-existing Jupyter Notebook in the **notebooks** folder. The Jupyter Notebook contains example code to load the model and ask it prompts, as well as example prompts to get you started.
+
+ ### Model Description
+
+ The tiiuae/falcon-40b model was fine-tuned on conversations and question-answering prompts.
+
+ **Developed by:** [More Information Needed]
+
+ **Shared by:** [More Information Needed]
+
+ **Model type:** Causal LM
+
+ **Language(s) (NLP):** English, multilingual
+
+ **License:** Apache-2.0
+
+ **Finetuned from model:** tiiuae/falcon-40b
+
+
+ ## Model Sources [optional]
+
+ **Repository:** [More Information Needed]
+ **Paper:** [More Information Needed]
+ **Demo:** [More Information Needed]
+
+ ## Uses
+
+ The model can be used for prompt answering.
+
+
+ ### Direct Use
+
+ The model can be used for prompt answering.
+
+
+ ### Downstream Use
+
+ Generating text and prompt answering.
+
+
+ ## Recommendations
+
+ Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
+
+
+ # Usage
+
+ ## Creating prompt
+
+ The model was trained on the following kind of prompt:
+
+ ```python
+ def generate_prompt(prompt: str) -> str:
+     return f"""
+ <human>: {prompt}
+ <assistant>:
+ """.strip()
+ ```
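+
+ For example, wrapping a question produces the conversation-marker format the model expects (illustrative output, not part of the original card):
+
+ ```python
+ print(generate_prompt("What is the capital city of Greece?"))
+ # <human>: What is the capital city of Greece?
+ # <assistant>:
+ ```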
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ 1. You can git clone the repo, which also contains the artifacts for the base model for simplicity and completeness, and run the following code snippet to load the model:
+
+ ```python
+ import torch
+ from peft import PeftConfig, PeftModel
+ from transformers import GenerationConfig, AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+
+ MODEL_NAME = "."  # path to the locally cloned repository
+
+ config = PeftConfig.from_pretrained(MODEL_NAME)
+
+ compute_dtype = getattr(torch, "float16")
+
+ # 4-bit NF4 quantization so the 40B base model fits in less GPU memory
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=compute_dtype,
+     bnb_4bit_use_double_quant=True,
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     config.base_model_name_or_path,
+     quantization_config=bnb_config,
+     device_map="auto",
+     trust_remote_code=True,
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
+
+ # attach the fine-tuned LoRA adapters on top of the quantized base model
+ model = PeftModel.from_pretrained(model, MODEL_NAME)
+
+ generation_config = model.generation_config
+ generation_config.top_p = 0.7
+ generation_config.num_return_sequences = 1
+ generation_config.max_new_tokens = 32
+ generation_config.use_cache = False
+ generation_config.pad_token_id = tokenizer.eos_token_id
+ generation_config.eos_token_id = tokenizer.eos_token_id
+
+ model.eval()
+ if torch.__version__ >= "2":
+     model = torch.compile(model)
+ ```
+
+ ### Example of Usage
+
+ ```python
+ prompt = "What is the capital city of Greece and with which countries does Greece border?"
+
+ prompt = generate_prompt(prompt)
+ input_ids = tokenizer(prompt, return_tensors="pt").input_ids
+ input_ids = input_ids.to(model.device)
+
+ with torch.no_grad():
+     outputs = model.generate(
+         input_ids=input_ids,
+         generation_config=generation_config,
+         return_dict_in_generate=True,
+         output_scores=True,
+     )
+
+ response = tokenizer.decode(outputs.sequences[0], skip_special_tokens=True)
+ print(response)
+
+ >>> The capital city of Greece is Athens and it borders Albania, Bulgaria, Macedonia, and Turkey.
+ ```
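+
+ Note that `tokenizer.decode` returns the prompt together with the completion. If you want only the newly generated text, you can slice off the prompt tokens first. A minimal sketch reusing the objects defined above (not from the original card):
+
+ ```python
+ # keep only the tokens generated after the prompt
+ generated_tokens = outputs.sequences[0][input_ids.shape[-1]:]
+ completion = tokenizer.decode(generated_tokens, skip_special_tokens=True)
+ print(completion.strip())
+ ```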
+
+ 2. You can also load the model directly from the Hugging Face Hub using the following code snippet:
+
+ ```python
+ import torch
+ from peft import PeftConfig, PeftModel
+ from transformers import GenerationConfig, AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+
+ MODEL_NAME = "Sandiago21/falcon-40b-prompt-answering"
+ BASE_MODEL = "tiiuae/falcon-40b"
+
+ compute_dtype = getattr(torch, "float16")
+
+ # 4-bit NF4 quantization so the 40B base model fits in less GPU memory
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=compute_dtype,
+     bnb_4bit_use_double_quant=True,
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     BASE_MODEL,
+     quantization_config=bnb_config,
+     device_map="auto",
+     trust_remote_code=True,
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
+
+ # attach the fine-tuned LoRA adapters on top of the quantized base model
+ model = PeftModel.from_pretrained(model, MODEL_NAME)
+
+ generation_config = model.generation_config
+ generation_config.top_p = 0.7
+ generation_config.num_return_sequences = 1
+ generation_config.max_new_tokens = 32
+ generation_config.use_cache = False
+ generation_config.pad_token_id = tokenizer.eos_token_id
+ generation_config.eos_token_id = tokenizer.eos_token_id
+
+ model.eval()
+ if torch.__version__ >= "2":
+     model = torch.compile(model)
+ ```
+
+ ### Example of Usage
+
+ ```python
+ prompt = "What is the capital city of Greece and with which countries does Greece border?"
+
+ prompt = generate_prompt(prompt)
+ input_ids = tokenizer(prompt, return_tensors="pt").input_ids
+ input_ids = input_ids.to(model.device)
+
+ with torch.no_grad():
+     outputs = model.generate(
+         input_ids=input_ids,
+         generation_config=generation_config,
+         return_dict_in_generate=True,
+         output_scores=True,
+     )
+
+ response = tokenizer.decode(outputs.sequences[0], skip_special_tokens=True)
+ print(response)
+
+ >>> The capital city of Greece is Athens and it borders Albania, Bulgaria, Macedonia, and Turkey.
+ ```
+
+ ## Training Details
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 4
+ - eval_batch_size: 8
+ - seed: 42
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 8
+ - optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 50
+ - num_epochs: 2
+ - mixed_precision_training: Native AMP
+
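+ As a hedged sketch, these settings would map onto `transformers.TrainingArguments` roughly as follows (the `output_dir` value is hypothetical; the original training script is not part of this repository):
+
+ ```python
+ from transformers import TrainingArguments
+
+ training_args = TrainingArguments(
+     output_dir="falcon-40b-prompt-answering",  # hypothetical output path
+     learning_rate=2e-5,
+     per_device_train_batch_size=4,
+     per_device_eval_batch_size=8,
+     seed=42,
+     gradient_accumulation_steps=2,  # effective train batch size: 4 * 2 = 8
+     lr_scheduler_type="linear",
+     warmup_steps=50,
+     num_train_epochs=2,
+     fp16=True,  # mixed precision (native AMP)
+ )
+ ```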
+
+ ### Framework versions
+
+ - Transformers 4.28.1
+ - PyTorch 2.0.0+cu117
+ - Datasets 2.12.0
+ - Tokenizers 0.12.1
+
+ ### Training Data
+
+ The tiiuae/falcon-40b model was fine-tuned on conversations and question-answering data.
+
+ ### Training Procedure
+
+ The tiiuae/falcon-40b model was further trained and fine-tuned on question-answering and prompts data for 1 epoch (approximately 10 hours of training on a single GPU).
+
+ ## Model Architecture and Objective
+
+ The model is based on the tiiuae/falcon-40b model, with LoRA adapters fine-tuned on top of the base model on conversations and question-answering data.
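+
+ For reference, a minimal sketch of how such adapters could be attached with `peft`, mirroring the values in the `adapter_config.json` below (illustrative only; it assumes the base model is already loaded as in the snippets above and is not the original training script):
+
+ ```python
+ from peft import LoraConfig, get_peft_model
+
+ lora_config = LoraConfig(
+     r=64,                                # LoRA rank, as in adapter_config.json
+     lora_alpha=16,
+     lora_dropout=0.1,
+     bias="none",
+     target_modules=["query_key_value"],  # Falcon's fused QKV projection
+     task_type="CAUSAL_LM",
+ )
+ model = get_peft_model(model, lora_config)
+ model.print_trainable_parameters()
+ ```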
adapter_config.json ADDED
@@ -0,0 +1,19 @@
+ {
+   "base_model_name_or_path": "tiiuae/falcon-40b",
+   "bias": "none",
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "lora_alpha": 16,
+   "lora_dropout": 0.1,
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 64,
+   "revision": null,
+   "target_modules": [
+     "query_key_value"
+   ],
+   "task_type": "CAUSAL_LM"
+ }
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:17eb2eb3871449a810505692bcd9d51ed01938e9125d74e627a93104de3cc676
+ size 267431853
config.json ADDED
@@ -0,0 +1,44 @@
+ {
+   "_name_or_path": "tiiuae/falcon-40b",
+   "alibi": false,
+   "apply_residual_connection_post_layernorm": false,
+   "architectures": [
+     "RWForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "auto_map": {
+     "AutoConfig": "tiiuae/falcon-40b--configuration_RW.RWConfig",
+     "AutoModel": "tiiuae/falcon-40b--modelling_RW.RWModel",
+     "AutoModelForCausalLM": "tiiuae/falcon-40b--modelling_RW.RWForCausalLM",
+     "AutoModelForQuestionAnswering": "tiiuae/falcon-40b--modelling_RW.RWForQuestionAnswering",
+     "AutoModelForSequenceClassification": "tiiuae/falcon-40b--modelling_RW.RWForSequenceClassification",
+     "AutoModelForTokenClassification": "tiiuae/falcon-40b--modelling_RW.RWForTokenClassification"
+   },
+   "bias": false,
+   "bos_token_id": 11,
+   "eos_token_id": 11,
+   "hidden_dropout": 0.0,
+   "hidden_size": 8192,
+   "initializer_range": 0.02,
+   "layer_norm_epsilon": 1e-05,
+   "model_type": "RefinedWeb",
+   "n_head": 128,
+   "n_head_kv": 8,
+   "n_layer": 60,
+   "parallel_attn": true,
+   "quantization_config": {
+     "bnb_4bit_compute_dtype": "float16",
+     "bnb_4bit_quant_type": "nf4",
+     "bnb_4bit_use_double_quant": true,
+     "llm_int8_enable_fp32_cpu_offload": false,
+     "llm_int8_has_fp16_weight": false,
+     "llm_int8_skip_modules": null,
+     "llm_int8_threshold": 6.0,
+     "load_in_4bit": true,
+     "load_in_8bit": false
+   },
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.30.0.dev0",
+   "use_cache": false,
+   "vocab_size": 65024
+ }
finetuned_conversations.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0b1a352d4f1ab628ee67132bea9332505baab31553d204dcd8fd9f5e10af73a1
+ size 22790689091
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:948fcb4cb227f09823443ff3813dd14daa7d033268d72bea3a4ff38989b28bf0
+ size 22790664801
special_tokens_map.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "additional_special_tokens": [
+     ">>TITLE<<",
+     ">>ABSTRACT<<",
+     ">>INTRODUCTION<<",
+     ">>SUMMARY<<",
+     ">>COMMENT<<",
+     ">>ANSWER<<",
+     ">>QUESTION<<",
+     ">>DOMAIN<<",
+     ">>PREFIX<<",
+     ">>SUFFIX<<",
+     ">>MIDDLE<<"
+   ],
+   "eos_token": "<|endoftext|>",
+   "pad_token": "<|endoftext|>"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "add_prefix_space": false,
+   "clean_up_tokenization_spaces": true,
+   "eos_token": "<|endoftext|>",
+   "model_max_length": 2048,
+   "tokenizer_class": "PreTrainedTokenizerFast"
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:34e639b3ce4423bde112dbf3ebf2ac94b3cc7aee6acc1ecfbb17baeb71c95be1
+ size 3963