---
license: mit
datasets:
- OpenAssistant/oasst1
language:
- en
tags:
- sft
pipeline_tag: text-generation
widget:
  - text: >-
      <|prompter|>What is a meme, and what's the history behind this
      word?<|endoftext|><|assistant|>
  - text: <|prompter|>What's the Earth total population<|endoftext|><|assistant|>
  - text: <|prompter|>Write a story about future of AI development<|endoftext|><|assistant|>
---

# Load Merged Model (Recommended, identical configuration to a fine-tuned model)

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

repo_id = "jordiclive/falcon-40b-lora-sft-stage2-1.1k"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="merged_model",
    torch_dtype=dtype,
    trust_remote_code=True,
)
```

# LoRA Adapter for Falcon 40B trained on oasst-top1

This repo contains a **Falcon 40B** LoRA fine-tuned model and the low-rank adapter fit on datasets that are part of the OpenAssistant project.

This version of the weights was trained with the following hyperparameters:

- Epochs: 8
- Batch size: 128
- Max length: 2048
- Learning rate: 1e-4
- LoRA _r_: 64
- LoRA alpha: 16
- LoRA target modules: ["dense_4h_to_h", "dense", "query_key_value", "dense_h_to_4h"]

These settings follow the recommendations of the QLoRA paper. The model was trained with flash attention, gradient checkpointing, and DeepSpeed Stage 3 on 8 x A100 80GB GPUs.

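For reference, these LoRA hyperparameters correspond roughly to the `peft` configuration sketched below. This is an illustrative sketch, not the exact training script: the field names are the standard `peft` `LoraConfig` API, and the dropout value is an assumption (it is not stated on this card).

```
from peft import LoraConfig

# Illustrative LoraConfig matching the hyperparameters listed above.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    target_modules=["dense_4h_to_h", "dense", "query_key_value", "dense_h_to_4h"],
    lora_dropout=0.05,  # assumption: dropout is not specified on this card
    bias="none",
    task_type="CAUSAL_LM",
)
```
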
Dataset:
```
oasst-top1:
  datasets:
    - oasst_export:
        lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk"  # sft-8.0
        input_file_path: 2023-05-06_OASST_labels.jsonl.gz
        val_split: 0.05
        top_k: 1
```

## Model Details

- **Developed** as part of the OpenAssistant Project
- **Model type:** PEFT adapter for the frozen Falcon 40B base model
- **Language:** English

## Prompting

Two special tokens are used to mark the beginning of user and assistant turns:
`<|prompter|>` and `<|assistant|>`. Each turn ends with an `<|endoftext|>` token.

Input prompt example:
```
<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>
```
The input ends with the `<|assistant|>` token to signal that the model should
start generating the assistant reply.

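For a multi-turn conversation, the same pattern simply repeats, alternating prompter and assistant turns and always ending with `<|assistant|>`. A minimal sketch (the helper name and the example turns are illustrative, not part of the training code):

```
# Assemble a multi-turn prompt in the OpenAssistant format described above.
def build_prompt(turns, eos_token="<|endoftext|>"):
    # turns: list of (role, text) pairs, where role is "prompter" or "assistant"
    prompt = "".join(f"<|{role}|>{text}{eos_token}" for role, text in turns)
    # End with the assistant token so the model continues as the assistant.
    return prompt + "<|assistant|>"

print(build_prompt([
    ("prompter", "What is a meme, and what's the history behind this word?"),
    ("assistant", "A meme is an idea or joke that spreads from person to person within a culture."),
    ("prompter", "Can you give a well-known internet example?"),
]))
```
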
# Example Inference code (Prompt Template)

```
device = "cuda" if torch.cuda.is_available() else "cpu"  # run on GPU if available

model = model.to(device)
if dtype == torch.float16:
    model = model.half()


# Choose generation parameters
generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.75,
    top_k=40,
    num_beams=4,
)


def format_system_prompt(prompt, eos_token=tokenizer.eos_token):
    return "{}{}{}{}".format("<|prompter|>", prompt, eos_token, "<|assistant|>")


def generate(prompt, generation_config=generation_config, max_new_tokens=2048, device=device):
    prompt = format_system_prompt(prompt, eos_token=tokenizer.eos_token)  # OpenAssistant prompt format expected
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        generation_output = model.generate(
            input_ids=input_ids,
            generation_config=generation_config,
            return_dict_in_generate=True,
            output_scores=True,
            max_new_tokens=max_new_tokens,
            eos_token_id=tokenizer.eos_token_id,
        )
    s = generation_output.sequences[0]
    output = tokenizer.decode(s)
    print("Text generated:")
    print(output)
    return output
```
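For example, a single call with one of the widget prompts looks like this (the prompt text is just an illustration):

```
output = generate("What is a meme, and what's the history behind this word?")
```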

## LoRA weights

If you want to use the LoRA weights separately (i.e. apply the adapter to the base model yourself), several special-token embeddings also need to be added on top of the base model's embedding matrix.

```
import torch
import transformers
from huggingface_hub import hf_hub_download
from peft import PeftModel

base_model_id = "tiiuae/falcon-40b"


def add_embeddings(model, embed_path, tokenizer):
    # Extend the input embedding matrix with the extra special-token embeddings
    # stored in the adapter repo's extra_embeddings.pt.
    old_embeddings = model.get_input_embeddings()
    old_num_tokens, old_embedding_dim = old_embeddings.weight.size()
    new_embeddings = torch.nn.Embedding(old_num_tokens, old_embedding_dim)
    new_embeddings.to(old_embeddings.weight.device, dtype=old_embeddings.weight.dtype)
    model._init_weights(new_embeddings)
    embed_weights = torch.load(embed_path, map_location=old_embeddings.weight.device)
    vocab_size = tokenizer.vocab_size
    new_embeddings.weight.data[:vocab_size, :] = old_embeddings.weight.data[:vocab_size, :]
    new_embeddings.weight.data[vocab_size : vocab_size + embed_weights.shape[0], :] = embed_weights.to(
        new_embeddings.weight.dtype
    ).to(new_embeddings.weight.device)
    model.set_input_embeddings(new_embeddings)
    model.tie_weights()


def load_peft_model(model, peft_model_path, tokenizer):
    # Download the extra token embeddings, resize the vocabulary, then attach the LoRA adapter.
    embed_weights = hf_hub_download(peft_model_path, "extra_embeddings.pt")
    model.resize_token_embeddings(tokenizer.vocab_size + torch.load(embed_weights).shape[0])
    model.config.eos_token_id = tokenizer.eos_token_id
    model.config.bos_token_id = tokenizer.bos_token_id
    model.config.pad_token_id = tokenizer.pad_token_id
    model = PeftModel.from_pretrained(
        model,
        model_id=peft_model_path,
        torch_dtype=model.dtype,
    )
    model.eos_token_id = tokenizer.eos_token_id
    add_embeddings(model, embed_weights, tokenizer)
    return model


def load_lora_model(base_model_id, tokenizer, device, dtype):
    # Load the frozen Falcon base model, then apply the adapter from repo_id (defined above).
    model = transformers.AutoModelForCausalLM.from_pretrained(
        base_model_id,
        torch_dtype=dtype,
        trust_remote_code=True,
    )
    model = load_peft_model(model, repo_id, tokenizer)
    model = model.to(device)
    return model


model = load_lora_model(base_model_id=base_model_id, tokenizer=tokenizer, device=device, dtype=dtype)
```
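Once the adapter and extra embeddings are attached, the model can be used exactly like the merged model above, with the same prompt format and `generate` helper, for example (the prompt text is again just an illustration):

```
model.eval()
output = generate("Write a story about future of AI development")
```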