---
license: mit
datasets:
- sahil2801/CodeAlpaca-20k
- yahma/alpaca-cleaned
- databricks/databricks-dolly-15k
- OpenAssistant/oasst1
- jeffwan/sharegpt_vicuna
- qwedsacf/grade-school-math-instructions
- vicgalle/alpaca-gpt4
language:
- en
tags:
- sft
pipeline_tag: text-generation
widget:
- text: >-
    <|prompter|>What is a meme, and what's the history behind this
    word?</s><|assistant|>
- text: <|prompter|>What's the Earth total population</s><|assistant|>
- text: <|prompter|>Write a story about future of AI development</s><|assistant|>
---


# LoRA Adapter for LLaMA 7B trained on more datasets than tloen/alpaca-lora-7b

This repo contains a low-rank adapter for **LLaMA-7b**, fit on:
- `Nebulous/gpt4all_pruned`
- `sahil2801/CodeAlpaca-20k`
- `yahma/alpaca-cleaned`
- datasets that are part of the OpenAssistant project


You can see sampling results [here](https://open-assistant.github.io/oasst-model-eval/?f=https%3A%2F%2Fraw.githubusercontent.com%2FOpen-Assistant%2Foasst-model-eval%2Fmain%2Fsampling_reports%2Foasst-sft%2F2023-03-18_llama_30b_oasst_latcyr_400_sampling_noprefix_lottery.json%0Ahttps%3A%2F%2Fraw.githubusercontent.com%2FOpen-Assistant%2Foasst-model-eval%2F8e90ce6504c159d4046991bf37757c108aed913f%2Fsampling_reports%2Foasst-sft%2Freport_file_jordiclive_alpaca_gpt4-dolly_15k-vicuna-lora-7b_full_lottery_no_prefix.json).
Note that these sampling parameters are not optimized; they are the OpenAssistant defaults used for comparing models.

This version of the weights was trained with the following hyperparameters:

- Epochs: 8
- Batch size: 128
- Max length: 2048
- Learning rate: 8e-6
- LoRA _r_: 16
- LoRA alpha: 32
- LoRA target modules: q_proj, k_proj, v_proj, o_proj

The model was trained with flash attention and gradient checkpointing.
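
For reference, a roughly equivalent `peft` `LoraConfig` for these settings might look like the sketch below. This is not the exact training configuration; `lora_dropout`, `bias`, and `task_type` are assumptions not stated in this card.

```python
from peft import LoraConfig

# Sketch of a LoRA configuration matching the hyperparameters listed above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,  # assumption, not stated in this card
    bias="none",        # assumption, not stated in this card
    task_type="CAUSAL_LM",
)
```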

## Dataset Details

- dolly15k:
    val_split: 0.05
    max_val_set: 300
- oasst_export:
    lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk"
    input_file_path: 2023-04-12_oasst_release_ready_synth.jsonl.gz
    val_split: 0.05
- vicuna:
    val_split: 0.05
    max_val_set: 800
    fraction: 0.8
- grade_school_math_instructions:
    val_split: 0.05
- code_alpaca:
    val_split: 0.05
    max_val_set: 250
- alpaca_gpt4:
    val_split: 0.02
    max_val_set: 250

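Here, `val_split` is the fraction of a dataset held out for validation, `max_val_set` caps the size of that validation set, and `fraction` subsamples a dataset before splitting. A hypothetical illustration of how these parameters interact (not the OpenAssistant training code):

```python
def split_sizes(n_examples, val_split, max_val_set=None, fraction=1.0):
    """Illustrative only: compute (train, val) sizes from the parameters above."""
    n_used = int(n_examples * fraction)   # `fraction` subsamples the dataset
    n_val = int(n_used * val_split)       # `val_split` fraction goes to validation
    if max_val_set is not None:
        n_val = min(n_val, max_val_set)   # `max_val_set` caps the validation set
    return n_used - n_val, n_val

print(split_sizes(15000, val_split=0.05, max_val_set=300))  # e.g. dolly15k -> (14700, 300)
```
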
## Model Details

- **Developed** as part of the OpenAssistant Project
- **Model type:** PEFT adapter for frozen LLaMA
- **Language:** English

## Prompting

Two special tokens are used to mark the beginning of user and assistant turns:
`<|prompter|>` and `<|assistant|>`. Each turn ends with a `</s>` end-of-sequence token.

Input prompt example:
```
<|prompter|>What is a meme, and what's the history behind this word?</s><|assistant|>
```
The input ends with the `<|assistant|>` token to signal that the model should
start generating the assistant reply.

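For multi-turn conversations, earlier turns are concatenated in the same format, with each turn closed by `</s>`. An illustrative (made-up) example:

```
<|prompter|>What is a meme?</s><|assistant|>A meme is an idea or joke that spreads from person to person, often as an image or short video online.</s><|prompter|>Where does the word come from?</s><|assistant|>
```
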

## Example Inference Code

Note that several special-token embeddings need to be loaded along with the LoRA weights. The example assumes a GPU and `torch.float16`.

```python
import torch
import transformers
from huggingface_hub import hf_hub_download
from peft import PeftModel
from transformers import GenerationConfig

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = transformers.AutoTokenizer.from_pretrained("jordiclive/alpaca_gpt4-dolly_15k-vicuna-lora-7b")

model = transformers.AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf", torch_dtype=torch.float16
)  # Load base model
model.resize_token_embeddings(
    len(tokenizer)
)  # This model repo also contains several embeddings for special tokens that need to be loaded.

model.config.eos_token_id = tokenizer.eos_token_id
model.config.bos_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id

lora_weights = "jordiclive/alpaca_gpt4-dolly_15k-vicuna-lora-7b"
model = PeftModel.from_pretrained(
    model,
    lora_weights,
    torch_dtype=torch.float16,
)  # Load LoRA adapter

model.eos_token_id = tokenizer.eos_token_id
filename = hf_hub_download("jordiclive/alpaca_gpt4-dolly_15k-vicuna-lora-7b", "extra_embeddings.pt")
embed_weights = torch.load(
    filename, map_location=torch.device("cuda" if torch.cuda.is_available() else "cpu")
)  # Load embeddings for special tokens
model.base_model.model.model.embed_tokens.weight[32000:, :] = embed_weights.to(
    model.base_model.model.model.embed_tokens.weight.dtype
).to(
    device
)  # Add special token embeddings

model = model.half().to(device)
generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.75,
    top_k=40,
    num_beams=4,
)


def format_system_prompt(prompt, eos_token="</s>"):
    return "{}{}{}{}".format(
        "<|prompter|>",
        prompt,
        eos_token,
        "<|assistant|>",
    )


def generate(prompt, generation_config=generation_config, max_new_tokens=2048, device=device):
    prompt = format_system_prompt(prompt)  # OpenAssistant prompt format expected
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        generation_output = model.generate(
            input_ids=input_ids,
            generation_config=generation_config,
            return_dict_in_generate=True,
            output_scores=True,
            max_new_tokens=max_new_tokens,
            eos_token_id=2,
        )
    s = generation_output.sequences[0]
    output = tokenizer.decode(s)
    print("Text generated:")
    print(output)
    return output


generate("What is a meme, and what's the history behind this word?")
generate("What's the Earth total population")
generate("Write a story about future of AI development")
```
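
Optionally, if you do not need to keep the adapter separate, the LoRA weights can be merged into the base model for simpler deployment. A minimal sketch using `peft`'s `merge_and_unload` (not part of the original example; the output directory name is arbitrary):

```python
# Folds the LoRA deltas into the base weights and returns a plain
# transformers model that no longer depends on peft at inference time.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("llama-7b-alpaca_gpt4-dolly_15k-vicuna-merged")
tokenizer.save_pretrained("llama-7b-alpaca_gpt4-dolly_15k-vicuna-merged")
```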