dfurman committed on
Commit
ac2cfe3
1 Parent(s): e660611

Update README.md

Files changed (1)
  1. README.md +231 -1
README.md CHANGED
@@ -13,9 +13,239 @@ pipeline_tag: text-generation
13
 
14
  This instruction model was built via parameter-efficient QLoRA finetuning of [llama-2-13b](https://huggingface.co/meta-llama/Llama-2-13b-hf) on the first 100k rows of [ehartford/dolphin](https://huggingface.co/datasets/ehartford/dolphin) (an open-source implementation of [Microsoft's Orca](https://www.microsoft.com/en-us/research/publication/orca-progressive-learning-from-complex-explanation-traces-of-gpt-4/)). Finetuning was executed on a single A6000 (48 GB) for roughly 18 hours on the [Lambda Labs](https://cloud.lambdalabs.com/instances) platform.
15
 
16
  ---
17
 
18
  ### Framework versions
19
 
20
 
21
- - PEFT 0.5.0.dev0
 
13
 
14
  This instruction model was built via parameter-efficient QLoRA finetuning of [llama-2-13b](https://huggingface.co/meta-llama/Llama-2-13b-hf) on the first 100k rows of [ehartford/dolphin](https://huggingface.co/datasets/ehartford/dolphin) (an open-source implementation of [Microsoft's Orca](https://www.microsoft.com/en-us/research/publication/orca-progressive-learning-from-complex-explanation-traces-of-gpt-4/)). Finetuning was executed on a single A6000 (48 GB) for roughly 18 hours on the [Lambda Labs](https://cloud.lambdalabs.com/instances) platform.
15
 
16
+ * Model license: Llama 2 Community License Agreement
17
+ * Basic usage: [notebook](assets/basic_inference_llama_2_13b_dolphin.ipynb)
18
+ * Finetuning code: [notebook](https://github.com/daniel-furman/sft-demos/blob/main/src/sft/one_gpu/llama-2/dolphin/sft-llama-2-13b-dolphin-peft.ipynb)
19
+ * Loss curves: [plot](https://huggingface.co/dfurman/llama-2-13b-dolphin-peft#finetuning-description)
20
+ * Runtime stats: [table](https://huggingface.co/dfurman/llama-2-13b-dolphin-peft#runtime-tests)
21
+
22
+ ### Example prompts and responses
23
+
24
+ Example 1:
25
+
26
+ **User**:
27
+ >You are a helpful assistant. Write me a numbered list of things to do in New York City.\n
28
+
29
+ **llama-2-13b-dolphin-peft**:
30
+
31
+ > 1. Visit the iconic Statue of Liberty and Ellis Island.
32
+ > 2. Take a stroll through Central Park and enjoy its many attractions.
33
+ > 3. Explore the world-renowned museums, such as the Metropolitan Museum of Art and the Museum of Modern Art.
34
+ > 4. Experience the vibrant energy of Times Square and take in the bright lights and billboards.
35
+ > 5. Visit the 9/11 Memorial and Museum to pay tribute to those who lost their lives in the attacks.
36
+ > 6. Enjoy a Broadway show or a concert at one of the many theaters and venues in the city.
37
+ > 7. Take a ride on the Staten Island Ferry for a free view of the Statue of Liberty and the New York City skyline.
38
+ > 8. Shop at the famous Fifth Avenue stores and explore the high-end boutiques.
39
+ > 9. Indulge in a variety of cuisines at one of the many restaurants in the city.
40
+ > 10. Visit the Empire State Building and enjoy the panoramic views of the city from the observation deck.
41
+
42
+ <br>
43
+
44
+ Example 2:
45
+
46
+ **User**:
47
+ >You are a helpful assistant. Write a short email inviting my friends to a dinner party on Friday. Respond succinctly.\n
48
+
49
+ **llama-2-13b-dolphin-peft**:
50
+
51
+ > Subject: Friday Dinner Party Invitation
52
+ >
53
+ > Dear Friends,
54
+ >
55
+ > I hope this email finds you well. I'm excited to invite you all to a dinner party on Friday, March 10th, at 7:00 PM. The address is 123 Main Street, Anytown, USA.
56
+ >
57
+ > Please RSVP by Wednesday, March 8th, so I can plan accordingly. I look forward to seeing you all and sharing a delicious meal together!
58
+ >
59
+ > Best,
60
+ > Your Friendly Assistant
61
+ >
62
+ > P.S. If you have any dietary restrictions or allergies, please let me know in your RSVP. Thank you!
63
+
64
+ <br>
65
+
66
+ ## Model Description
67
+
68
+ The architecture is a modification of a standard decoder-only transformer.
69
+
70
+ The llama-2-13b model differs from a standard transformer in the following ways:
71
+ * It uses the [SwiGLU activation function](https://arxiv.org/abs/2002.05202)
72
+ * It uses [rotary positional embeddings](https://arxiv.org/abs/2104.09864) (RoPE)
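
To make the first bullet concrete, here is a minimal pure-Python sketch of a SwiGLU feed-forward block. It is an illustration only, not the model's actual implementation: the names `w_gate`, `w_up`, and `w_down` are hypothetical, bias terms are omitted, and the weights are plain nested lists rather than tensors.

```python
import math


def silu(z: float) -> float:
    """SiLU (Swish with beta = 1): z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """Compute w_down @ (silu(w_gate @ x) * (w_up @ x)), elementwise product in the middle."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


# Tiny 1-dimensional example: silu(1.0) * 2.0
print(swiglu_ffn([1.0], [[1.0]], [[2.0]], [[1.0]]))
```

The gate branch decides how much of the linear "up" branch passes through, which is the distinguishing feature of SwiGLU compared with a single-activation feed-forward layer.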
73
+
74
+ | Hyperparameter | Value |
75
+ |----------------|-------|
76
+ | n_parameters | 13B |
77
+ | tokens | 2.0T |
78
+ | vocab size | 32000 |
79
+ | sequence length | 4096 |
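
As a rough sanity check on the table above, the sketch below estimates the memory needed just to hold the weights. It assumes exactly 13e9 parameters (the table's rounded figure) and 2 bytes per parameter for bfloat16; activations, the KV cache, and the LoRA adapter weights are ignored, and the function name is illustrative.

```python
def weight_memory_gb(n_parameters: float, bytes_per_parameter: int = 2) -> float:
    """Return the approximate size of the model weights in GB (10**9 bytes)."""
    return n_parameters * bytes_per_parameter / 1e9


approx_gb = weight_memory_gb(13e9)
print(f"~{approx_gb:.0f} GB of weights in bfloat16")
```

Under these assumptions the weights alone come to about 26 GB (roughly 24 GiB), which is in the same range as the 25 GB VRAM figure reported in the runtime table of this README.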
80
+
81
+ ## Finetuning Description
82
+
83
+ This model was trained on a single A6000 (48 GB) for about 18 hours using the [Lambda Labs](https://cloud.lambdalabs.com/instances) platform.
84
+
85
+ ![loss curves](https://raw.githubusercontent.com/daniel-furman/sft-demos/main/assets/jul_24_23_1_13_00_log_loss_curves_llama-2-13b-dolphin.png)
86
+
87
+ The above loss curve was generated from the run's private wandb.ai log.
88
+
89
+ ## Pretraining Data
90
+
91
+ For more details on the pretraining process, see [Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf).
92
+
93
+ The data was tokenized using the [Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) tokenizer.
94
+
95
+ ## Limitations and Biases
96
+
97
+ _The following language is modified from [EleutherAI's GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b)_
98
+
99
+ This model can produce factually incorrect output, and should not be relied on to produce factually accurate information.
100
+ This model was trained on various public datasets.
101
+ While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
102
+
103
+ ## How to Use
104
+
105
+ Basic usage: [notebook](assets/basic_inference_llama_2_13b_dolphin.ipynb)
106
+
107
+ Install and import the package dependencies:
108
+
109
+ ```python
110
+ !pip install -q -U huggingface_hub peft transformers torch accelerate
111
+ ```
112
+
113
+ ```python
114
+ import torch
115
+ from peft import PeftModel, PeftConfig
116
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
117
+ ```
118
+
119
+ Sign in to a Hugging Face account with access to Llama-2:
120
+
121
+ ```python
122
+ from huggingface_hub import notebook_login
123
+ notebook_login()
124
+ ```
125
+
126
+ Basic model loading:
127
+
128
+ ```python
129
+ peft_model_id = "dfurman/llama-2-13b-dolphin-peft"
+ config = PeftConfig.from_pretrained(peft_model_id)
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     config.base_model_name_or_path,
+     use_auth_token=True
+ )
+ tokenizer.pad_token = tokenizer.eos_token
+ model = AutoModelForCausalLM.from_pretrained(
+     config.base_model_name_or_path,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+     use_auth_token=True,
+ )
+
+ # Load the LoRA adapter on top of the base model
+ model = PeftModel.from_pretrained(model, peft_model_id)
146
+ ```
147
+
148
+ Once loaded, the model and tokenizer can be used with the following code:
149
+
150
+ ```python
151
+ def llama_generate(
+     model: AutoModelForCausalLM,
+     tokenizer: AutoTokenizer,
+     prompt: str,
+     max_new_tokens: int = 128,
+     temperature: float = 1.0,
+ ) -> str:
+     """
+     Generate a completion for a prompt.
+     Uses Hugging Face GenerationConfig defaults for all other settings
+     https://huggingface.co/docs/transformers/v4.29.1/en/main_classes/text_generation#transformers.GenerationConfig
+     Args:
+         model (transformers.AutoModelForCausalLM): Llama model for text generation
+         tokenizer (transformers.AutoTokenizer): Tokenizer for model
+         prompt (str): Prompt for text generation
+         max_new_tokens (int, optional): Max new tokens after the prompt to generate. Defaults to 128.
+         temperature (float, optional): The value used to modulate the next token probabilities.
+             Defaults to 1.0
+     """
+     device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+     # tokenize inputs, load on device
+     inputs = tokenizer(
+         [prompt],
+         return_tensors="pt",
+         return_token_type_ids=False,
+     ).to(device)
+
+     # when running Torch modules in lower precision, it is best practice to use the torch.autocast context manager.
+     with torch.autocast(device.type, dtype=torch.bfloat16):
+         response = model.generate(
+             **inputs,
+             max_new_tokens=max_new_tokens,
+             do_sample=True,  # sampling must be enabled for temperature to take effect
+             temperature=temperature,
+             return_dict_in_generate=True,
+             eos_token_id=tokenizer.eos_token_id,
+             pad_token_id=tokenizer.pad_token_id,
+         )
+
+     # grab output in natural language
+     decoded_output = tokenizer.decode(
+         response["sequences"][0],
+         skip_special_tokens=True,
+     )
+
+     return decoded_output[len(prompt) :]  # remove prompt from output
197
+ ```
198
+
199
+ We can now generate text! For example:
200
+
201
+ ```python
202
+ prompt = "### Human: Write me a numbered list of things to do in New York City.### Assistant: "
203
+
204
+ response = llama_generate(
+     model,
+     tokenizer,
+     prompt,
+     max_new_tokens=250,
+     temperature=0.92,
+ )
211
+
212
+ print(response)
213
+ ```
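
The prompt in the example above follows a `### Human: ...### Assistant: ` pattern. A small helper like the one below could build such prompts consistently. `build_prompt` is a hypothetical name, not part of this repository, and the template is copied from the single example shown here rather than from a documented specification.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the Human/Assistant template used in the example."""
    # Assumption: the template is exactly the one in the README's sample prompt.
    return f"### Human: {instruction.strip()}### Assistant: "


prompt = build_prompt("Write me a numbered list of things to do in New York City.")
print(prompt)
```

The resulting string can be passed as the `prompt` argument of `llama_generate`.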
214
+
215
+ ### Runtime tests
216
+
217
+
218
+ | runtime / 50 tokens (sec) | GPU | attn | torch dtype | VRAM (GB) |
219
+ |:-----------------------------:|:----------------------:|:---------------------:|:-------------:|:-----------------------:|
220
+ | 2.93 | 1x A100 (40 GB SXM) | torch | bfloat16 | 25 |
221
+ | 3.24 | 1x A6000 (48 GB) | torch | bfloat16 | 25 |
222
+
223
+ The above runtime stats were generated from this [notebook](https://github.com/daniel-furman/sft-demos/blob/main/src/sft/one_gpu/llama-2/dolphin/postprocessing-llama-2-13b-dolphin-peft.ipynb).
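
For readers who prefer throughput to latency, the runtime column above can be converted to tokens per second. The sketch below only re-expresses the two measurements from the table; the dictionary and function names are illustrative.

```python
# Seconds needed to generate 50 tokens, copied from the runtime table.
RUNTIME_PER_50_TOKENS_SEC = {
    "1x A100 (40 GB SXM)": 2.93,
    "1x A6000 (48 GB)": 3.24,
}


def tokens_per_second(seconds: float, n_tokens: int = 50) -> float:
    """Convert a runtime for n_tokens generated tokens into tokens per second."""
    return n_tokens / seconds


for gpu, seconds in RUNTIME_PER_50_TOKENS_SEC.items():
    print(f"{gpu}: {tokens_per_second(seconds):.1f} tokens/sec")
```

This works out to roughly 17.1 tokens per second on the A100 and 15.4 on the A6000.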
224
+
225
+ ## Acknowledgements
226
+
227
+ This model was finetuned by Daniel Furman on July 22, 2023 and is intended primarily for research purposes.
228
+
229
+ ## Disclaimer
230
+
231
+ The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
232
+
233
+ ## Meta citation for llama-2 blog
234
+
235
+ ```
236
+ @online{Meta2023Introducing,
237
+ author = {Meta AI},
238
+ title = {Meta and Microsoft Introduce the Next Generation of Llama},
239
+ year = {2023},
240
+ url = {https://about.fb.com/news/2023/07/llama-2/},
241
+ note = {Accessed: 2023-07-24},
242
+ urldate = {2023-07-24}
243
+ }
244
+ ```
245
+
246
  ---
247
 
248
  ### Framework versions
249
 
250
 
251
+ - PEFT 0.5.0.dev0