Sandiago21 committed on
Commit 325d3f4 · 1 Parent(s): 004ee3c

update README.md with new instructions to call and run the fine-tuned model

Files changed (1)
  1. README.md +73 -4
README.md CHANGED
@@ -20,6 +20,7 @@ This repository contains a LLaMA-13B further fine-tuned model on conversations a
20
 
21
  ## Model Details
22
 
 
23
 
24
  ### Model Description
25
 
@@ -95,23 +96,91 @@ def generate_prompt(instruction: str, input_ctxt: str = None) -> str:
95
 
96
  Use the code below to get started with the model.
97
 
98
  ```python
99
  import torch
100
  from transformers import GenerationConfig, LlamaTokenizer, LlamaForCausalLM
101
 
102
- tokenizer = LlamaTokenizer.from_pretrained("Sandiago21/llama-13b-hf-prompt-answering")
103
  model = LlamaForCausalLM.from_pretrained(
104
-     "Sandiago21/llama-13b-hf-prompt-answering",
105
      load_in_8bit=True,
106
      torch_dtype=torch.float16,
107
      device_map="auto",
108
  )
109
  generation_config = GenerationConfig(
110
      temperature=0.2,
111
      top_p=0.75,
112
      top_k=40,
113
      num_beams=4,
114
-     max_new_tokens=128,
115
  )
116
 
117
  model.eval()
@@ -139,7 +208,7 @@ with torch.no_grad():
139
  response = tokenizer.decode(outputs.sequences[0], skip_special_tokens=True)
140
  print(response)
141
 
142
- >>> The capital city of Greece is Athens and it borders Albania, Macedonia, Bulgaria and Turkey.
143
  ```
144
 
145
  ## Training Details
 
20
 
21
  ## Model Details
22
 
23
+ Anyone can try the model (submit prompts) using the Jupyter notebook provided in the **noteboooks** folder.
24
 
25
  ### Model Description
26
 
 
96
 
97
  Use the code below to get started with the model.
98
 
99
+ 1. Clone the repo with git (it also contains the base model artifacts, for simplicity and completeness) and run the following code snippet to load the model:
100
+
101
  ```python
102
  import torch
103
  from transformers import GenerationConfig, LlamaTokenizer, LlamaForCausalLM
104
+ from peft import PeftConfig, PeftModel
105
+ MODEL_NAME = "Sandiago21/llama-7b-hf-prompt-answering"
106
+
107
+ config = PeftConfig.from_pretrained(MODEL_NAME)
108
+
109
  model = LlamaForCausalLM.from_pretrained(
110
+     config.base_model_name_or_path,
111
      load_in_8bit=True,
112
      torch_dtype=torch.float16,
113
      device_map="auto",
114
  )
115
+
116
+ tokenizer = LlamaTokenizer.from_pretrained(MODEL_NAME)
117
+
118
+ model = PeftModel.from_pretrained(model, MODEL_NAME)
119
+
120
+ generation_config = GenerationConfig(
121
+     temperature=0.2,
122
+     top_p=0.75,
123
+     top_k=40,
124
+     num_beams=4,
125
+     max_new_tokens=32,
126
+ )
127
+
128
+ model.eval()
129
+ if torch.__version__ >= "2":
130
+     model = torch.compile(model)
131
+ ```
132
+
133
+ ### Example of Usage
134
+ ```python
135
+ instruction = "What is the capital city of Greece and with which countries does Greece border?"
136
+ input_ctxt = None # For some tasks, you can provide an input context to help the model generate a better response.
137
+
138
+ prompt = generate_prompt(instruction, input_ctxt)
139
+ input_ids = tokenizer(prompt, return_tensors="pt").input_ids
140
+ input_ids = input_ids.to(model.device)
141
+
142
+ with torch.no_grad():
143
+     outputs = model.generate(
144
+         input_ids=input_ids,
145
+         generation_config=generation_config,
146
+         return_dict_in_generate=True,
147
+         output_scores=True,
148
+     )
149
+
150
+ response = tokenizer.decode(outputs.sequences[0], skip_special_tokens=True)
151
+ print(response)
152
+
153
+ # Example output: The capital city of Greece is Athens and it borders Turkey, Bulgaria, Macedonia, Albania, and the Aegean Sea.
154
+ ```
155
+
156
+ 2. Alternatively, load the model directly from the Hugging Face Hub using the following code snippet:
157
+
158
+ ```python
159
+ import torch
160
+ from transformers import GenerationConfig, LlamaTokenizer, LlamaForCausalLM
161
+ from peft import PeftConfig, PeftModel
162
+ MODEL_NAME = "Sandiago21/llama-7b-hf-prompt-answering"
163
+ BASE_MODEL = "decapoda-research/llama-7b-hf"
164
+
165
+ config = PeftConfig.from_pretrained(MODEL_NAME)
166
+
167
+ model = LlamaForCausalLM.from_pretrained(
168
+     BASE_MODEL,
169
+     load_in_8bit=True,
170
+     torch_dtype=torch.float16,
171
+     device_map="auto",
172
+ )
173
+
174
+ tokenizer = LlamaTokenizer.from_pretrained(MODEL_NAME)
175
+
176
+ model = PeftModel.from_pretrained(model, MODEL_NAME)
177
+
178
  generation_config = GenerationConfig(
179
      temperature=0.2,
180
      top_p=0.75,
181
      top_k=40,
182
      num_beams=4,
183
+     max_new_tokens=32,
184
  )
185
 
186
  model.eval()
 
208
  response = tokenizer.decode(outputs.sequences[0], skip_special_tokens=True)
209
  print(response)
210
 
211
+ # Example output: The capital city of Greece is Athens and it borders Turkey, Bulgaria, Macedonia, Albania, and the Aegean Sea.
212
  ```
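
The usage snippets call `generate_prompt`, whose body lies outside the changed lines shown in this diff (only its signature appears in the hunk header). The following is a minimal sketch, assuming an Alpaca-style instruction template. The template wording and the `extract_response` helper are hypothetical and may differ from what the repository actually uses.

```python
from typing import Optional

# Hypothetical marker; assumes the prompt ends with an Alpaca-style response header.
RESPONSE_MARKER = "### Response:"


def generate_prompt(instruction: str, input_ctxt: Optional[str] = None) -> str:
    """Build an instruction prompt, optionally including an input context."""
    if input_ctxt:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_ctxt}\n\n"
            f"{RESPONSE_MARKER}\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"{RESPONSE_MARKER}\n"
    )


def extract_response(decoded: str) -> str:
    """Return only the text after the last response marker (hypothetical helper)."""
    return decoded.split(RESPONSE_MARKER)[-1].strip()
```

Because `tokenizer.decode(outputs.sequences[0], ...)` decodes the prompt together with the generated tokens, passing the decoded string through a helper such as `extract_response` is one way to keep only the model's answer.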
213
 
214
  ## Training Details