HCZhang committed
Commit bb53375
1 Parent(s): 17e64aa

Update README.md

Files changed (1):
  1. README.md +109 -0

README.md CHANGED
@@ -109,6 +109,115 @@ _Few-shot is disabled for Jellyfish models._
  <|start_header_id|>assistant<|end_header_id|>
  ```

+ ## Training Details
+
+ ### Training Method
+
+ We used LoRA to speed up the training process, targeting the q_proj, k_proj, v_proj, and o_proj modules.
+
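+ The fine-tuning code is not included here, but the sketch below shows what a comparable LoRA setup looks like with the [PEFT](https://github.com/huggingface/peft) library. Only the target modules follow the description above; the base model path and the rank, alpha, and dropout values are illustrative assumptions, not the settings used to train Jellyfish.
+
+ ```python
+ from peft import LoraConfig, get_peft_model
+ from transformers import AutoModelForCausalLM
+
+ # Hypothetical path to the base model that fine-tuning would start from.
+ base_model = AutoModelForCausalLM.from_pretrained("path/to/base-model")
+
+ lora_config = LoraConfig(
+     r=16,                # assumed rank
+     lora_alpha=32,       # assumed scaling factor
+     lora_dropout=0.05,   # assumed dropout
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+     task_type="CAUSAL_LM",
+ )
+
+ # Wrap the base model so that only the LoRA adapter weights are trained.
+ model = get_peft_model(base_model, lora_config)
+ model.print_trainable_parameters()
+ ```
+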
+ ## Uses
+
+ To accelerate inference, we strongly recommend running Jellyfish with [vLLM](https://github.com/vllm-project/vllm).
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Python Script
+ We provide two simple Python code examples for inference with the Jellyfish model.
+
+ #### Using Transformers and Torch Modules
+ <div style="height: auto; max-height: 400px; overflow-y: scroll;">
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
+ import torch
+
+ if torch.cuda.is_available():
+     device = "cuda"
+ else:
+     device = "cpu"
+
+ # The model will be downloaded automatically from the Hugging Face model hub if it is not cached.
+ # Model files are cached in "~/.cache/huggingface/hub/models--NECOUDBFM--Jellyfish/" by default.
+ # You can also download the model manually and replace the model name with the path to the model files.
+ model = AutoModelForCausalLM.from_pretrained(
+     "NECOUDBFM/Jellyfish",
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+ tokenizer = AutoTokenizer.from_pretrained("NECOUDBFM/Jellyfish")
+
+ system_message = "You are an AI assistant that follows instruction extremely well. Help as much as you can."
+
+ # You need to define the user_message variable based on the task and the data you want to test on.
+ user_message = "Hello, world."
+
+ prompt = f"<|start_header_id|>system<|end_header_id|>{system_message}<|eot_id|>\n<|start_header_id|>user<|end_header_id|>{user_message}<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>"
+ inputs = tokenizer(prompt, return_tensors="pt")
+ input_ids = inputs["input_ids"].to(device)
+
+ # You can modify the sampling parameters according to your needs.
+ generation_config = GenerationConfig(
+     do_sample=True,
+     temperature=0.35,
+     top_p=0.9,
+ )
+
+ with torch.no_grad():
+     generation_output = model.generate(
+         input_ids=input_ids,
+         generation_config=generation_config,
+         return_dict_in_generate=True,
+         output_scores=True,
+         max_new_tokens=1024,
+         pad_token_id=tokenizer.eos_token_id,
+         repetition_penalty=1.15,
+     )
+
+ # Keep only the newly generated tokens (everything after the prompt) and decode them.
+ output = generation_output.sequences
+ response = tokenizer.decode(
+     output[:, input_ids.shape[-1]:][0], skip_special_tokens=True
+ ).strip()
+
+ print(response)
+ ```
+ </div>
+
+ #### Using vLLM
+ <div style="height: auto; max-height: 400px; overflow-y: scroll;">
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ # To use vLLM for inference, you need to download the model files either from the Hugging Face model hub or manually.
+ # You should modify the path to the model according to your local environment.
+ path_to_model = "/workspace/models/Jellyfish"
+
+ model = LLM(model=path_to_model)
+
+ # You can modify the sampling parameters according to your needs.
+ # Caution: The stop parameter should not be changed.
+ sampling_params = SamplingParams(
+     temperature=0.35,
+     top_p=0.9,
+     max_tokens=1024,
+     stop=["<|eot_id|>"],
+ )
+
+ system_message = "You are an AI assistant that follows instruction extremely well. Help as much as you can."
+
+ # You need to define the user_message variable based on the task and the data you want to test on.
+ user_message = "Hello, world."
+
+ prompt = f"<|start_header_id|>system<|end_header_id|>{system_message}<|eot_id|>\n<|start_header_id|>user<|end_header_id|>{user_message}<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>"
+ outputs = model.generate(prompt, sampling_params)
+ response = outputs[0].outputs[0].text.strip()
+ print(response)
+ ```
+ </div>
+
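+ For table-level tasks you typically need responses for many rows at once. The following is a minimal sketch of batched inference that continues from the script above (it reuses `model`, `sampling_params`, and `system_message`); the `user_messages` list is a hypothetical placeholder for prompts built from your own data.
+
+ ```python
+ # Continuation of the vLLM script above: vLLM accepts a list of prompts and
+ # returns one output per prompt, so a whole batch of rows can be processed
+ # with a single generate() call.
+ user_messages = [  # hypothetical placeholder inputs
+     "Hello, world.",
+     "Hello again, world.",
+ ]
+ prompts = [
+     f"<|start_header_id|>system<|end_header_id|>{system_message}<|eot_id|>\n"
+     f"<|start_header_id|>user<|end_header_id|>{m}<|eot_id|>\n"
+     f"<|start_header_id|>assistant<|end_header_id|>"
+     for m in user_messages
+ ]
+ outputs = model.generate(prompts, sampling_params)
+ responses = [output.outputs[0].text.strip() for output in outputs]
+ print(responses)
+ ```
+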
  ## Prompts

  We provide the prompts used for both fine-tuning and inference.