Update README.md

README.md CHANGED

@@ -198,144 +198,6 @@ You are a helpful assistant that answers in JSON. Here's the json schema you mus

Given the {schema} that you provide, the model will follow the format of that JSON schema when building its response; all you have to do is give a typical user prompt, and it will respond in JSON.
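
As a minimal sketch of wiring this up: the schema contents, the `<schema>` wrapper, and the user message below are illustrative assumptions on my part; only the system-prompt wording comes from the card.

```python
# Minimal sketch of a JSON-mode ChatML prompt. Schema and user message are
# placeholders chosen for illustration.
import json

schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["title", "year"],
}

system_prompt = (
    "You are a helpful assistant that answers in JSON. "
    f"Here's the json schema you must adhere to:\n<schema>\n{json.dumps(schema)}\n</schema>"
)

prompt = (
    f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
    "<|im_start|>user\nName a classic sci-fi novel and the year it was published.<|im_end|>\n"
    "<|im_start|>assistant"
)
# Feed `prompt` to the model exactly as in the inference example below; the
# completion should be a JSON object conforming to the schema.
```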

# Benchmarks

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/suBbCUIxpcRvhCv6-DBDQ.png)

## GPT4All:

```
|    Task     |Version| Metric |Value |   |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge|      0|acc     |0.5529|±  |0.0145|
|             |       |acc_norm|0.5870|±  |0.0144|
|arc_easy     |      0|acc     |0.8371|±  |0.0076|
|             |       |acc_norm|0.8144|±  |0.0080|
|boolq        |      1|acc     |0.8599|±  |0.0061|
|hellaswag    |      0|acc     |0.6133|±  |0.0049|
|             |       |acc_norm|0.7989|±  |0.0040|
|openbookqa   |      0|acc     |0.3940|±  |0.0219|
|             |       |acc_norm|0.4680|±  |0.0223|
|piqa         |      0|acc     |0.8063|±  |0.0092|
|             |       |acc_norm|0.8156|±  |0.0090|
|winogrande   |      0|acc     |0.7372|±  |0.0124|
```

Average: 72.59
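
The reported average appears to be the simple mean over tasks, using acc_norm where it is reported and acc otherwise; a quick sanity check with the scores copied from the table above:

```python
# Recompute the GPT4All average: mean of acc_norm where available, else acc.
scores = {
    "arc_challenge": 0.5870,  # acc_norm
    "arc_easy":      0.8144,  # acc_norm
    "boolq":         0.8599,  # acc (no acc_norm reported)
    "hellaswag":     0.7989,  # acc_norm
    "openbookqa":    0.4680,  # acc_norm
    "piqa":          0.8156,  # acc_norm
    "winogrande":    0.7372,  # acc (no acc_norm reported)
}
print(round(100 * sum(scores.values()) / len(scores), 2))  # -> 72.59
```

The AGIEval and BigBench averages below follow the same convention (for BigBench, the mean of the multiple_choice_grade scores).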

## AGIEval:

```
|             Task             |Version| Metric |Value |   |Stderr|
|------------------------------|------:|--------|-----:|---|-----:|
|agieval_aqua_rat              |      0|acc     |0.2441|±  |0.0270|
|                              |       |acc_norm|0.2441|±  |0.0270|
|agieval_logiqa_en             |      0|acc     |0.3687|±  |0.0189|
|                              |       |acc_norm|0.3840|±  |0.0191|
|agieval_lsat_ar               |      0|acc     |0.2304|±  |0.0278|
|                              |       |acc_norm|0.2174|±  |0.0273|
|agieval_lsat_lr               |      0|acc     |0.5471|±  |0.0221|
|                              |       |acc_norm|0.5373|±  |0.0221|
|agieval_lsat_rc               |      0|acc     |0.6617|±  |0.0289|
|                              |       |acc_norm|0.6357|±  |0.0294|
|agieval_sat_en                |      0|acc     |0.7670|±  |0.0295|
|                              |       |acc_norm|0.7379|±  |0.0307|
|agieval_sat_en_without_passage|      0|acc     |0.4417|±  |0.0347|
|                              |       |acc_norm|0.4223|±  |0.0345|
|agieval_sat_math              |      0|acc     |0.4000|±  |0.0331|
|                              |       |acc_norm|0.3455|±  |0.0321|
```

Average: 44.05

## BigBench:

```
|                      Task                      |Version|       Metric        |Value |   |Stderr|
|------------------------------------------------|------:|---------------------|-----:|---|-----:|
|bigbench_causal_judgement                       |      0|multiple_choice_grade|0.6000|±  |0.0356|
|bigbench_date_understanding                     |      0|multiple_choice_grade|0.6585|±  |0.0247|
|bigbench_disambiguation_qa                      |      0|multiple_choice_grade|0.3178|±  |0.0290|
|bigbench_geometric_shapes                       |      0|multiple_choice_grade|0.2340|±  |0.0224|
|                                                |       |exact_str_match      |0.0000|±  |0.0000|
|bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|0.2980|±  |0.0205|
|bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|0.2057|±  |0.0153|
|bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|0.5367|±  |0.0288|
|bigbench_movie_recommendation                   |      0|multiple_choice_grade|0.4040|±  |0.0220|
|bigbench_navigate                               |      0|multiple_choice_grade|0.4970|±  |0.0158|
|bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|0.7075|±  |0.0102|
|bigbench_ruin_names                             |      0|multiple_choice_grade|0.4821|±  |0.0236|
|bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|0.2295|±  |0.0133|
|bigbench_snarks                                 |      0|multiple_choice_grade|0.6906|±  |0.0345|
|bigbench_sports_understanding                   |      0|multiple_choice_grade|0.5375|±  |0.0159|
|bigbench_temporal_sequences                     |      0|multiple_choice_grade|0.6270|±  |0.0153|
|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|0.2216|±  |0.0118|
|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|0.1594|±  |0.0088|
|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|0.5367|±  |0.0288|
```

Average: 44.13

**IFEval**: 72.64

**MT_Bench**: Turn 1 - 8.3875, Turn 2 - 8.00625, Average - 8.196875

# Inference Code

Here is example code using HuggingFace Transformers to run inference with the model (note: in 4-bit it will require around 5GB of VRAM).

Note: To use function calling, see the github repo above.

```python
# Code to inference Hermes with HF Transformers
# Requires pytorch, transformers, bitsandbytes, sentencepiece, protobuf, and flash-attn packages

import torch
from transformers import AutoTokenizer, LlamaForCausalLM
import bitsandbytes, flash_attn  # imported to fail fast if the optional backends are missing

tokenizer = AutoTokenizer.from_pretrained('NousResearch/Hermes-2-Theta-Llama-3-8B', trust_remote_code=True)
model = LlamaForCausalLM.from_pretrained(
    "NousResearch/Hermes-2-Theta-Llama-3-8B",
    torch_dtype=torch.float16,
    device_map="auto",
    load_in_8bit=False,
    load_in_4bit=True,           # 4-bit quantization via bitsandbytes (~5GB VRAM)
    use_flash_attention_2=True
)

prompts = [
    """<|im_start|>system
You are a sentient, superintelligent artificial general intelligence, here to teach and assist me.<|im_end|>
<|im_start|>user
Write a short story about Goku discovering kirby has teamed up with Majin Buu to destroy the world.<|im_end|>
<|im_start|>assistant""",
]

for chat in prompts:
    print(chat)
    input_ids = tokenizer(chat, return_tensors="pt").input_ids.to("cuda")
    generated_ids = model.generate(input_ids, max_new_tokens=750, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=tokenizer.eos_token_id)
    response = tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(f"Response: {response}")
```

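If your transformers version ships chat-template support, an equivalent and less error-prone way to build the prompt is `tokenizer.apply_chat_template`; this is a sketch assuming the tokenizer carries a ChatML chat template (the messages are placeholders):

```python
# Alternative prompt construction via the tokenizer's built-in chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain flash attention in one paragraph."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model continues as assistant
    return_tensors="pt",
).to("cuda")
generated_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
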
## Inference Code for Function Calling:

All code for utilizing, parsing, and building function calling templates is available on our github:

@@ -343,18 +205,6 @@ All code for utilizing, parsing, and building function calling templates is avai

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/oi4CiGh50xmoviUQnh8R3.png)
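
The repo defines the authoritative system prompts and parsing utilities; as a rough sketch of the general shape of a function-calling turn (the function, tag names, and wording here are illustrative assumptions, not the repo's exact templates):

```python
# Rough sketch of a function-calling ChatML prompt. See the github repo for
# the authoritative templates and the tool-call parsing code.
import json

tool = {  # hypothetical function signature for illustration
    "name": "get_stock_price",
    "description": "Get the current stock price for a ticker symbol.",
    "parameters": {
        "type": "object",
        "properties": {"symbol": {"type": "string"}},
        "required": ["symbol"],
    },
}

system_prompt = (
    "You are a function calling AI model. You are provided with function "
    "signatures within <tools></tools> XML tags. You may call one or more "
    f"functions to assist with the user query.\n<tools>\n{json.dumps(tool)}\n</tools>"
)

prompt = (
    f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
    "<|im_start|>user\nWhat is Tesla trading at today?<|im_end|>\n"
    "<|im_start|>assistant"
)
# The model is expected to answer with a <tool_call> block containing JSON,
# which the repo's utilities parse and dispatch to the real function.
```
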
# Chat Interfaces

When quantized versions of the model are released, I recommend using LM Studio for chatting with Hermes 2 Pro. It is a GUI application that runs GGUF models with a llama.cpp backend, provides a ChatGPT-like interface for chatting with the model, and supports ChatML right out of the box. Note that LM Studio does not support function calling - for that, use our github repo.

In LM Studio, simply select the ChatML prefix on the settings side pane:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/ls6WqV-GSxMw2RA3GuQiN.png)
## Quantized Versions:

GGUF Versions Available Here: https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF
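
One way to run a GGUF build locally is llama-cpp-python; a minimal sketch (the filename is hypothetical - pick an actual quant file from the repo above):

```python
# Minimal sketch: chatting with a GGUF quant via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="Hermes-2-Theta-Llama-3-8B.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=4096,
    chat_format="chatml",  # the model speaks ChatML
)
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Give me one fun fact about llamas."},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```
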
# How to cite:

```bibtex