Update README.md
README.md (CHANGED)
@@ -59,76 +59,80 @@ Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at genera

## GPT4All:
```
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|-------------|------:|------|-----:|--------|---|-----:|---|-----:|
|arc_challenge| 1|none | 0|acc |↑ |0.4411|± |0.0145|
| | |none | 0|acc_norm|↑ |0.4377|± |0.0145|
|arc_easy | 1|none | 0|acc |↑ |0.7399|± |0.0090|
| | |none | 0|acc_norm|↑ |0.6566|± |0.0097|
|boolq | 2|none | 0|acc |↑ |0.8327|± |0.0065|
|hellaswag | 1|none | 0|acc |↑ |0.5453|± |0.0050|
| | |none | 0|acc_norm|↑ |0.7047|± |0.0046|
|openbookqa | 1|none | 0|acc |↑ |0.3480|± |0.0213|
| | |none | 0|acc_norm|↑ |0.4280|± |0.0221|
|piqa | 1|none | 0|acc |↑ |0.7639|± |0.0099|
| | |none | 0|acc_norm|↑ |0.7584|± |0.0100|
|winogrande | 1|none | 0|acc |↑ |0.6590|± |0.0133|
```

Average: 64.00

## AGIEval:
```
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|------------------------------|------:|------|-----:|--------|---|-----:|---|-----:|
|agieval_aqua_rat | 1|none | 0|acc |↑ |0.2283|± |0.0264|
| | |none | 0|acc_norm|↑ |0.2441|± |0.0270|
|agieval_logiqa_en | 1|none | 0|acc |↑ |0.3057|± |0.0181|
| | |none | 0|acc_norm|↑ |0.3272|± |0.0184|
|agieval_lsat_ar | 1|none | 0|acc |↑ |0.2304|± |0.0278|
| | |none | 0|acc_norm|↑ |0.1957|± |0.0262|
|agieval_lsat_lr | 1|none | 0|acc |↑ |0.3784|± |0.0215|
| | |none | 0|acc_norm|↑ |0.3588|± |0.0213|
|agieval_lsat_rc | 1|none | 0|acc |↑ |0.4610|± |0.0304|
| | |none | 0|acc_norm|↑ |0.4275|± |0.0302|
|agieval_sat_en | 1|none | 0|acc |↑ |0.6019|± |0.0342|
| | |none | 0|acc_norm|↑ |0.5340|± |0.0348|
|agieval_sat_en_without_passage| 1|none | 0|acc |↑ |0.3981|± |0.0342|
| | |none | 0|acc_norm|↑ |0.3981|± |0.0342|
|agieval_sat_math | 1|none | 0|acc |↑ |0.2500|± |0.0293|
| | |none | 0|acc_norm|↑ |0.2636|± |0.0298|
```

Average: 34.36

## BigBench:

```
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|-------------------------------------------------------|------:|------|-----:|--------|---|-----:|---|-----:|
|leaderboard_bbh_boolean_expressions | 1|none | 3|acc_norm|↑ |0.7560|± |0.0272|
|leaderboard_bbh_causal_judgement | 1|none | 3|acc_norm|↑ |0.6043|± |0.0359|
|leaderboard_bbh_date_understanding | 1|none | 3|acc_norm|↑ |0.3280|± |0.0298|
|leaderboard_bbh_disambiguation_qa | 1|none | 3|acc_norm|↑ |0.5880|± |0.0312|
|leaderboard_bbh_formal_fallacies | 1|none | 3|acc_norm|↑ |0.5280|± |0.0316|
|leaderboard_bbh_geometric_shapes | 1|none | 3|acc_norm|↑ |0.3560|± |0.0303|
|leaderboard_bbh_hyperbaton | 1|none | 3|acc_norm|↑ |0.6280|± |0.0306|
|leaderboard_bbh_logical_deduction_five_objects | 1|none | 3|acc_norm|↑ |0.3400|± |0.0300|
|leaderboard_bbh_logical_deduction_seven_objects | 1|none | 3|acc_norm|↑ |0.2880|± |0.0287|
|leaderboard_bbh_logical_deduction_three_objects | 1|none | 3|acc_norm|↑ |0.4160|± |0.0312|
|leaderboard_bbh_movie_recommendation | 1|none | 3|acc_norm|↑ |0.6760|± |0.0297|
|leaderboard_bbh_navigate | 1|none | 3|acc_norm|↑ |0.5800|± |0.0313|
|leaderboard_bbh_object_counting | 1|none | 3|acc_norm|↑ |0.3640|± |0.0305|
|leaderboard_bbh_penguins_in_a_table | 1|none | 3|acc_norm|↑ |0.3836|± |0.0404|
|leaderboard_bbh_reasoning_about_colored_objects | 1|none | 3|acc_norm|↑ |0.3560|± |0.0303|
|leaderboard_bbh_ruin_names | 1|none | 3|acc_norm|↑ |0.4160|± |0.0312|
|leaderboard_bbh_salient_translation_error_detection | 1|none | 3|acc_norm|↑ |0.3080|± |0.0293|
|leaderboard_bbh_snarks | 1|none | 3|acc_norm|↑ |0.5618|± |0.0373|
|leaderboard_bbh_sports_understanding | 1|none | 3|acc_norm|↑ |0.6600|± |0.0300|
|leaderboard_bbh_temporal_sequences | 1|none | 3|acc_norm|↑ |0.2320|± |0.0268|
|leaderboard_bbh_tracking_shuffled_objects_five_objects | 1|none | 3|acc_norm|↑ |0.1640|± |0.0235|
|leaderboard_bbh_tracking_shuffled_objects_seven_objects| 1|none | 3|acc_norm|↑ |0.1480|± |0.0225|
|leaderboard_bbh_tracking_shuffled_objects_three_objects| 1|none | 3|acc_norm|↑ |0.3120|± |0.0294|
|leaderboard_bbh_web_of_lies | 1|none | 3|acc_norm|↑ |0.5080|± |0.0317|
```

Average: 43.76

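The tables above appear to be in the output format of EleutherAI's lm-evaluation-harness. The `Average:` figures are not defined in the diff itself; they look like unweighted means of one score per task, scaled to percent, using `acc_norm` where it is reported and `acc` otherwise (that reading reproduces the AGIEval and BigBench figures exactly and lands within a few hundredths of the GPT4All one). The sketch below is a minimal illustration of that reading; the function name and the metric preference are assumptions, not code from the repository.

```python
# Minimal sketch (not from the repository): compute an unweighted per-task average
# from an lm-eval-style results table pasted as plain text, preferring acc_norm over acc.
def average_from_table(table_text: str) -> float:
    scores = {}               # task name -> {metric name: value}
    current_task = None
    for line in table_text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 7:
            continue                      # not a table row
        try:
            value = float(cells[6])       # the Value column
        except ValueError:
            continue                      # header or separator row
        if cells[0]:                      # the first row of a task carries its name
            current_task = cells[0]
        metric = cells[4]
        if current_task and metric in ("acc", "acc_norm"):
            scores.setdefault(current_task, {})[metric] = value
    per_task = [m.get("acc_norm", m["acc"]) for m in scores.values()]
    return 100 * sum(per_task) / len(per_task)

# Example: paste the GPT4All block above into `gpt4all_table`, then
# print(round(average_from_table(gpt4all_table), 2))
```
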
# Prompt Format

@@ -171,7 +175,7 @@ To utilize the prompt format without a system prompt, simply leave the line out.

## Prompt Format for Function Calling

# Note: A previous version used USER as both the user prompt and the tool response role, but this has now been fixed. Please use USER for the user prompt role and TOOL for the tool response role.

Our model was trained on specific system prompts and structures for Function Calling.
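
To make the corrected convention concrete, here is a minimal, illustrative sketch of a messages list that follows it: the user prompt goes in the `user` role and the tool output goes back in the `tool` role. The system-prompt wording, the `<tool_call>` JSON, and whether a given chat template accepts a `tool` role are assumptions to check against the full README, not details specified in this diff.

```python
# Illustrative only: role layout per the note above (user prompt -> "user", tool output -> "tool").
# The prompt strings below are placeholders, not the README's canonical text.
messages = [
    {"role": "system", "content": "You are a function calling AI model. Available tools: <tools>...</tools>"},
    {"role": "user", "content": "Fetch the stock fundamentals for TSLA."},
    {"role": "assistant", "content": "<tool_call>\n{\"name\": \"get_stock_fundamentals\", \"arguments\": {\"symbol\": \"TSLA\"}}\n</tool_call>"},
    {"role": "tool", "content": "<tool_response>\n{\"name\": \"get_stock_fundamentals\", \"content\": {...}}\n</tool_response>"},
]
```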

@@ -200,7 +204,7 @@ The model will then generate a tool call, which your inference code must parse,

Once you parse the tool call, call the API and get the returned values for the call, then pass them back in as a new role, `tool`, like so:
```
<|im_start|>tool
<tool_response>
{"name": "get_stock_fundamentals", "content": {'symbol': 'TSLA', 'company_name': 'Tesla, Inc.', 'sector': 'Consumer Cyclical', 'industry': 'Auto Manufacturers', 'market_cap': 611384164352, 'pe_ratio': 49.604652, 'pb_ratio': 9.762013, 'dividend_yield': None, 'eps': 4.3, 'beta': 2.427, '52_week_high': 299.29, '52_week_low': 152.37}}
</tool_response>
```
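
The step described above (parse the tool call, call the API, pass the result back as the `tool` role) is left to your inference code. Below is a minimal sketch of one way to implement it, assuming the model emits its call as JSON inside `<tool_call>` tags and that you are assembling the raw ChatML-style turns yourself; the tool registry, the local function, and its placeholder payload are illustrative, not from the README.

```python
import json
import re

# Hypothetical local implementation standing in for the real API call.
def get_stock_fundamentals(symbol: str) -> dict:
    return {"symbol": symbol, "company_name": "Tesla, Inc."}  # placeholder payload

TOOLS = {"get_stock_fundamentals": get_stock_fundamentals}

def run_tool_call(generated_text: str) -> str:
    """Parse the model's <tool_call> block, run the tool, and format the `tool` role turn."""
    match = re.search(r"<tool_call>\s*(\{.*\})\s*</tool_call>", generated_text, re.DOTALL)
    if match is None:
        raise ValueError("no <tool_call> block found in the generation")
    call = json.loads(match.group(1))
    result = TOOLS[call["name"]](**call.get("arguments", {}))
    payload = json.dumps({"name": call["name"], "content": result})
    # Append the end-of-turn token from the README's prompt-format section before generating again.
    return f"<|im_start|>tool\n<tool_response>\n{payload}\n</tool_response>"
```

Under these assumptions, appending the returned string to the running prompt and generating again yields the model's natural-language answer to the original request.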

@@ -305,6 +309,4 @@ GGUF Quants: https://huggingface.co/NousResearch/Hermes-3-Llama-3.2-3B-GGUF

```
primaryClass={cs.CL},
url={https://arxiv.org/abs/2408.11857},
}
```