Update README.md
README.md (CHANGED)
@@ -59,76 +59,80 @@ Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at genera

## GPT4All:
```
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|-------------|------:|------|-----:|--------|---|-----:|---|-----:|
|arc_challenge| 1|none | 0|acc |↑ |0.4411|± |0.0145|
| | |none | 0|acc_norm|↑ |0.4377|± |0.0145|
|arc_easy | 1|none | 0|acc |↑ |0.7399|± |0.0090|
| | |none | 0|acc_norm|↑ |0.6566|± |0.0097|
|boolq | 2|none | 0|acc |↑ |0.8327|± |0.0065|
|hellaswag | 1|none | 0|acc |↑ |0.5453|± |0.0050|
| | |none | 0|acc_norm|↑ |0.7047|± |0.0046|
|openbookqa | 1|none | 0|acc |↑ |0.3480|± |0.0213|
| | |none | 0|acc_norm|↑ |0.4280|± |0.0221|
|piqa | 1|none | 0|acc |↑ |0.7639|± |0.0099|
| | |none | 0|acc_norm|↑ |0.7584|± |0.0100|
|winogrande | 1|none | 0|acc |↑ |0.6590|± |0.0133|
```

Average: 64.00

## AGIEval:
```
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|------------------------------|------:|------|-----:|--------|---|-----:|---|-----:|
|agieval_aqua_rat | 1|none | 0|acc |↑ |0.2283|± |0.0264|
| | |none | 0|acc_norm|↑ |0.2441|± |0.0270|
|agieval_logiqa_en | 1|none | 0|acc |↑ |0.3057|± |0.0181|
| | |none | 0|acc_norm|↑ |0.3272|± |0.0184|
|agieval_lsat_ar | 1|none | 0|acc |↑ |0.2304|± |0.0278|
| | |none | 0|acc_norm|↑ |0.1957|± |0.0262|
|agieval_lsat_lr | 1|none | 0|acc |↑ |0.3784|± |0.0215|
| | |none | 0|acc_norm|↑ |0.3588|± |0.0213|
|agieval_lsat_rc | 1|none | 0|acc |↑ |0.4610|± |0.0304|
| | |none | 0|acc_norm|↑ |0.4275|± |0.0302|
|agieval_sat_en | 1|none | 0|acc |↑ |0.6019|± |0.0342|
| | |none | 0|acc_norm|↑ |0.5340|± |0.0348|
|agieval_sat_en_without_passage| 1|none | 0|acc |↑ |0.3981|± |0.0342|
| | |none | 0|acc_norm|↑ |0.3981|± |0.0342|
|agieval_sat_math | 1|none | 0|acc |↑ |0.2500|± |0.0293|
| | |none | 0|acc_norm|↑ |0.2636|± |0.0298|
```

Average: 34.36

## BigBench:

```
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|-------------------------------------------------------|------:|------|-----:|--------|---|-----:|---|-----:|
|leaderboard_bbh_boolean_expressions | 1|none | 3|acc_norm|↑ |0.7560|± |0.0272|
|leaderboard_bbh_causal_judgement | 1|none | 3|acc_norm|↑ |0.6043|± |0.0359|
|leaderboard_bbh_date_understanding | 1|none | 3|acc_norm|↑ |0.3280|± |0.0298|
|leaderboard_bbh_disambiguation_qa | 1|none | 3|acc_norm|↑ |0.5880|± |0.0312|
|leaderboard_bbh_formal_fallacies | 1|none | 3|acc_norm|↑ |0.5280|± |0.0316|
|leaderboard_bbh_geometric_shapes | 1|none | 3|acc_norm|↑ |0.3560|± |0.0303|
|leaderboard_bbh_hyperbaton | 1|none | 3|acc_norm|↑ |0.6280|± |0.0306|
|leaderboard_bbh_logical_deduction_five_objects | 1|none | 3|acc_norm|↑ |0.3400|± |0.0300|
|leaderboard_bbh_logical_deduction_seven_objects | 1|none | 3|acc_norm|↑ |0.2880|± |0.0287|
|leaderboard_bbh_logical_deduction_three_objects | 1|none | 3|acc_norm|↑ |0.4160|± |0.0312|
|leaderboard_bbh_movie_recommendation | 1|none | 3|acc_norm|↑ |0.6760|± |0.0297|
|leaderboard_bbh_navigate | 1|none | 3|acc_norm|↑ |0.5800|± |0.0313|
|leaderboard_bbh_object_counting | 1|none | 3|acc_norm|↑ |0.3640|± |0.0305|
|leaderboard_bbh_penguins_in_a_table | 1|none | 3|acc_norm|↑ |0.3836|± |0.0404|
|leaderboard_bbh_reasoning_about_colored_objects | 1|none | 3|acc_norm|↑ |0.3560|± |0.0303|
|leaderboard_bbh_ruin_names | 1|none | 3|acc_norm|↑ |0.4160|± |0.0312|
|leaderboard_bbh_salient_translation_error_detection | 1|none | 3|acc_norm|↑ |0.3080|± |0.0293|
|leaderboard_bbh_snarks | 1|none | 3|acc_norm|↑ |0.5618|± |0.0373|
|leaderboard_bbh_sports_understanding | 1|none | 3|acc_norm|↑ |0.6600|± |0.0300|
|leaderboard_bbh_temporal_sequences | 1|none | 3|acc_norm|↑ |0.2320|± |0.0268|
|leaderboard_bbh_tracking_shuffled_objects_five_objects | 1|none | 3|acc_norm|↑ |0.1640|± |0.0235|
|leaderboard_bbh_tracking_shuffled_objects_seven_objects| 1|none | 3|acc_norm|↑ |0.1480|± |0.0225|
|leaderboard_bbh_tracking_shuffled_objects_three_objects| 1|none | 3|acc_norm|↑ |0.3120|± |0.0294|
|leaderboard_bbh_web_of_lies | 1|none | 3|acc_norm|↑ |0.5080|± |0.0317|
```

Average: 43.76

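The tables above appear to be in the output format of EleutherAI's lm-evaluation-harness. The `Average:` figures are not defined in the diff itself; they look like unweighted means of one score per task, scaled to percent, using `acc_norm` where it is reported and `acc` otherwise (that reading reproduces the AGIEval and BigBench figures exactly and lands within a few hundredths of the GPT4All one). The sketch below is a minimal illustration of that reading; the function name and the metric preference are assumptions, not code from the repository.

```python
# Minimal sketch (not from the repository): compute an unweighted per-task average
# from an lm-eval-style results table pasted as plain text, preferring acc_norm over acc.
def average_from_table(table_text: str) -> float:
    scores = {}               # task name -> {metric name: value}
    current_task = None
    for line in table_text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 7:
            continue                      # not a table row
        try:
            value = float(cells[6])       # the Value column
        except ValueError:
            continue                      # header or separator row
        if cells[0]:                      # the first row of a task carries its name
            current_task = cells[0]
        metric = cells[4]
        if current_task and metric in ("acc", "acc_norm"):
            scores.setdefault(current_task, {})[metric] = value
    per_task = [m.get("acc_norm", m["acc"]) for m in scores.values()]
    return 100 * sum(per_task) / len(per_task)

# Example: paste the GPT4All block above into `gpt4all_table`, then
# print(round(average_from_table(gpt4all_table), 2))
```
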
# Prompt Format

@@ -171,7 +175,7 @@ To utilize the prompt format without a system prompt, simply leave the line out.

## Prompt Format for Function Calling

# Note: A previous version used USER as both the user prompt and the tool response role, but this has now been fixed. Please use USER for the user prompt role and TOOL for the tool response role.

Our model was trained on specific system prompts and structures for Function Calling.
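
To make the corrected convention concrete, here is a minimal, illustrative sketch of a messages list that follows it: the user prompt goes in the `user` role and the tool output goes back in the `tool` role. The system-prompt wording, the `<tool_call>` JSON, and whether a given chat template accepts a `tool` role are assumptions to check against the full README, not details specified in this diff.

```python
# Illustrative only: role layout per the note above (user prompt -> "user", tool output -> "tool").
# The prompt strings below are placeholders, not the README's canonical text.
messages = [
    {"role": "system", "content": "You are a function calling AI model. Available tools: <tools>...</tools>"},
    {"role": "user", "content": "Fetch the stock fundamentals for TSLA."},
    {"role": "assistant", "content": "<tool_call>\n{\"name\": \"get_stock_fundamentals\", \"arguments\": {\"symbol\": \"TSLA\"}}\n</tool_call>"},
    {"role": "tool", "content": "<tool_response>\n{\"name\": \"get_stock_fundamentals\", \"content\": {...}}\n</tool_response>"},
]
```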

@@ -200,7 +204,7 @@ The model will then generate a tool call, which your inference code must parse,

Once you parse the tool call, call the API and get the returned values for the call, then pass them back in as a new role, `tool`, like so:
```
<|im_start|>tool
<tool_response>
{"name": "get_stock_fundamentals", "content": {'symbol': 'TSLA', 'company_name': 'Tesla, Inc.', 'sector': 'Consumer Cyclical', 'industry': 'Auto Manufacturers', 'market_cap': 611384164352, 'pe_ratio': 49.604652, 'pb_ratio': 9.762013, 'dividend_yield': None, 'eps': 4.3, 'beta': 2.427, '52_week_high': 299.29, '52_week_low': 152.37}}
</tool_response>
```
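
The step described above (parse the tool call, call the API, pass the result back as the `tool` role) is left to your inference code. Below is a minimal sketch of one way to implement it, assuming the model emits its call as JSON inside `<tool_call>` tags and that you are assembling the raw ChatML-style turns yourself; the tool registry, the local function, and its placeholder payload are illustrative, not from the README.

```python
import json
import re

# Hypothetical local implementation standing in for the real API call.
def get_stock_fundamentals(symbol: str) -> dict:
    return {"symbol": symbol, "company_name": "Tesla, Inc."}  # placeholder payload

TOOLS = {"get_stock_fundamentals": get_stock_fundamentals}

def run_tool_call(generated_text: str) -> str:
    """Parse the model's <tool_call> block, run the tool, and format the `tool` role turn."""
    match = re.search(r"<tool_call>\s*(\{.*\})\s*</tool_call>", generated_text, re.DOTALL)
    if match is None:
        raise ValueError("no <tool_call> block found in the generation")
    call = json.loads(match.group(1))
    result = TOOLS[call["name"]](**call.get("arguments", {}))
    payload = json.dumps({"name": call["name"], "content": result})
    # Append the end-of-turn token from the README's prompt-format section before generating again.
    return f"<|im_start|>tool\n<tool_response>\n{payload}\n</tool_response>"
```

Under these assumptions, appending the returned string to the running prompt and generating again yields the model's natural-language answer to the original request.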

@@ -305,6 +309,4 @@ GGUF Quants: https://huggingface.co/NousResearch/Hermes-3-Llama-3.2-3B-GGUF

```
primaryClass={cs.CL},
url={https://arxiv.org/abs/2408.11857},
}
```