Commit 275cc70 (parent: fe69a25)
Revert changes - uploaded to wrong repo

README.md CHANGED

@@ -15,9 +15,11 @@ base_model: alpindale/Mistral-7B-v0.2-hf
---

# Model Card: Hercules-4.0-Mistral-v0.2-7B

![image/png](https://tse3.mm.bing.net/th/id/OIG1.vnrl3xpEcypR3McLW63q?pid=ImgGn)

## Model Description

Hercules-4.0-Mistral-v0.2-7B is a fine-tuned language model derived from Mistralai/Mistral-7B-v0.2. It is specifically designed to excel in instruction following, function calling, and conversational interactions across various scientific and technical domains. The dataset used for fine-tuning, also named Hercules-v4.0, expands upon the diverse capabilities of OpenHermes-2.5 with contributions from numerous curated datasets. This fine-tuning has equipped Hercules-4.0-Mistral-v0.2-7B with enhanced abilities in:

- Complex Instruction Following: Understanding and accurately executing multi-step instructions, even those involving specialized terminology.
- Function Calling: Seamlessly interpreting and executing function calls, providing appropriate input and output values.

@@ -33,7 +35,7 @@ AWQ: https://huggingface.co/solidrust/Hercules-4.0-Mistral-v0.2-7B-AWQ

## Intended Uses & Potential Bias

Hercules-4.0-Mistral-v0.2-7B is well-suited to the following applications:

- Specialized Chatbots: Creating knowledgeable chatbots and conversational agents in scientific and technical fields.
- Instructional Assistants: Supporting users with educational and step-by-step guidance in various disciplines.

@@ -44,21 +46,52 @@ Hercules-4.0-Yi-34B is well-suited to the following applications:

## Limitations and Risks

- Toxicity: The dataset contains toxic or harmful examples.
- Hallucinations and Factual Errors: Like other language models, Hercules-4.0-Mistral-v0.2-7B may generate incorrect or misleading information, especially in specialized domains where it lacks sufficient expertise.
- Potential for Misuse: The ability to engage in technical conversations and execute function calls could be misused for malicious purposes.

## Training Procedure

- This model was trained on 8 Kaggle TPUs, using torch_xla SPMD for high MXU efficiency. There was no expense on my end (meaning you can reproduce this too!).
- A learning rate of 5e-06 was used with the Adam optimizer, along with a linear scheduler with an end factor of 0.1. The low learning rate was chosen to prevent exploding gradients.
- No mixed precision was used, with the default dtype being bfloat16.
- A total batch size of 64 was used.
- Trained on 700,000 examples of Hercules-v4.0.
- No model parameters were frozen and no quantization was used.
- This model was trained on OpenAI's ChatML prompt format. Because this model has function-calling capabilities, the prompt format is slightly different; here is what it looks like: `<|im_start|>system\n{message}<|im_end|>\n<|im_start|>user\n{user message}<|im_end|>\n<|im_start|>call\n{function call message}<|im_end|>\n<|im_start|>function\n{function response message}<|im_end|>\n<|im_start|>assistant\n{assistant message}</s>`
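The optimizer and learning-rate schedule described in the bullets above can be sketched in PyTorch. The tiny stand-in model, and the assumption that the linear decay spans roughly one epoch (700,000 examples at a total batch size of 64), are illustrative guesses rather than details stated in this card:

```python
import torch

# Tiny stand-in module; the real run optimized the full 7B model's parameters.
model = torch.nn.Linear(8, 8)

# From the bullets above: Adam with a learning rate of 5e-06.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-06)

# Linear schedule with an end factor of 0.1, i.e. the learning rate decays
# from 5e-06 toward 5e-07. Assumed horizon: one pass over 700,000 examples
# at a total batch size of 64 (~10,937 optimizer steps).
total_steps = 700_000 // 64
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.1, total_iters=total_steps
)
```

Each training step would call `optimizer.step()` followed by `scheduler.step()`; after `total_steps` steps the learning rate settles at one tenth of its starting value.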

This model was fine-tuned using my TPU-Alignment repository: https://github.com/Locutusque/TPU-Alignment
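The ChatML-with-function-calls format described in the training notes above can be assembled with a small helper. `render_chatml` and the example messages are hypothetical illustrations, not part of this card, and in practice the tokenizer's built-in chat template should be preferred:

```python
def render_chatml(messages):
    """Render (role, content) pairs into ChatML. Besides the usual
    system/user/assistant roles, this model's format adds the extra
    "call" and "function" roles for function calling."""
    return "".join(
        f"<|im_start|>{role}\n{content}<|im_end|>\n" for role, content in messages
    )

# Build a generation prompt that ends with an open assistant turn.
prompt = render_chatml([
    ("system", "You are a helpful assistant with function-calling abilities."),
    ("user", "What's the weather in Paris?"),
    ("call", '{"name": "get_weather", "arguments": {"city": "Paris"}}'),
    ("function", '{"temperature_c": 18, "condition": "cloudy"}'),
]) + "<|im_start|>assistant\n"
```

The model's completion would then form the assistant message, terminated by `</s>` as in the format string above.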

## Evaluation

| Tasks                           |Version|Filter|n-shot| Metric |Value |   |Stderr|
|---------------------------------|-------|------|-----:|--------|-----:|---|-----:|
|agieval_nous                     |N/A    |none  |     0|acc_norm|0.3578|±  |0.0093|
|                                 |       |none  |     0|acc     |0.3598|±  |0.0093|
| - agieval_aqua_rat              |      1|none  |     0|acc     |0.2362|±  |0.0267|
|                                 |       |none  |     0|acc_norm|0.2362|±  |0.0267|
| - agieval_logiqa_en             |      1|none  |     0|acc     |0.3026|±  |0.0180|
|                                 |       |none  |     0|acc_norm|0.3456|±  |0.0187|
| - agieval_lsat_ar               |      1|none  |     0|acc     |0.2565|±  |0.0289|
|                                 |       |none  |     0|acc_norm|0.2522|±  |0.0287|
| - agieval_lsat_lr               |      1|none  |     0|acc     |0.3588|±  |0.0213|
|                                 |       |none  |     0|acc_norm|0.3490|±  |0.0211|
| - agieval_lsat_rc               |      1|none  |     0|acc     |0.4981|±  |0.0305|
|                                 |       |none  |     0|acc_norm|0.4461|±  |0.0304|
| - agieval_sat_en                |      1|none  |     0|acc     |0.6553|±  |0.0332|
|                                 |       |none  |     0|acc_norm|0.6408|±  |0.0335|
| - agieval_sat_en_without_passage|      1|none  |     0|acc     |0.3883|±  |0.0340|
|                                 |       |none  |     0|acc_norm|0.3738|±  |0.0338|
| - agieval_sat_math              |      1|none  |     0|acc     |0.3091|±  |0.0312|
|                                 |       |none  |     0|acc_norm|0.2773|±  |0.0302|

| Groups                                  |Version|Filter    |n-shot| Metric    |Value |   |Stderr|
|-----------------------------------------|-------|----------|-----:|-----------|-----:|---|-----:|
|agieval_nous                             |N/A    |none      |     0|acc_norm   |0.3578|±  |0.0093|
|                                         |       |none      |     0|acc        |0.3598|±  |0.0093|
|mmlu_flan_cot_fewshot                    |N/A    |get-answer|     0|exact_match|0.5787|±  |0.0120|
| - mmlu_flan_cot_fewshot_humanities      |N/A    |get-answer|     0|exact_match|0.5174|±  |0.0209|
| - mmlu_flan_cot_fewshot_other           |N/A    |get-answer|     0|exact_match|0.6833|±  |0.0249|
| - mmlu_flan_cot_fewshot_social_sciences |N/A    |get-answer|     0|exact_match|0.6528|±  |0.0248|
| - mmlu_flan_cot_fewshot_stem            |N/A    |get-answer|     0|exact_match|0.4925|±  |0.0266|
|ai2_arc                                  |N/A    |none      |     0|acc        |0.6936|±  |0.0073|
|                                         |       |none      |     0|acc_norm   |0.6984|±  |0.0074|
|