Clarified quantization type
Also added original model card

README.md CHANGED
@@ -18,6 +18,147 @@ language:
- en
---

This is an ExLlamaV2 quantized model of [teknium/OpenHermes-13B](https://huggingface.co/teknium/OpenHermes-13B) at 3 bits per weight (3bpw), made using the default calibration dataset.
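
A minimal loading-and-generation sketch, assuming the `exllamav2` Python package is installed and the quantized weights have been downloaded locally; the directory path and sampling settings are placeholders, and the class/method names follow exllamav2's example scripts, so they may differ between versions:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Point the config at the directory holding the 3bpw ExLlamaV2 weights (placeholder path).
config = ExLlamaV2Config()
config.model_dir = "./OpenHermes-13B-exl2-3bpw"
config.prepare()

# Load the model onto the GPU and set up tokenizer, KV cache, and a simple generator.
model = ExLlamaV2(config)
model.load()
tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

# Placeholder sampling settings.
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

print(generator.generate_simple("Write a haiku about open datasets.", settings, 200))
```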

Prompt format is likely ChatML.
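
If it is indeed ChatML, a prompt would be formatted along these lines (the system and user messages here are only illustrations):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
```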

# Original Model card

# OpenHermes-13B

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/ovkrkIIUwJ9azhPtW6dAb.png)

## Model description

OpenHermes 13B is the first fine-tune of the Hermes dataset that has a fully open-source dataset!

OpenHermes was trained on 242,000 entries of primarily GPT-4 generated data, from open datasets across the AI landscape, including:

- GPTeacher - General Instruct, Roleplay v1, Roleplay v2, and Code Instruct Datasets, by Teknium
- WizardLM (v1, evol_instruct 70k), by WizardLM Team/nlpxucan
- Airoboros GPT-4 (v1.0), by JonDurbin
- Camel-AI's domain expert datasets, by the Camel-AI Team
- CodeAlpaca, by Sahil2801
- GPT4-LLM and Unnatural Instructions, by Microsoft

Filtering included removal of OpenAI refusals, disclaimers, "As an AI"-type examples, and more.

The base dataset mix the model was trained on is identical to Nous-Hermes', minus the Nous-Instruct and PDACTL datasets, which were private.

The WANDB project is public and can be examined at this link: https://wandb.ai/teknium1/openhermes/runs/openhermes-v2-fullft-13b

Huge thank you to [main_horse](https://twitter.com/main_horse) for compute access, to a16z for sponsoring my work, and to all the dataset creators and other people whose work has contributed to this project!

## Example Outputs

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/wMSeFqUSBwCNefm7s6G1-.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/00iVenvEOMWIO9X6EY2EZ.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/o7hHbCbtwMLitDy-FWDAg.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/-9ZT1FBSE2BJhDowoh6Gj.png)

## Benchmark Information

## Benchmark Results

GPT-4All Benchmark Set
```
| Task |Version| Metric |Value | |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge| 0|acc |0.5009|± |0.0146|
| | |acc_norm|0.5247|± |0.0146|
|arc_easy | 0|acc |0.8127|± |0.0080|
| | |acc_norm|0.7854|± |0.0084|
|boolq | 1|acc |0.8153|± |0.0068|
|hellaswag | 0|acc |0.6126|± |0.0049|
| | |acc_norm|0.7995|± |0.0040|
|openbookqa | 0|acc |0.3660|± |0.0216|
| | |acc_norm|0.4600|± |0.0223|
|piqa | 0|acc |0.7922|± |0.0095|
| | |acc_norm|0.8112|± |0.0091|
|winogrande | 0|acc |0.7293|± |0.0125|
Average: 0.7036
```

AGI-Eval
```
| Task |Version| Metric |Value | |Stderr|
|------------------------------|------:|--------|-----:|---|-----:|
|agieval_aqua_rat | 0|acc |0.2008|± |0.0252|
| | |acc_norm|0.2126|± |0.0257|
|agieval_logiqa_en | 0|acc |0.3410|± |0.0186|
| | |acc_norm|0.3564|± |0.0188|
|agieval_lsat_ar | 0|acc |0.2261|± |0.0276|
| | |acc_norm|0.2174|± |0.0273|
|agieval_lsat_lr | 0|acc |0.3725|± |0.0214|
| | |acc_norm|0.3373|± |0.0210|
|agieval_lsat_rc | 0|acc |0.4684|± |0.0305|
| | |acc_norm|0.4572|± |0.0304|
|agieval_sat_en | 0|acc |0.6553|± |0.0332|
| | |acc_norm|0.5971|± |0.0343|
|agieval_sat_en_without_passage| 0|acc |0.4515|± |0.0348|
| | |acc_norm|0.4029|± |0.0343|
|agieval_sat_math | 0|acc |0.3273|± |0.0317|
| | |acc_norm|0.2636|± |0.0298|
Average: 0.3556
```

BigBench Reasoning Test
```
| Task |Version| Metric |Value | |Stderr|
|------------------------------------------------|------:|---------------------|-----:|---|-----:|
|bigbench_causal_judgement | 0|multiple_choice_grade|0.5368|± |0.0363|
|bigbench_date_understanding | 0|multiple_choice_grade|0.7127|± |0.0236|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|0.3023|± |0.0286|
|bigbench_geometric_shapes | 0|multiple_choice_grade|0.1003|± |0.0159|
| | |exact_str_match |0.0000|± |0.0000|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|0.2720|± |0.0199|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|0.1986|± |0.0151|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|0.4500|± |0.0288|
|bigbench_movie_recommendation | 0|multiple_choice_grade|0.2880|± |0.0203|
|bigbench_navigate | 0|multiple_choice_grade|0.5000|± |0.0158|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|0.5390|± |0.0111|
|bigbench_ruin_names | 0|multiple_choice_grade|0.3906|± |0.0231|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|0.1844|± |0.0123|
|bigbench_snarks | 0|multiple_choice_grade|0.5249|± |0.0372|
|bigbench_sports_understanding | 0|multiple_choice_grade|0.5335|± |0.0159|
|bigbench_temporal_sequences | 0|multiple_choice_grade|0.2980|± |0.0145|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|0.2048|± |0.0114|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|0.1297|± |0.0080|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|0.4500|± |0.0288|
Average: 36.75
```

This is a slight improvement on the GPT4All and BigBench suites, with a degradation on AGIEval compared to the original Hermes.

Average Score Comparison between Nous-Hermes Llama-2 and OpenHermes Llama-2:
```
| Bench | Nous-Hermes | OpenHermes | Change |
|------------------------------|------------:|------------|--------|
|GPT4All | 70.00| 70.36| +0.36|
|------------------------------------------------------------------|
|BigBench | 36.57| 36.75| +0.18|
|------------------------------------------------------------------|
|AGI Eval | 37.20| 35.56| -1.64|
```
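
The tables above appear to be in the output format of EleutherAI's lm-evaluation-harness. A hedged sketch of how one might re-run the GPT4All-style suite against the original fp16 model; the harness version, model type string, task names, and batch size below are assumptions, not taken from the card:

```python
from lm_eval import evaluator

# Hypothetical reproduction of the GPT4All-style suite; adjust the model type,
# task names, and batch size to match the harness version you have installed.
results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=teknium/OpenHermes-13B",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "openbookqa", "piqa", "winogrande"],
    batch_size=8,
)
print(results["results"])
```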

## Training procedure

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/pqQ6MrMVy80hHEKSfqIX2.png)

### Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 300
- num_epochs: 3
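
A hedged sketch of how these settings might map onto Hugging Face `TrainingArguments`; the training stack actually used is not stated in the card, and `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

# Hypothetical mapping of the listed hyperparameters; 2 per-device batch size
# * 8 GPUs * 8 accumulation steps = 128 total train batch size.
args = TrainingArguments(
    output_dir="openhermes-13b-ft",   # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=300,
    num_train_epochs=3,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```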