mpasila committed
Commit d99b5c1
1 Parent(s): 287b916

Clarified quantization type

Also added original model card

Files changed (1)
  1. README.md +143 -2
README.md CHANGED
@@ -18,6 +18,147 @@ language:
  - en
  ---

- This is an EXL2 quantized model in 3bpw of [teknium/OpenHermes-13B](https://huggingface.co/teknium/OpenHermes-13B) using the default calibration dataset.

- Prompt format is likely ChatML.
+ This is an ExLlamaV2 (EXL2) quantization of [teknium/OpenHermes-13B](https://huggingface.co/teknium/OpenHermes-13B) at 3 bits per weight (3bpw), made using the default calibration dataset.

+ Prompt format is likely ChatML.
+
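Since the card only says the prompt format is *likely* ChatML, here is a minimal sketch of assembling a single-turn ChatML prompt; the system message, user text, and stop handling are illustrative assumptions, not something the upstream card specifies.

```python
# Minimal sketch: build a single-turn ChatML prompt for the quantized model.
# If the model really follows ChatML, generation should stop on "<|im_end|>".
def build_chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",  # assumed system message
    "Summarize what 3bpw EXL2 quantization means in one sentence.",
)
print(prompt)
```

The resulting string can then be fed to any backend that can load EXL2 weights, for example an ExLlamaV2-based loader.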
+ # Original Model card
+
+ # OpenHermes-13B
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/ovkrkIIUwJ9azhPtW6dAb.png)
+
+ ## Model description
+
+ OpenHermes 13B is the first fine-tune of the Hermes dataset to use a fully open-source dataset!
+
+ OpenHermes was trained on 242,000 entries of primarily GPT-4-generated data, drawn from open datasets across the AI landscape, including:
+
+ - GPTeacher - General Instruct, Roleplay v1, Roleplay v2, and Code Instruct Datasets, by Teknium
+ - WizardLM (v1, evol_instruct 70k), by WizardLM Team/nlpxucan
+ - Airoboros GPT-4 (v1.0), by JonDurbin
+ - Camel-AI's domain expert datasets, by the Camel-AI Team
+ - CodeAlpaca, by Sahil2801
+ - GPT4-LLM and Unnatural Instructions, by Microsoft
+
+ Filtering included removal of OpenAI refusals, disclaimers, "As an AI"-type examples, and more.
+
+ The base dataset mix the model was trained on is identical to Nous-Hermes', minus the Nous-Instruct and PDACTL datasets, which were private.
+
+ The WANDB project is public and can be examined at this link: https://wandb.ai/teknium1/openhermes/runs/openhermes-v2-fullft-13b
+
+ Huge thank you to [main_horse](https://twitter.com/main_horse) for compute access, to a16z for sponsoring my work, and to all the dataset creators and other people whose work has contributed to this project!
+
+ ## Example Outputs
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/wMSeFqUSBwCNefm7s6G1-.png)
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/00iVenvEOMWIO9X6EY2EZ.png)
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/o7hHbCbtwMLitDy-FWDAg.png)
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/-9ZT1FBSE2BJhDowoh6Gj.png)
+
+ ## Benchmark Information
+
+ ## Benchmark Results
+
+ GPT-4All Benchmark Set
+ ```
+ | Task |Version| Metric |Value | |Stderr|
+ |-------------|------:|--------|-----:|---|-----:|
+ |arc_challenge| 0|acc |0.5009|± |0.0146|
+ | | |acc_norm|0.5247|± |0.0146|
+ |arc_easy | 0|acc |0.8127|± |0.0080|
+ | | |acc_norm|0.7854|± |0.0084|
+ |boolq | 1|acc |0.8153|± |0.0068|
+ |hellaswag | 0|acc |0.6126|± |0.0049|
+ | | |acc_norm|0.7995|± |0.0040|
+ |openbookqa | 0|acc |0.3660|± |0.0216|
+ | | |acc_norm|0.4600|± |0.0223|
+ |piqa | 0|acc |0.7922|± |0.0095|
+ | | |acc_norm|0.8112|± |0.0091|
+ |winogrande | 0|acc |0.7293|± |0.0125|
+ Average: 0.7036
+ ```
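The card does not state how the suite average is aggregated; assuming it is the plain mean over the seven tasks, taking acc_norm where reported and acc otherwise, the listed 0.7036 reproduces:

```python
# Sanity check of the reported GPT4All-suite average (assumption: mean of
# acc_norm where available, plain acc for boolq and winogrande).
scores = {
    "arc_challenge": 0.5247,  # acc_norm
    "arc_easy": 0.7854,       # acc_norm
    "boolq": 0.8153,          # acc (no acc_norm reported)
    "hellaswag": 0.7995,      # acc_norm
    "openbookqa": 0.4600,     # acc_norm
    "piqa": 0.8112,           # acc_norm
    "winogrande": 0.7293,     # acc (no acc_norm reported)
}
print(round(sum(scores.values()) / len(scores), 4))  # 0.7036
```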
+
+ AGI-Eval
+ ```
+ | Task |Version| Metric |Value | |Stderr|
+ |------------------------------|------:|--------|-----:|---|-----:|
+ |agieval_aqua_rat | 0|acc |0.2008|± |0.0252|
+ | | |acc_norm|0.2126|± |0.0257|
+ |agieval_logiqa_en | 0|acc |0.3410|± |0.0186|
+ | | |acc_norm|0.3564|± |0.0188|
+ |agieval_lsat_ar | 0|acc |0.2261|± |0.0276|
+ | | |acc_norm|0.2174|± |0.0273|
+ |agieval_lsat_lr | 0|acc |0.3725|± |0.0214|
+ | | |acc_norm|0.3373|± |0.0210|
+ |agieval_lsat_rc | 0|acc |0.4684|± |0.0305|
+ | | |acc_norm|0.4572|± |0.0304|
+ |agieval_sat_en | 0|acc |0.6553|± |0.0332|
+ | | |acc_norm|0.5971|± |0.0343|
+ |agieval_sat_en_without_passage| 0|acc |0.4515|± |0.0348|
+ | | |acc_norm|0.4029|± |0.0343|
+ |agieval_sat_math | 0|acc |0.3273|± |0.0317|
+ | | |acc_norm|0.2636|± |0.0298|
+ Average: 0.3556
+ ```
+ BigBench Reasoning Test
+ ```
+ | Task |Version| Metric |Value | |Stderr|
+ |------------------------------------------------|------:|---------------------|-----:|---|-----:|
+ |bigbench_causal_judgement | 0|multiple_choice_grade|0.5368|± |0.0363|
+ |bigbench_date_understanding | 0|multiple_choice_grade|0.7127|± |0.0236|
+ |bigbench_disambiguation_qa | 0|multiple_choice_grade|0.3023|± |0.0286|
+ |bigbench_geometric_shapes | 0|multiple_choice_grade|0.1003|± |0.0159|
+ | | |exact_str_match |0.0000|± |0.0000|
+ |bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|0.2720|± |0.0199|
+ |bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|0.1986|± |0.0151|
+ |bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|0.4500|± |0.0288|
+ |bigbench_movie_recommendation | 0|multiple_choice_grade|0.2880|± |0.0203|
+ |bigbench_navigate | 0|multiple_choice_grade|0.5000|± |0.0158|
+ |bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|0.5390|± |0.0111|
+ |bigbench_ruin_names | 0|multiple_choice_grade|0.3906|± |0.0231|
+ |bigbench_salient_translation_error_detection | 0|multiple_choice_grade|0.1844|± |0.0123|
+ |bigbench_snarks | 0|multiple_choice_grade|0.5249|± |0.0372|
+ |bigbench_sports_understanding | 0|multiple_choice_grade|0.5335|± |0.0159|
+ |bigbench_temporal_sequences | 0|multiple_choice_grade|0.2980|± |0.0145|
+ |bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|0.2048|± |0.0114|
+ |bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|0.1297|± |0.0080|
+ |bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|0.4500|± |0.0288|
+ Average: 36.75
+ ```
+
+ This is a slight improvement on the GPT4All and BigBench suites, with a degradation on AGIEval, compared to the original Hermes.
+
+ Average Score Comparison between Nous-Hermes Llama-2 and OpenHermes Llama-2:
+ ```
+ | Bench | Nous-Hermes | OpenHermes | Change |
+ |------------------------------|------------:|------------|--------|
+ |GPT4All | 70.00| 70.36| +0.36|
+ |------------------------------------------------------------------|
+ |BigBench | 36.57| 36.75| +0.18|
+ |------------------------------------------------------------------|
+ |AGI Eval | 37.20| 35.56| -1.64|
+ ```
+
+
+ ## Training procedure
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/pqQ6MrMVy80hHEKSfqIX2.png)
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 2
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 128
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 300
+ - num_epochs: 3
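These values map almost one-to-one onto Hugging Face `transformers.TrainingArguments`. The sketch below is a hypothetical expression of the same settings, not the author's actual training script: the card does not name the training framework, and `output_dir` is a placeholder.

```python
from transformers import TrainingArguments

# Hypothetical mapping of the reported hyperparameters. The effective batch
# size is 2 (per device) * 8 (GPUs) * 8 (grad accumulation) = 128, matching
# total_train_batch_size; distributed_type/num_devices come from the launcher
# (e.g. torchrun), not from TrainingArguments itself.
training_args = TrainingArguments(
    output_dir="openhermes-13b-ft",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=300,
    num_train_epochs=3,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```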