koesn committed on
Commit
1141dc8
1 Parent(s): e7a2d63

Update README.md

Files changed (1)
  1. README.md +254 -0
README.md CHANGED
@@ -1,3 +1,257 @@
  ---
  license: apache-2.0
  ---
+ # NeuralHermes-2.5-Mistral-7B
+
+ ## Description
+ This repo contains GGUF format model files for NeuralHermes-2.5-Mistral-7B.
+
+ ## Files Provided
+ | Name                                    | Quant   | Bits | File Size | Remark                           |
+ | --------------------------------------- | ------- | ---- | --------- | -------------------------------- |
+ | neuralhermes-2.5-mistral-7b.IQ3_S.gguf  | IQ3_S   | 3    | 3.18 GB   | 3.44 bpw quantization            |
+ | neuralhermes-2.5-mistral-7b.IQ3_M.gguf  | IQ3_M   | 3    | 3.28 GB   | 3.66 bpw quantization mix        |
+ | neuralhermes-2.5-mistral-7b.Q4_0.gguf   | Q4_0    | 4    | 4.11 GB   | 3.56G, +0.2166 ppl               |
+ | neuralhermes-2.5-mistral-7b.IQ4_NL.gguf | IQ4_NL  | 4    | 4.16 GB   | 4.25 bpw non-linear quantization |
+ | neuralhermes-2.5-mistral-7b.Q4_K_M.gguf | Q4_K_M  | 4    | 4.37 GB   | 3.80G, +0.0532 ppl               |
+ | neuralhermes-2.5-mistral-7b.Q5_K_M.gguf | Q5_K_M  | 5    | 5.13 GB   | 4.45G, +0.0122 ppl               |
+ | neuralhermes-2.5-mistral-7b.Q6_K.gguf   | Q6_K    | 6    | 5.94 GB   | 5.15G, +0.0008 ppl               |
+ | neuralhermes-2.5-mistral-7b.Q8_0.gguf   | Q8_0    | 8    | 7.70 GB   | 6.70G, +0.0004 ppl               |
+
+ ## Parameters
+ | path                              | type    | architecture       | rope_theta | sliding_win | max_pos_embed |
+ | --------------------------------- | ------- | ------------------ | ---------- | ----------- | ------------- |
+ | teknium/OpenHermes-2.5-Mistral-7B | mistral | MistralForCausalLM | 10000      | 4096        | 32768         |
+
+ ## Benchmarks
+ ![](https://i.ibb.co/N2kwGJY/Neural-Hermes-2-5-Mistral-7-B.png)
+
+ ## Specific Purpose Notes
+ # Original Model Card
+
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - mistral
+ - instruct
+ - finetune
+ - chatml
+ - gpt4
+ - synthetic data
+ - distillation
+ - dpo
+ - rlhf
+ datasets:
+ - mlabonne/chatml_dpo_pairs
+ base_model: teknium/OpenHermes-2.5-Mistral-7B
+ model-index:
+ - name: NeuralHermes-2.5-Mistral-7B
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 66.55
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralHermes-2.5-Mistral-7B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 84.9
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralHermes-2.5-Mistral-7B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 63.32
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralHermes-2.5-Mistral-7B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 54.93
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralHermes-2.5-Mistral-7B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 78.3
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralHermes-2.5-Mistral-7B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 61.33
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralHermes-2.5-Mistral-7B
+       name: Open LLM Leaderboard
+ ---
+
+ <center><img src="https://i.imgur.com/qIhaFNM.png"></center>
+
+ # NeuralHermes 2.5 - Mistral 7B
+
+ NeuralHermes is the [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) model, further fine-tuned with Direct Preference Optimization (DPO) on the [mlabonne/chatml_dpo_pairs](https://huggingface.co/datasets/mlabonne/chatml_dpo_pairs) dataset. It surpasses the original model on most benchmarks (see Results).
+
+ It is directly inspired by the RLHF process that the authors of [Intel/neural-chat-7b-v3-1](https://huggingface.co/Intel/neural-chat-7b-v3-1) describe for improving performance. I used the same dataset, reformatted to follow the ChatML template.
+
+ The code to train this model is available on [Google Colab](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing) and [GitHub](https://github.com/mlabonne/llm-course/tree/main). Training took about an hour on an A100 GPU.
+
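As an illustration of the ChatML template mentioned above, here is a minimal sketch of how a list of chat messages could be rendered into a ChatML prompt string. The helper name `to_chatml` is hypothetical and not part of the original training code; in practice the tokenizer's own chat template remains the source of truth.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render chat messages as a ChatML prompt string.

    `to_chatml` is a hypothetical helper written for illustration only.
    """
    parts = []
    for msg in messages:
        # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"},
]
print(to_chatml(messages))
```

Setting `add_generation_prompt=False` gives the closed form of a conversation, which is the shape a training example would typically take.
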
+ ## Quantized models
+
+ * **GGUF**: https://huggingface.co/TheBloke/NeuralHermes-2.5-Mistral-7B-GGUF
+ * **AWQ**: https://huggingface.co/TheBloke/NeuralHermes-2.5-Mistral-7B-AWQ
+ * **GPTQ**: https://huggingface.co/TheBloke/NeuralHermes-2.5-Mistral-7B-GPTQ
+ * **EXL2**:
+   * 3.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-3.0bpw-h6-exl2
+   * 4.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-4.0bpw-h6-exl2
+   * 5.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-5.0bpw-h6-exl2
+   * 6.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-6.0bpw-h6-exl2
+   * 8.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-8.0bpw-h8-exl2
+
+ ## Results
+
+ **Update:** NeuralHermes-2.5 became the best Hermes-based model on the Open LLM leaderboard and one of the very best 7b models. 🎉
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/yWe6VBFxkHiuOlDVBXtGo.png)
+
+ Teknium (author of OpenHermes-2.5-Mistral-7B) benchmarked the model ([see his tweet](https://twitter.com/Teknium1/status/1729955709377503660)).
+
+ Results are improved on every benchmark: **AGIEval** (from 43.07% to 43.62%), **GPT4All** (from 73.12% to 73.25%), and **TruthfulQA**.
+
+ ### AGIEval
+ ![](https://i.imgur.com/7an3B1f.png)
+
+ ### GPT4All
+ ![](https://i.imgur.com/TLxZFi9.png)
+
+ ### TruthfulQA
+ ![](https://i.imgur.com/V380MqD.png)
+
+ You can check the Weights & Biases project [here](https://wandb.ai/mlabonne/NeuralHermes-2-5-Mistral-7B/overview?workspace=user-mlabonne).
+
+ ## Usage
+
+ You can run this model using [LM Studio](https://lmstudio.ai/) or any other frontend.
+
+ You can also run this model using the following code:
+
+ ```python
+ import transformers
+ from transformers import AutoTokenizer
+
+ # Hub ID of the original (non-GGUF) model; swap in your own copy or a local path
+ new_model = "mlabonne/NeuralHermes-2.5-Mistral-7B"
+
+ # Format the prompt with the tokenizer's chat template (ChatML)
+ message = [
+     {"role": "system", "content": "You are a helpful assistant chatbot."},
+     {"role": "user", "content": "What is a Large Language Model?"}
+ ]
+ tokenizer = AutoTokenizer.from_pretrained(new_model)
+ prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)
+
+ # Create the text-generation pipeline
+ pipeline = transformers.pipeline(
+     "text-generation",
+     model=new_model,
+     tokenizer=tokenizer
+ )
+
+ # Generate text (max_length counts the prompt tokens as well)
+ sequences = pipeline(
+     prompt,
+     do_sample=True,
+     temperature=0.7,
+     top_p=0.9,
+     num_return_sequences=1,
+     max_length=200,
+ )
+ print(sequences[0]['generated_text'])
+ ```
+
+ ## Training hyperparameters
+
+ **LoRA**:
+ * r=16
+ * lora_alpha=16
+ * lora_dropout=0.05
+ * bias="none"
+ * task_type="CAUSAL_LM"
+ * target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
+
+ **Training arguments**:
+ * per_device_train_batch_size=4
+ * gradient_accumulation_steps=4
+ * gradient_checkpointing=True
+ * learning_rate=5e-5
+ * lr_scheduler_type="cosine"
+ * max_steps=200
+ * optim="paged_adamw_32bit"
+ * warmup_steps=100
+
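To make the training arguments above more concrete, the following sketch works out the effective batch size and the learning rate at a few steps. It assumes a single GPU and the usual linear-warmup plus cosine-decay-to-zero schedule (the shape implemented by `get_cosine_schedule_with_warmup` in transformers); the function `lr_at` is a hypothetical helper, not code from the training notebook.

```python
import math

# Values copied from the training arguments listed above
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
LEARNING_RATE = 5e-5
MAX_STEPS = 200
WARMUP_STEPS = 100


def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


# Preference pairs consumed per optimizer step (single-GPU assumption)
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS
print("effective batch size:", effective_batch)
print("pairs processed over training:", effective_batch * MAX_STEPS)
for step in (0, 50, 100, 150, 200):
    print(f"step {step:3d}: lr = {lr_at(step):.2e}")
```

Under these assumptions the warmup takes up the first half of the run, so the peak learning rate of 5e-5 is only reached at step 100.
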
+ **DPOTrainer**:
+ * beta=0.1
+ * max_prompt_length=1024
+ * max_length=1536
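
For readers unfamiliar with the role of `beta`, here is a minimal sketch of the standard sigmoid DPO loss for a single preference pair, using the `beta=0.1` value listed above. The function `dpo_loss` and the log-probability values are made up for illustration; this is not the DPOTrainer implementation.

```python
import math

BETA = 0.1  # DPOTrainer beta from the list above


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=BETA):
    """Sigmoid DPO loss for one preference pair, given summed sequence log-probabilities."""
    # Implicit rewards: how much more likely the policy makes each answer than the reference
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))


# Policy identical to the reference: zero margin, loss equals log(2)
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy favours the chosen answer more than the reference does: lower loss
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

A smaller `beta` shrinks the margin for the same log-probability shift, so the policy is pushed less strongly away from the reference model.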