# NeuralHermes 2.5 - Mistral 7B

NeuralHermes is a [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) model that has been further fine-tuned with Direct Preference Optimization (DPO) using the [mlabonne/chatml_dpo_pairs](https://huggingface.co/datasets/mlabonne/chatml_dpo_pairs) dataset. It surpasses the original model on several benchmarks (see the Results section below).

It is directly inspired by the RLHF process described by the authors of [Intel/neural-chat-7b-v3-1](https://huggingface.co/Intel/neural-chat-7b-v3-1) to improve performance. I used the same dataset and reformatted it to apply the ChatML template.
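
For reference, here is a minimal sketch (not taken from the training notebook) of what a single sample looks like once rendered with the ChatML template; the helper and field names are illustrative assumptions, and the exact preprocessing lives in the linked Colab.

```python
# Hypothetical helper: render one (system, question, answer) triple as ChatML,
# the conversation format this model is trained and prompted with.
def to_chatml(system: str, question: str, answer: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n{answer}<|im_end|>\n"
    )

print(to_chatml("You are a helpful assistant.",
                "What is Direct Preference Optimization?",
                "DPO fine-tunes a model directly on preference pairs..."))
```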

The code to train this model is available on [Google Colab](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing) and [GitHub](https://github.com/mlabonne/llm-course/tree/main). It required an A100 GPU for about an hour.

🤗 GGUF: [mlabonne/NeuralHermes-2.5-Mistral-7B-GGUF](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B-GGUF).
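
If you use the GGUF build, a minimal sketch with llama-cpp-python could look like the following; the file name, context size, and sampling settings are assumptions, not values from this card.

```python
# Minimal, illustrative llama-cpp-python usage for a GGUF quantization of this model.
from llama_cpp import Llama

# Hypothetical local file name for one of the GGUF quantizations.
llm = Llama(model_path="neuralhermes-2.5-mistral-7b.Q4_K_M.gguf", n_ctx=2048)

# The model expects ChatML-formatted prompts.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nWhat is a large language model?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
out = llm(prompt, max_tokens=256, stop=["<|im_end|>"])
print(out["choices"][0]["text"])
```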

## Results

Results are improved on every benchmark, including **AGIEval** (from 43.07% to 43.62%).

### AGIEval

![](https://i.imgur.com/sxuZBWy.png)

### GPT4All

![](https://i.imgur.com/lgo4ydZ.png)

### TruthfulQA

## Training hyperparameters

**LoRA**:
* r=16
* lora_alpha=16
* lora_dropout=0.05
* bias="none"
* task_type="CAUSAL_LM"
* target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']

**Training arguments**:
* per_device_train_batch_size=4
* gradient_accumulation_steps=4
* gradient_checkpointing=True
* learning_rate=5e-5
* lr_scheduler_type="cosine"
* max_steps=200
* optim="paged_adamw_32bit"
* warmup_steps=100

**DPOTrainer**:
* beta=0.1
* max_prompt_length=1024
* max_length=1536
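
As a rough guide, here is a minimal sketch of how these hyperparameters could be wired together with peft, transformers, and trl. The keyword layout matches older trl releases (newer versions move beta and the length limits into a DPOConfig), and the output directory and loading details are illustrative assumptions rather than values from this card.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "teknium/OpenHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Assumed to expose the prompt/chosen/rejected columns DPOTrainer expects.
dataset = load_dataset("mlabonne/chatml_dpo_pairs", split="train")

# LoRA settings from the list above.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj'],
)

# Training arguments from the list above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="./neuralhermes-dpo",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    optim="paged_adamw_32bit",
    warmup_steps=100,
)

# DPOTrainer settings from the list above. With a peft_config, ref_model=None
# lets trl derive the frozen reference model from the base weights.
trainer = DPOTrainer(
    model,
    ref_model=None,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)
trainer.train()
```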