ArianAskari
commited on
Update README.md
Browse files
README.md
CHANGED
@@ -1,3 +1,11 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
---
|
2 |
license: mit
|
3 |
language:
|
|
|
1 |
+
A variation of NeuralHermes 2.5 - Mistral 7B
|
2 |
+
|
3 |
+
This is a variation of NeuralHermes which is based on the teknium/OpenHermes-2.5-Mistral-7B model that has been further fine-tuned with Direct Preference Optimization (DPO) using the mlabonne/chatml_dpo_pairs dataset. It surpasses the original model on most benchmarks (see results).
|
4 |
+
|
5 |
+
It is directly inspired by the RLHF process described by Intel/neural-chat-7b-v3-1's authors to improve performance. I used the same dataset and reformatted it to apply the ChatML template.
|
6 |
+
|
7 |
+
The code to train this model is available on Google Colab and GitHub. It required an A100 GPU for about an hour.
|
8 |
+
|
9 |
---
|
10 |
license: mit
|
11 |
language:
|